
When Proxy IPs Meet XPath contains(): A Wonderful Reaction
Anyone who writes crawlers knows that the worst things to run into when scraping data are dynamic class names and random element ids. This is where XPath's contains() function comes in: like a barbecue skewer at a late-night snack stand, it can string together all sorts of bits and pieces of information. However, many people only know contains(text(), 'keyword'), which is like using a submachine gun as a fire poker.
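To see why contains(text(), 'keyword') alone sells the function short, here is a small sketch using lxml (an assumption; any XPath 1.0 engine should behave the same way). The markup is made up for illustration.

```python
from lxml import html

# Hypothetical markup: the price sits in the second text node of <p>.
doc = html.fromstring("<div><p>Price: <b>USD</b> 9.99</p></div>")

# text() yields a node-set; contains() converts it to a string using only
# its first node ("Price: "), so a keyword in a later text node is missed.
first_text_only = doc.xpath("//p[contains(text(), '9.99')]")

# "." is the full string value of <p>, including text inside child tags.
whole_string = doc.xpath("//p[contains(., '9.99')]")

print(len(first_text_only), len(whole_string))  # prints: 0 1
```

This is why the examples in the rest of the article lean on contains(., ...) and contains(@attr, ...) rather than text().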
I. Three Go-To Moves for Proxy IP Scenarios
When paired with ipipgo's premium proxies, contains() can pull off some neat tricks:
| Scenario | XPath combination | Anti-blocking technique |
|---|---|---|
| Multi-language website | //*[contains(@class,'product')][contains(.,'$')] | Use ipipgo's EU nodes |
| Price fluctuation monitoring | //div[contains(@id,'price_')][contains(.,'.99')] | Rotate the IP once every 3 seconds |
| CAPTCHA trap | //input[contains(@name,'captcha')]/following-sibling::img | Switch to residential proxies immediately |
Remember to set both the IP switching frequency and the timeout and retry options to smart mode in the ipipgo backend; it is much less hassle than tuning them by hand.
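If you would rather handle rotation on the client side, the "once every 3 seconds" rule from the table might look roughly like this. It is only a sketch: TimedProxyRotator is a made-up helper (not part of any ipipgo SDK), and the proxy URLs are placeholders.

```python
import itertools
import time


class TimedProxyRotator:
    """Cycle through proxies, moving to the next one every `interval` seconds."""

    def __init__(self, proxies, interval=3.0, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._interval = interval
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self):
        """Return the proxy to use right now, rotating if the interval has passed."""
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current


# Placeholder endpoints; substitute the ones from your own proxy account.
PROXIES = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
]
rotator = TimedProxyRotator(PROXIES, interval=3.0)
print(rotator.current())
```

Call rotator.current() before each request and pass the result to your HTTP client's proxy setting.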
II. Slick Tricks for Fuzzy Matching on Attribute Values
Many sites will add random suffixes to elements, such as class="btn-submit-5a3b". This is when you can write it like this:
//button[contains(@class,'btn-submit') and contains(@onclick,'submitForm')]
This combo hits no matter what gibberish follows the prefix. Pair it with ipipgo's static long-lived proxies: the same IP stays unchanged for half an hour and does not trigger verification, which measured 37% more stable than dynamic IPs.
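A quick way to convince yourself is to run the expression against a few buttons with random suffixes. The sketch below assumes lxml is installed; the sample form is invented for the demo.

```python
from lxml import html

# Hypothetical form: class names carry random suffixes, as described above.
page = html.fromstring("""
<form>
  <button class="btn-submit-5a3b" onclick="submitForm(this)">Send</button>
  <button class="btn-submit-9f1c" onclick="resetForm(this)">Reset</button>
  <button class="btn-cancel-77aa" onclick="submitForm(this)">Cancel</button>
</form>
""")

XPATH = "//button[contains(@class,'btn-submit') and contains(@onclick,'submitForm')]"

# Only the button that satisfies both conditions survives the "and".
matches = page.xpath(XPATH)
print([b.get("class") for b in matches])  # prints: ['btn-submit-5a3b']
```

Requiring both attributes keeps the match tight: either condition on its own would also pick up the Reset or Cancel button.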
III. Lightning-Fast Locating in Deeply Nested Structures
Don't be quick to curse when you come across a nested DOM structure, try this:
//div[contains(@style,'display: block')]//span[contains(@data-bind,'ko.observable')][contains(.,'inventory')]
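This trick targets elements generated by all kinds of front-end frameworks. ipipgo's exclusive IP pool also has a hidden feature: it can be bound to a specific data-center route. For example, using San Jose nodes exclusively to crawl North American e-commerce sites can bring latency under 200 ms.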
IV. The Ultimate Trick: Combining Dynamic and Static
Mix and match contains() with axis expressions:
//table[contains(@class,'data-table')]/tbody/tr[position()>1]/td[contains(normalize-space(),'spot')]/preceding-sibling::td[1]
This expression skips the table header row and grabs the cell just before each "spot" (in-stock) entry, which is much faster than wrestling with regular expressions. Remember to turn on request interval randomization in ipipgo. If you set the access interval to a random value between 1.8 and 3.2 seconds, the anti-crawling system will not be able to figure out the pattern at all.
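Here is a minimal sketch of both ideas from this section: the table expression run with lxml (an assumption) against invented sample data, plus a client-side stand-in for the randomized interval. spot_items and polite_delay are hypothetical helper names.

```python
import random
import time

from lxml import html

# Keyword written without a leading space, since normalize-space() already
# trims the cell text before contains() looks at it.
TABLE_XPATH = (
    "//table[contains(@class,'data-table')]/tbody/tr[position()>1]"
    "/td[contains(normalize-space(),'spot')]/preceding-sibling::td[1]"
)

# Invented sample: first row is the header, status sits in the second column.
SAMPLE = """
<table class="data-table striped">
  <tbody>
    <tr><td>Item</td><td>Status</td></tr>
    <tr><td>Widget A</td><td>  spot  </td></tr>
    <tr><td>Widget B</td><td>pre-order</td></tr>
    <tr><td>Widget C</td><td>spot</td></tr>
  </tbody>
</table>
"""


def spot_items(markup):
    """Return the text of the cell just before each 'spot' status cell."""
    doc = html.fromstring(markup)
    return [td.text_content().strip() for td in doc.xpath(TABLE_XPATH)]


def polite_delay(low=1.8, high=3.2, sleep=time.sleep):
    """Sleep for a random interval in [low, high] seconds and return it."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


print(spot_items(SAMPLE))  # prints: ['Widget A', 'Widget C']
```

Calling polite_delay() between requests gives the 1.8 to 3.2 second jitter on the client side if you are not relying on the proxy service to do it.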
Q&A First Aid Kit
Q: What should I do if I always get my IP blocked by websites?
A: Eight times out of ten the proxy quality is the problem. ipipgo's commercial-grade proxies come with UA camouflage and TLS fingerprint obfuscation, and new users get 1 GB of traffic to test for free.
Q: How can I monitor hundreds of websites at the same time?
A: Use ipipgo's multi-threading package together with XPath combo queries built from contains() and starts-with(), and remember to set the timeout threshold to 8 seconds.
Q: What if I can't capture dynamically loaded data?
A: Eight times out of ten the XPath is not written correctly. Try using contains(@style,'loading') to decide whether you still need to wait. ipipgo's S5 proxies support direct integration with Puppeteer, so rendering first and then crawling is rock solid.
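For the multi-site monitoring question above, a rough standard-library sketch of fetching many pages in parallel with the 8-second timeout could look like this. fetch and fetch_all are hypothetical helpers; proxy configuration and the XPath step are left out and would depend on your own client and parser.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

TIMEOUT_SECONDS = 8  # the threshold suggested in the answer above


def fetch(url, timeout=TIMEOUT_SECONDS):
    """Download one page; urlopen raises if the timeout is exceeded."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetcher=fetch, workers=16):
    """Fetch URLs concurrently; return ({url: body}, {url: exception})."""
    ok, failed = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetcher, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                ok[url] = fut.result()
            except Exception as exc:  # timeouts, HTTP errors, DNS failures
                failed[url] = exc
    return ok, failed
```

Keeping failures in their own dictionary makes it easy to retry just those URLs, ideally through a different IP.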
One last bit of trivia: ipipgo's data center proxies recently upgraded their TCP handshake optimization, so when crawling pages that need a lot of contains() queries, the response speed is 2.3 times faster than with regular proxies. New users who enter the promo code XPath666 at registration can try the premium package free for three days; it would be a real loss not to grab the deal.

