Advanced Usage of XPath contains() Function

When proxy IPs meet XPath contains(): a useful combination

Anyone who writes crawlers knows that the worst things to run into when capturing data are dynamic class names and random element ids. This is where XPath's contains() function comes in: like a skewer at a late-night barbecue stand, it can string together all sorts of fragmentary information. However, many people only know contains(text(), 'keyword'), which is like using a submachine gun as a fire poker.
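As a point of reference, here is a minimal sketch of why the basic contains(text(), 'keyword') form is limiting. It assumes the third-party lxml library and a made-up HTML snippet. In XPath 1.0, passing text() to contains() uses only the first text node of the element, while contains(., 'keyword') tests the element's full string value.

```python
from lxml import html

# Hypothetical markup: in the first <p>, the keyword sits in the SECOND
# text node (after the <b> child); in the second <p> it is in the first.
SNIPPET = (
    "<div>"
    "<p>Price <b>now</b> keyword appears here</p>"
    "<p>keyword first</p>"
    "</div>"
)
tree = html.fromstring(SNIPPET)

# contains(text(), ...) converts the text() node-set to a string using only
# its first node, so the first <p> is missed.
first_text_only = tree.xpath("//p[contains(text(), 'keyword')]")

# contains(., ...) checks the whole string value, including nested text.
full_string = tree.xpath("//p[contains(., 'keyword')]")

print(len(first_text_only), len(full_string))
```

The same difference applies to every expression later in this article that uses contains(., ...).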

I. Three-Pronged Usage in Proxy IP Scenarios

When paired with ipipgo's premium proxies, contains() opens up some useful tricks:

| Scenario | Combination technique | Anti-blocking technique |
| --- | --- | --- |
| Multi-language website | contains(@class,'product') and contains(.,'$') | Use ipipgo's EU nodes |
| Price fluctuation monitoring | //div[contains(@id,'price_')][contains(.,'.99')] | Set IP rotation to once every 3 seconds |
| CAPTCHA trap | //input[contains(@name,'captcha')]/following-sibling::img | Switch to residential proxies immediately |

Remember to set the IP switching frequency and the timeout-and-retry options to smart mode in the ipipgo backend; this is much less hassle than tuning them manually.
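The price-monitoring and CAPTCHA expressions from the table can be tried offline before pointing them at a real site. The sketch below assumes the third-party lxml library; the HTML, ids, and file paths are invented for illustration and do not come from any real page.

```python
from lxml import html

# Hypothetical page fragment with randomized id/name suffixes.
PAGE = """
<html><body>
  <div id="price_8841">Now only $19.99</div>
  <div id="price_9920">Regular $25.00</div>
  <form>
    <input name="captcha_code_x7"><img src="/captcha/abc.png">
  </form>
</body></html>
"""
tree = html.fromstring(PAGE)

# Price monitoring: id contains 'price_' AND the text contains '.99'.
price_nodes = tree.xpath("//div[contains(@id,'price_')][contains(.,'.99')]")
prices = [node.text_content().strip() for node in price_nodes]

# CAPTCHA trap: find the image that follows a captcha-like input field.
captcha_srcs = [
    str(src)
    for src in tree.xpath(
        "//input[contains(@name,'captcha')]/following-sibling::img/@src"
    )
]

print(prices)
print(captcha_srcs)
```

A non-empty captcha_srcs list could be the signal for switching proxy type, as the table suggests; how that switch is triggered depends on the proxy client and is not shown here.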

II. Slick Tricks for Fuzzy Matching of Attribute Values

Many sites append random suffixes to attribute values, such as class="btn-submit-5a3b". In that case you can write:

//button[contains(@class,'btn-submit') and contains(@onclick,'submitForm')]

This combination matches no matter what gibberish follows the prefix. Combined with ipipgo's static long-lasting proxies, the same IP stays unchanged for half an hour and does not trigger verification, which was measured to be 37% more stable than dynamic IPs.
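A quick way to check the expression is to run it against a few buttons with different random suffixes. This is a minimal sketch assuming the third-party lxml library; the button markup and handler names other than submitForm are hypothetical.

```python
from lxml import html

# Hypothetical form: only the first button has BOTH the class prefix
# and the submitForm handler.
PAGE = """
<html><body><form>
  <button class="btn-submit-5a3b" onclick="submitForm('order')">Buy</button>
  <button class="btn-submit-9f1c" onclick="trackClick()">Ad</button>
  <button class="btn-cancel-77aa" onclick="submitForm('cancel')">Cancel</button>
</form></body></html>
"""
tree = html.fromstring(PAGE)

XPATH = "//button[contains(@class,'btn-submit') and contains(@onclick,'submitForm')]"
labels = [button.text_content() for button in tree.xpath(XPATH)]

print(labels)
```

Requiring two attributes at once is what keeps the look-alike "Ad" and "Cancel" buttons out of the result.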

III. Fast Element Location in Deeply Nested Structures

Don't be quick to curse when you come across a deeply nested DOM structure; try this:

//div[contains(@style,'display: block')]//span[contains(@data-bind,'ko.observable')][contains(.,'inventory')]

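This trick is aimed at elements generated by front-end frameworks. ipipgo's exclusive IP pool has a hidden feature: you can bind it to a specific data-center line. For example, using San Jose nodes specifically for crawling North American e-commerce sites can bring latency under 200 ms.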
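To see how the chained predicates narrow things down, the sketch below runs the expression against a small nested page. It assumes the third-party lxml library; the markup imitates Knockout-style data-bind attributes and is invented for illustration.

```python
from lxml import html

# Hypothetical page: one hidden block and one visible block, each with
# framework-generated spans nested a few levels deep.
PAGE = """
<html><body>
  <div style="display: none">
    <span data-bind="text: ko.observable(stock)">inventory: 0</span>
  </div>
  <div style="color: red; display: block">
    <section><p>
      <span data-bind="text: ko.observable(stock)">inventory: 12</span>
      <span data-bind="text: ko.observable(price)">price: 30</span>
    </p></section>
  </div>
</body></html>
"""
tree = html.fromstring(PAGE)

XPATH = (
    "//div[contains(@style,'display: block')]"
    "//span[contains(@data-bind,'ko.observable')][contains(.,'inventory')]"
)
visible_inventory = [span.text_content() for span in tree.xpath(XPATH)]

print(visible_inventory)
```

One caveat: contains(@style,'display: block') is a plain substring test, so a page that writes display:block without the space would not match.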

IV. The Ultimate Trick: Combining Dynamic and Static Matching

Mix and match contains() with axis expressions:

//table[contains(@class,'data-table')]/tbody/tr[position()>1]/td[contains(normalize-space(), ' spot')]/preceding-sibling::td[1]

This expression skips the table header row and, for every row whose cell mentions "spot", grabs the cell immediately before it, which is much faster than a regular expression. Remember to turn on request interval randomization in ipipgo: if you set the access interval to a random value between 1.8 and 3.2 seconds, the anti-crawling system will have a hard time detecting a pattern.
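The sketch below exercises the table expression and adds a client-side helper for the 1.8 to 3.2 second random interval. It assumes the third-party lxml library; the table contents are made up, and random_interval() is a hypothetical helper that mimics the idea on the client rather than ipipgo's own setting.

```python
import random
import time

from lxml import html

# Hypothetical table: the header row also mentions "spot", which is why
# tr[position()>1] matters.
PAGE = """
<html><body>
<table class="data-table striped">
  <tbody>
    <tr><td>Item</td><td>Has spot stock</td></tr>
    <tr><td>Widget A</td><td>in spot stock</td></tr>
    <tr><td>Widget B</td><td>sold out</td></tr>
    <tr><td>Widget C</td><td>  in   spot   stock </td></tr>
  </tbody>
</table>
</body></html>
"""
tree = html.fromstring(PAGE)

XPATH = (
    "//table[contains(@class,'data-table')]/tbody/tr[position()>1]"
    "/td[contains(normalize-space(), ' spot')]/preceding-sibling::td[1]"
)
spot_items = [cell.text_content().strip() for cell in tree.xpath(XPATH)]
print(spot_items)


def random_interval(low=1.8, high=3.2):
    """Return a random delay in seconds within the range named in the text."""
    return random.uniform(low, high)


def polite_sleep():
    """Pause between requests for a randomized interval."""
    time.sleep(random_interval())
```

normalize-space() collapses the irregular whitespace in the "Widget C" row, so the substring test still matches there. Calling polite_sleep() between requests would give the randomized pacing described above.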

Q&A First Aid Kit

Q: What should I do if I always get my IP blocked by websites?
A: Eight times out of ten, the proxy quality is the problem. ipipgo's commercial-grade proxies come with UA camouflage and TLS fingerprint obfuscation, and new users get 1 GB of traffic to test for free.

Q: How can I monitor hundreds of websites at the same time?
A: Use ipipgo's multi-threading package together with combined contains() and starts-with() queries in XPath, and remember to set the timeout threshold to 8 seconds.
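As a rough illustration of the contains() plus starts-with() combination applied to several sites in parallel, here is a sketch that assumes the third-party lxml library and uses in-memory pages in place of real downloads. The site names, ids, and classes are hypothetical; in a real crawler the 8-second timeout would be passed to whatever HTTP client fetches the pages.

```python
from concurrent.futures import ThreadPoolExecutor

from lxml import html

# starts-with() pins the stable prefix of the id; contains() tolerates the
# random suffix on the class.
XPATH = "//a[starts-with(@id,'item-') and contains(@class,'product-link')]/@href"

# Hypothetical pages standing in for downloaded HTML.
PAGES = {
    "site-a": """<html><body>
        <a id="item-1001" class="card product-link-x9" href="/p/1001">A</a>
        <a id="item-1002" class="card promo-link-k2" href="/p/1002">B</a>
        <a id="nav-home" class="product-link-main" href="/">Home</a>
    </body></html>""",
    "site-b": """<html><body>
        <a id="item-2001" class="product-link-aa" href="/p/2001">C</a>
        <a id="item-2002" class="product-link-bb" href="/p/2002">D</a>
    </body></html>""",
}


def extract_links(page_source):
    """Parse one page and return the matching href values as plain strings."""
    return [str(href) for href in html.fromstring(page_source).xpath(XPATH)]


# pool.map preserves input order, so results line up with the site names.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(zip(PAGES, pool.map(extract_links, PAGES.values())))

print(results)
```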

Q: What if I can't capture dynamically loaded data?
A: Eight times out of ten, the XPath is not written correctly. Try using contains(@style,'loading') as a wait condition. ipipgo's S5 proxies support direct integration with Puppeteer, so rendering first and then crawling is reliable.
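The "wait judgment" idea can be sketched as a polling loop that keeps re-reading the rendered page until no element matches contains(@style,'loading'). This assumes the third-party lxml library; get_source is a hypothetical callable that returns the current page HTML (for example, from a headless browser), and the snapshots below are made up.

```python
import time

from lxml import html


def is_loading(page_source):
    """True if any element's style attribute still mentions 'loading'."""
    tree = html.fromstring(page_source)
    return bool(tree.xpath("//*[contains(@style,'loading')]"))


def wait_until_loaded(get_source, attempts=5, pause=0.5):
    """Poll get_source() until the loading marker disappears.

    Returns the final page source, or None if every attempt still
    showed the loading marker.
    """
    for _ in range(attempts):
        source = get_source()
        if not is_loading(source):
            return source
        time.sleep(pause)
    return None


# Hypothetical snapshots: two with a loading spinner, then the real content.
snapshots = iter([
    '<div id="app"><div style="background:url(loading.gif)"></div></div>',
    '<div id="app"><div style="background:url(loading.gif)"></div></div>',
    '<div id="app"><ul><li>row 1</li></ul></div>',
])
final_source = wait_until_loaded(lambda: next(snapshots), pause=0)
print(final_source)
```

Only after wait_until_loaded() returns a page would the data-extraction XPath be run against it.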

One last piece of trivia: ipipgo's data center proxies recently received a TCP handshake optimization, and when crawling pages with a lot of contains() queries, the response speed is 2.3 times faster than with regular proxies. New users who enter the promo code XPath666 at registration can try a premium package free for three days, so it would be a waste not to take advantage of it.
