XPath contains() Function: Text Matching and Positioning Techniques

How does XPath's contains() really work?

Anyone who does data collection knows that locating elements on a web page can feel like looking for a needle in a haystack. The XPath contains() function is your magnet here, especially when an element has no obvious distinguishing feature. For example, to find every div tag whose text includes the word "price", write //div[contains(text(),'price')], which is far more flexible than matching the full text.


//*[contains(@class,'btn_submit')]      // find elements whose class contains the submit button style
//a[contains(@href,'product_detail')]   // grab links to product detail pages
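
To see these expressions in action, here is a minimal sketch using lxml. The HTML fragment is invented for illustration and does not come from any real site; it only shows how contains() behaves on text and on attributes.

```python
from lxml import etree

# Hypothetical page fragment, invented for this example.
SAMPLE = """
<html><body>
  <div>Sale price: 19.90</div>
  <div>Shipping info</div>
  <button class="btn btn_submit large">Buy</button>
  <a href="/shop/product_detail?id=7">Item 7</a>
  <a href="/about">About</a>
</body></html>
"""

tree = etree.HTML(SAMPLE)

# Text match: in XPath 1.0, contains(text(), ...) tests the first text node of each div.
price_divs = tree.xpath("//div[contains(text(), 'price')]")

# Attribute match: any element whose class attribute contains "btn_submit".
submit_buttons = tree.xpath("//*[contains(@class, 'btn_submit')]")

# Attribute match on href, returning the attribute values themselves.
detail_links = tree.xpath("//a[contains(@href, 'product_detail')]/@href")

print([d.text for d in price_divs])
print([b.tag for b in submit_buttons])
print(detail_links)
```

Note that contains() on @class is a plain substring test, so it would also match a class such as "btn_submit_disabled". That is usually what you want for fuzzy positioning, but it is worth keeping in mind.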

How do proxy IPs and XPath work together?

Many websites have very strict anti-scraping mechanisms, and frequent visits from the same IP get you blocked outright. This is where ipipgo's dynamic residential proxies come in: their IP pool is refreshed with 8000+ nodes per day. Let's say you want to collect price data from an e-commerce site:


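import requests
from lxml import etree

# Map both schemes to the gateway; with only an 'http' entry,
# requests would send the https:// request below without the proxy.
proxy_url = 'http://user:pass@gateway.ipipgo.com:9021'
proxies = {'http': proxy_url, 'https': proxy_url}

resp = requests.get('https://xxx.com', proxies=proxies, timeout=10)
resp.raise_for_status()  # stop early on a blocked or failed request
html = etree.HTML(resp.text)
# Collect every span whose class attribute contains "price"
prices = html.xpath('//span[contains(@class, "price")]')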

A practical guide to avoiding pitfalls

I once ran into this pitfall: a website hid the price in the data-price attribute, while the visible text showed only "¥??". In that case, locating the value with text() gets you nowhere, and you have to write it this way:


//div[@id='goods']/@data-price // extract attribute values directly
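
The difference between the visible text and the attribute value can be sketched as follows. The markup is a made-up stand-in for the kind of page described above, and the price value is invented.

```python
from lxml import etree

# Hypothetical markup: the visible text is masked, the real value sits in data-price.
SAMPLE = '<div id="goods" data-price="128.00"><span>¥??</span></div>'

tree = etree.HTML(SAMPLE)

# What a text-based lookup sees: only the masked placeholder.
visible = tree.xpath("//div[@id='goods']//text()")

# What the attribute lookup returns: a list of attribute value strings.
hidden = tree.xpath("//div[@id='goods']/@data-price")

price = float(hidden[0]) if hidden else None
print(visible, hidden, price)
```

An XPath that ends in /@name returns plain strings rather than elements, so the result can be converted directly without calling .text or .get().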

With ipipgo's intelligent rotation strategy set to change IPs automatically every 5 minutes, the collection success rate rose from 50% to 95%. You can also see the usage status of each IP in the dashboard, which saves a lot of worry.

Questions you probably want to ask

Q: Is contains() case sensitive?
A: Yes, it is! To match "PRICE" you have to write 'PRICE'. We suggest using the translate() function to convert the text to lowercase first.
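
A minimal sketch of the translate() approach, assuming lxml and an invented fragment. XPath 1.0 has no lower-case() function, so translate() maps each uppercase letter to its lowercase counterpart before contains() runs. The alphabets are passed in as XPath variables only to keep the expression readable.

```python
from lxml import etree

# Hypothetical fragment with the same word in three different casings.
SAMPLE = "<div><p>PRICE: 10</p><p>Price: 20</p><p>price: 30</p><p>Stock: 5</p></div>"
tree = etree.HTML(SAMPLE)

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Case-sensitive: only the all-lowercase paragraph matches.
exact = tree.xpath("//p[contains(text(), 'price')]")

# Case-insensitive: lowercase the text with translate(), then compare.
folded = tree.xpath(
    "//p[contains(translate(text(), $upper, $lower), 'price')]",
    upper=UPPER,
    lower=LOWER,
)

print([p.text for p in exact])
print([p.text for p in folded])
```

This only folds ASCII letters; text in other alphabets would need the extra characters added to both strings.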

Q: How do I handle dynamically loaded content?
A: First use ipipgo's high-anonymity proxies to get past the anti-scraping checks, then use a tool like Selenium to wait for the element to finish loading before grabbing it.

Q: Do ipipgo IPs stay alive long enough?
A: In our tests a single IP stayed usable for 10-30 minutes, which is plenty for routine collection. For long-running tasks, we recommend using the API to fetch new IPs automatically.

Why ipipgo?

After comparing several proxy providers, I found that ipipgo has three solid advantages:

Feature               Generic proxy            ipipgo
IP type               Mostly datacenter IPs    Real residential IPs
Concurrency           50 threads               Unlimited
Geographic location   Fixed cities             Base station location selectable on demand

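Last week I was helping a client with a price-comparison scrape and used their Shanghai local IPs to access the target site. It turned out to be 3 times faster than an ordinary proxy. I later learned that they have dedicated channels with the three major carriers, which is genuinely professional.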

The Ultimate Combo

Finally, I'll share a private configuration plan:

  1. Create a persistent-session proxy in the ipipgo console
  2. Write the XPath as //*[contains(@id,'result_')] to match dynamic IDs
  3. Set up 3 retries on failure plus automatic IP switching
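
Step 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not ipipgo's own mechanism: fetch_with_retry, fake_fetch, and the proxy names are hypothetical, and the fetch function is injected so the retry logic can be shown without any network access.

```python
import itertools
from typing import Callable, Sequence


def fetch_with_retry(
    fetch: Callable[[str, str], str],
    url: str,
    proxy_pool: Sequence[str],
    max_retries: int = 3,
) -> str:
    """Call fetch(url, proxy); on failure, switch to the next proxy and retry.

    Makes one initial attempt plus up to max_retries retries.
    """
    if not proxy_pool:
        raise ValueError("proxy_pool must not be empty")
    proxies = itertools.cycle(proxy_pool)
    last_error = None
    for _ in range(max_retries + 1):
        proxy = next(proxies)  # a different proxy for every attempt
        try:
            return fetch(url, proxy)
        except Exception as exc:  # broad on purpose: any failure triggers a switch
            last_error = exc
    raise RuntimeError(f"all {max_retries + 1} attempts failed") from last_error


# Demo with a fake fetcher that fails twice, then succeeds.
calls = []


def fake_fetch(url: str, proxy: str) -> str:
    calls.append(proxy)
    if len(calls) < 3:
        raise ConnectionError("blocked")
    return "<html>ok</html>"


result = fetch_with_retry(fake_fetch, "https://xxx.com", ["proxy-a", "proxy-b"])
print(result, calls)
```

In real use, fetch could wrap requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10) and return resp.text, with the pool filled with whatever gateway addresses your provider gives you.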

In testing, this combination collected a daily average of 100,000 records without stalling. For cross-border e-commerce in particular, pairing their overseas native IPs with XPath positioning makes capturing competitor data dependable.

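Our products are supported only for use in network environments outside mainland China (except the TikTok dedicated line). Any actions users take with IPIPGO do not represent the will or views of IPIPGO, and IPIPGO assumes no legal liability.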
