
Hands-on with XPath's contains function to find web elements
Anyone who does data collection knows that an XPath expression works like a searchlight: it can pinpoint elements on a web page. Many newcomers, however, keep tripping over dynamic text. A product price, for example, may be shown as "¥199.00" in one place and "¥199" in another. This kind of format difference is exactly where contains() comes in.
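To make the price case concrete, here is a minimal sketch. It assumes the lxml library as the parser (the article does not name one), and the HTML snippet and the find_prices helper are made up for illustration.

```python
from lxml import html

# Hypothetical markup: the same price rendered in two formats.
SAMPLE = """
<div>
  <span class="price">¥199.00</span>
  <span class="price">¥199</span>
  <span class="price">¥89.00</span>
</div>
"""

def find_prices(page_source, amount):
    """Return the text of every span whose text contains the given amount."""
    tree = html.fromstring(page_source)
    # $amount is an XPath variable, which avoids quoting problems in the expression.
    nodes = tree.xpath("//span[contains(text(), $amount)]", amount=amount)
    return [node.text for node in nodes]

print(find_prices(SAMPLE, "199"))
```

An exact comparison such as text()='¥199' would match only one of the two formats, while contains() matches both.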
Why do I need to use proxy IPs with XPath?
Suppose you have written a perfect XPath expression, //div[contains(@class,'price')], and it suddenly stops working after a dozen consecutive visits to a website. Most likely the problem is not in your code: the target site has blocked your local IP. This is where a professional proxy service like ipipgo is needed to switch residential IPs automatically, so that the collection task is not interrupted.
| Scenario | Solution |
|---|---|
| High-frequency access from a single IP | ipipgo dynamic rotating IP pool |
| Need to locate a dynamic class | contains(@class,'fixed part') |
| Anti-crawling mechanism triggered | Proxy IP + request header masquerading |
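The last row of the table can be sketched as follows. This is a client-side illustration only: the proxy URLs, the header values, and the build_request_kwargs helper are all assumptions, and a managed pool such as ipipgo's may handle rotation on the gateway side instead.

```python
import itertools

# Hypothetical proxy URLs; real values come from your proxy provider.
PROXY_POOL = [
    "http://user:pass@proxy-a.example.com:8000",
    "http://user:pass@proxy-b.example.com:8000",
]

# Browser-like request headers (illustrative values).
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

_proxy_cycle = itertools.cycle(PROXY_POOL)

def build_request_kwargs(timeout=10):
    """Return keyword arguments for requests.get using the next proxy in the pool."""
    proxy = next(_proxy_cycle)
    return {
        "headers": dict(DEFAULT_HEADERS),
        "proxies": {"http": proxy, "https": proxy},
        "timeout": timeout,
    }

# Usage (requires the requests library and network access):
#   import requests
#   response = requests.get("https://example.com", **build_request_kwargs())
```

Each call hands out the next proxy in round-robin order, so consecutive requests leave through different exits while carrying the same browser-like headers.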
Practical tips for the contains function
Remember these three common combos:
- //tag[contains(text(),'keyword')] → find tags that contain specific text
- //*[contains(@attribute,'fixed part')] → match elements whose attribute values change dynamically
- contains + starts-with combined → handle class names with random suffixes
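The three patterns above can be tried out on a small document. The sketch below assumes lxml; the markup, class names, and data-sku attribute are invented for the example.

```python
from lxml import html

# Hypothetical markup with random class suffixes.
SAMPLE = """
<ul>
  <li class="item-a81f" data-sku="sku-1001-x">Wireless keyboard</li>
  <li class="item-c27d" data-sku="sku-1002-y">Wired mouse</li>
  <li class="banner-09ab" data-sku="promo">Wireless deals</li>
</ul>
"""

tree = html.fromstring(SAMPLE)

# 1. Match by a text fragment.
by_text = tree.xpath("//li[contains(text(), 'Wireless')]")

# 2. Match any element whose attribute contains a fixed part.
by_attr = tree.xpath("//*[contains(@data-sku, 'sku-100')]")

# 3. starts-with pins the stable class prefix; contains adds a second condition.
combined = tree.xpath(
    "//li[starts-with(@class, 'item-') and contains(text(), 'Wireless')]"
)

print([el.text for el in by_text])
print([el.text for el in by_attr])
print([el.text for el in combined])
```

The third query shows why the combination is useful: contains alone also picks up the banner item, while the starts-with condition restricts the match to real product entries.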
Say we want to scrape the reviews on an e-commerce platform and find that the div of each review block has a randomly generated ID, but every ID contains the prefix "review-". In that case we can write:
//div[contains(@id,'review-')]/p
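Here is a minimal sketch of that expression in use, again assuming lxml. The sample HTML and the extract_reviews helper are hypothetical.

```python
from lxml import html

# Hypothetical page fragment: review blocks with random ID suffixes.
SAMPLE = """
<section>
  <div id="review-8f3a2c"><p>Fast shipping, works as described.</p></div>
  <div id="review-1b9e77"><p>Battery life is shorter than advertised.</p></div>
  <div id="sidebar-ads"><p>Sponsored content</p></div>
</section>
"""

def extract_reviews(page_source):
    """Return the paragraph text of every div whose id contains 'review-'."""
    tree = html.fromstring(page_source)
    paragraphs = tree.xpath("//div[contains(@id, 'review-')]/p")
    return [p.text_content().strip() for p in paragraphs]

for review in extract_reviews(SAMPLE):
    print(review)
```

Because "review-" is a prefix here, starts-with(@id,'review-') would be a stricter alternative; contains also matches IDs where the fragment appears in the middle.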
ipipgo proxy service configuration
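Configure the proxy in Python's requests library (remember to replace the username, password, and port in the example with your own credentials from the ipipgo backend):
import requests

# username, password, and port are placeholders for your own ipipgo credentials
proxies = {
    'http': 'http://username:password@gateway.ipipgo.com:port',
    'https': 'http://username:password@gateway.ipipgo.com:port'
}
# url is the address of the page you want to collect
response = requests.get(url, proxies=proxies, timeout=10)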
Here is a pitfall to avoid: many people rely on free proxies, waste a lot of time debugging them, and end up with lower collection efficiency. ipipgo's exclusive IP pool supports automatic authentication, and in a real-world test on a continuous 12-hour collection task the IP availability rate stayed at 98% or higher.
Frequently Asked Questions
Q: What should I do if I write the right XPath but can't capture the data?
A: First check whether you have triggered anti-crawling measures; switch IPs with ipipgo and retry. It is also a good idea to add contains(@class,'xxx') to the XPath as a secondary filter.
Q: Do I need to change the proxy IP frequently?
A: It depends on how strict the target website's risk control is. We recommend enabling "Smart Switching" mode in the ipipgo backend; the system will then switch IPs automatically based on the response status.
Q: How do I test if the proxy is working?
A: First visit http://httpbin.org/ip to see the current exit IP, then compare it with the assigned IP shown in the ipipgo console to check that they match.
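The proxy check from the last answer can be scripted. The sketch below relies on the fact that http://httpbin.org/ip returns a JSON body with an "origin" field. The helper names and the sample IP address are assumptions, and the network call is left in comments so the parsing logic can be run on its own.

```python
import json

def exit_ip_from_httpbin(body):
    """Extract the exit IP from the JSON body returned by http://httpbin.org/ip."""
    return json.loads(body)["origin"].strip()

def matches_assigned_ip(body, assigned_ip):
    """Return True when the exit IP equals the IP shown in the proxy console."""
    return exit_ip_from_httpbin(body) == assigned_ip

# Offline demonstration with a sample response body (documentation-range IP).
sample_body = '{"origin": "203.0.113.7"}'
print(exit_ip_from_httpbin(sample_body))

# Usage against the live service (requires requests and a configured proxies dict):
#   import requests
#   body = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=10).text
#   print(matches_assigned_ip(body, "the IP shown in your console"))
```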
Advanced usage: a smart fault-tolerance mechanism
Add a double safeguard to the code: when locating with contains fails, automatically fall back to another attribute and, at the same time, change the IP in real time through ipipgo's API. The logic in pseudocode:
# find() and ipipgo.rotate_ip() are placeholders, not real library calls
try:
    element = find("//div[contains(@id,'content')]")
except Exception:
    # The first locator failed: fall back to the class attribute
    element = find("//div[contains(@class,'main-text')]")
    ipipgo.rotate_ip()  # call the IP change interface
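A runnable version of the locator fallback might look like the sketch below. It assumes lxml, and find_first and the sample page are hypothetical. Because lxml's xpath() returns an empty list instead of raising when nothing matches, the fallback checks for an empty result rather than catching an exception. The IP rotation step is left out because the article does not document that API.

```python
from lxml import html

# Locators to try in order, taken from the pseudocode above.
FALLBACK_XPATHS = [
    "//div[contains(@id, 'content')]",
    "//div[contains(@class, 'main-text')]",
]

def find_first(page_source, xpaths=FALLBACK_XPATHS):
    """Return (element, xpath) for the first expression that matches, else (None, None)."""
    tree = html.fromstring(page_source)
    for expr in xpaths:
        matches = tree.xpath(expr)
        if matches:
            return matches[0], expr
    return None, None

# Hypothetical page where only the second locator matches.
SAMPLE = '<html><body><div class="main-text v2">Article body</div></body></html>'
element, used = find_first(SAMPLE)
print(used, "->", element.text if element is not None else None)
```

Returning the expression that matched makes it easy to log how often the primary locator fails, which is a useful signal that the page layout has changed.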
Lastly, if you use ipipgo, remember to enable the "XPath mode" dedicated optimized lines in the backend. This feature is designed for scenarios that require precise element location and can automatically bypass common anti-crawling strategies. New users receive a 3 GB traffic trial on registration, enough to run through the entire collection process.

