
How does contains() work in XPath?
Anyone who has written a web crawler has run into this: a button on the page cannot be located, and a closer look shows that its class name carries a random string. This is where the contains() function is a lifesaver, because it matches on the stable part of a value and so handles elements whose attributes keep changing.
As a concrete example, the price element on an e-commerce site might look like this:
<div class="price_abc123">¥299</div>
An exact-match XPath cannot catch it, because the suffix keeps changing, so it's time to pull out the contains() trick and match on the stable prefix:
//div[contains(@class, 'price_')]
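To see that expression in action, here is a minimal sketch using lxml (assumed to be installed; the page fragment and the extract_prices helper are made up for illustration):

```python
from lxml import html

# Hypothetical page fragment: the suffix after "price_" changes between loads.
PAGE = (
    '<div id="product">'
    '<div class="title_x9y8z7">Sample item</div>'
    '<div class="price_abc123">¥299</div>'
    '</div>'
)


def extract_prices(page_source: str) -> list[str]:
    """Return the text of every <div> whose class contains the stable 'price_' part."""
    tree = html.fromstring(page_source)
    nodes = tree.xpath("//div[contains(@class, 'price_')]")
    return [node.text_content().strip() for node in nodes]


if __name__ == "__main__":
    print(extract_prices(PAGE))
```

Keep in mind that contains() matches anywhere in the attribute, so a class such as old_price_x would match too. If that is a problem, tighten the predicate with a second condition.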
How are proxy IPs and XPath related?
What is the biggest fear when collecting data through proxy IPs? Being recognized by the website! Some sites watch for fixed access patterns: if you always capture data along the same fixed path, the IP can be blocked within minutes. This is where ipipgo's dynamic IP pool comes in handy: each request goes out through a different exit IP, and combined with the flexibility of contains() positioning, the collection success rate goes up considerably.
For example, if you want to collect gasoline prices across the country, the page structure may differ slightly from province to province, so anchor on the label text rather than on a fixed path:
//span[contains(text(), 'Gasoline 92')]/following-sibling::div
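A small sketch of why this survives layout differences, again assuming lxml is available. The two page fragments and the price values are invented sample data, not real regional pages:

```python
from lxml import html

# Two hypothetical regional layouts: the wrappers differ, the label text does not.
PAGE_A = "<ul><li><span>Gasoline 92</span><div>7.85</div></li></ul>"
PAGE_B = (
    "<table><tr><td>"
    "<span>Gasoline 92 (per litre)</span><em>new</em><div>7.91</div>"
    "</td></tr></table>"
)

XPATH = "//span[contains(text(), 'Gasoline 92')]/following-sibling::div"


def gasoline_price(page_source: str):
    """Return the text of the first <div> that follows the matching label, or None."""
    tree = html.fromstring(page_source)
    matches = tree.xpath(XPATH)
    return matches[0].text_content().strip() if matches else None


if __name__ == "__main__":
    for page in (PAGE_A, PAGE_B):
        print(gasoline_price(page))
```

The same expression works on both fragments because it depends only on the label text and on the value being a later sibling, not on the surrounding tags.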
Pair this with a quality ipipgo proxy and you keep positioning accurate while avoiding the site's anti-crawling mechanisms.
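The "different exit IP per request" idea can be sketched with the standard library alone. This is not ipipgo's API, which the article does not show: the proxy addresses below are local placeholders, and next_opener and fetch are hypothetical helper names.

```python
import itertools
import urllib.request

# Placeholder proxy endpoints (assumption); substitute the gateways your provider gives you.
PROXIES = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
]

_proxy_cycle = itertools.cycle(PROXIES)


def next_opener() -> tuple[str, urllib.request.OpenerDirector]:
    """Build an opener that sends HTTP and HTTPS traffic through the next proxy."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download url through the next proxy in the rotation."""
    _, opener = next_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Round-robin is the simplest rotation. A provider-side rotating gateway, if you have one, makes the list unnecessary because a single endpoint already changes the exit IP for you.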
Three major pitfalls in the real world
1. Text content with spaces: some websites have hidden whitespace before, after, or inside the text; wrap it in normalize-space() to deal with it:
//*[contains(normalize-space(), 'login')]
2. Mixed Chinese and English: when a label may appear in either language, such as "提交" or "Submit", combine two conditions with or to match both:
//button[contains(text(), '提交') or contains(text(), 'Submit')]
3. Dynamically loaded content: in this case work with ipipgo's high-speed nodes and set a reasonable timeout to avoid positioning failures caused by loading delays.
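The first two pitfalls are easy to reproduce. The sketch below assumes lxml is installed and uses an invented form; the Chinese label 提交 ("submit") is an assumed example of a mixed-language page.

```python
from lxml import html

# Hypothetical form: the link text is broken across lines, and the
# submit button may be labelled in Chinese or in English.
PAGE = (
    "<form>"
    '<a class="nav">User\n      login</a>'
    "<button>提交</button>"
    "<button>Submit</button>"
    "<button>Cancel</button>"
    "</form>"
)
tree = html.fromstring(PAGE)

# Pitfall 1: the raw text has a newline and padding between the words,
# so a plain contains() on the phrase finds nothing ...
raw_hits = tree.xpath("//a[contains(., 'User login')]")
# ... while normalize-space() collapses the whitespace before comparing.
clean_hits = tree.xpath("//a[contains(normalize-space(), 'User login')]")

# Pitfall 2: `or` accepts either language variant of the label.
submit_labels = tree.xpath(
    "//button[contains(text(), '提交') or contains(text(), 'Submit')]/text()"
)

if __name__ == "__main__":
    print(len(raw_hits), len(clean_hits), submit_labels)
```

One caveat: with //* instead of a concrete tag, contains(normalize-space(), ...) also matches every ancestor of the element, because an ancestor's string value includes its descendants' text. Naming the tag keeps the result set small.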
Optimization tips only veterans know
- Use combined conditions: pair contains() with other attributes for greater accuracy
- Prioritize visible text: add not(contains(@style,'display:none')) to filter out hidden elements
- Change your positioning strategy regularly: just as with proxy IPs, don't let websites figure out your routine
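The first two tips combine naturally into one predicate. A minimal sketch, assuming lxml and an invented fragment in which a hidden decoy element shares the class prefix and the data-role attribute (both attribute names are hypothetical):

```python
from lxml import html

# Hypothetical fragment: a hidden decoy, the real price, and a struck-through price.
PAGE = (
    "<div>"
    '<span class="price_a1" data-role="current" style="display:none">¥999</span>'
    '<span class="price_b2" data-role="current">¥299</span>'
    '<span class="price_c3" data-role="original">¥399</span>'
    "</div>"
)

XPATH = (
    "//span[contains(@class, 'price_')"
    " and @data-role = 'current'"
    " and not(contains(@style, 'display:none'))]/text()"
)


def visible_current_prices(page_source: str) -> list[str]:
    """Return prices that match the class prefix, the role, and are not hidden inline."""
    return html.fromstring(page_source).xpath(XPATH)


if __name__ == "__main__":
    print(visible_current_prices(PAGE))
```

The style check is a plain substring test, so it misses variants such as "display: none" with a space, and it cannot see elements hidden through a stylesheet rule. Treat it as a cheap first filter rather than a guarantee.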
Frequently Asked Questions
Q: What should I do if contains() keeps matching the wrong thing?
A: Try using the translate() function to make the match case-insensitive, or combine several fuzzy conditions, such as matching both the text and features of neighboring elements.
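XPath 1.0 has no lower-case function, so translate() is the usual way to fold case. The sketch below assumes lxml; contains_ci is a hypothetical helper, it only folds ASCII letters, and it assumes the search text contains no single quotes.

```python
from lxml import html

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def contains_ci(needle: str) -> str:
    """Build an XPath 1.0 predicate that matches needle regardless of ASCII case.

    Assumption: needle contains no single quote characters.
    """
    return (
        f"contains(translate(normalize-space(), '{UPPER}', '{LOWER}'), "
        f"'{needle.lower()}')"
    )


# Hypothetical buttons with inconsistent capitalisation.
PAGE = (
    "<div>"
    "<button>LOGIN</button>"
    "<button>Log In</button>"
    "<button>login now</button>"
    "<button>Register</button>"
    "</div>"
)

tree = html.fromstring(PAGE)
matched = tree.xpath(f"//button[{contains_ci('Login')}]/text()")

if __name__ == "__main__":
    print(matched)
```

"Log In" is not matched because of the space between the words; case folding does not change that, which is a reminder to check what the page text really contains before blaming contains().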
Q: Why am I still recognized after using ipipgo?
A: Check three points: 1. whether the request headers are rotated randomly; 2. whether the XPath is too fixed; 3. whether the access frequency is reasonable. It is recommended to turn on ipipgo's automatic rotation mode together with random delays.
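Points 1 and 3 are client-side and can be sketched with the standard library. The User-Agent strings are illustrative examples, the delay bounds are arbitrary, and random_headers and random_delay are hypothetical helper names.

```python
import random
import time

# Illustrative User-Agent strings (assumption); keep your own list current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def random_headers() -> dict[str, str]:
    """Pick a different User-Agent for each request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def random_delay(min_s: float = 1.0, max_s: float = 3.0, sleep=time.sleep) -> float:
    """Pause for a random interval between requests and return its length."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Call random_delay() between requests and pass random_headers() as the headers of each one. The sleep parameter exists only so the pause can be swapped out in tests.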
Q: Is there an alternative to contains?
A: You can try starts-with() or ends-with() (note that ends-with() requires XPath 2.0, which many scraping tools do not implement). The key is to pair them with a quality proxy IP, like ipipgo, which supports a session hold service that helps keep collection stable.
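lxml, like browsers, implements XPath 1.0, where starts-with() exists but ends-with() does not. A common workaround is to compare the tail of the string using substring(). A minimal sketch, assuming lxml and an invented fragment:

```python
from lxml import html

# Hypothetical fragment: the random part sits after the prefix in one class
# and before the suffix in the other.
PAGE = (
    "<div>"
    '<p class="price_abc123">A</p>'
    '<p class="abc123_price">B</p>'
    '<p class="note">C</p>'
    "</div>"
)
tree = html.fromstring(PAGE)

# starts-with() is part of XPath 1.0 and works directly.
starts = tree.xpath("//p[starts-with(@class, 'price_')]/text()")

# ends-with() emulated in XPath 1.0: take the last N characters and compare.
# $suffix is an XPath variable that lxml fills from the keyword argument.
ends = tree.xpath(
    "//p[substring(@class, string-length(@class) - string-length($suffix) + 1)"
    " = $suffix]/text()",
    suffix="_price",
)

if __name__ == "__main__":
    print(starts, ends)
```

Passing the suffix as a variable also avoids quoting problems when the search text itself contains quote characters.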
Why do you recommend ipipgo?
After testing and comparing a number of service providers on the market, we found that ipipgo wins on three key metrics:
1. IP survival time of up to 6-12 hours (2-3 hours is common with other providers)
2. Nationwide coverage of 300+ city nodes (industry average 50+)
3. Automatic de-duplication ensures that a new IP is obtained each time
This matters especially for projects that do long-term data monitoring: with their exclusive IP pool and intelligent XPath positioning, collection ran continuously for 30 days without dropping. New user registration also comes with a 5G traffic package, enough to test small and medium-sized projects.
Lastly: XPath positioning and proxy IPs are like a pair of chopsticks; with only one of them you won't get a hot meal. Get fluent with the contains() function, pair it with a reliable ipipgo proxy, and data collection is halfway done. The rest is practice and tuning. If you run into specific problems, you are welcome to contact technical support on the official website.

