XPath contains() with class names: locating elements precisely


What does XPath with class names really do?

Anyone who scrapes data knows that page elements behave like chameleons, especially now that randomly suffixed class names are everywhere. This is where XPath's contains() function is a lifesaver: an expression such as //div[contains(@class,'part')] matches the element no matter what random characters follow the class name.


For example:
//div[contains(@class,'product-item')]
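
To see the expression at work, here is a minimal sketch using lxml on a made-up HTML snippet. The markup, class names, and random suffixes are invented for illustration.

```python
from lxml import html

# Made-up markup: each class name carries a random suffix
SAMPLE = """
<div>
  <div class="product-item-x7f3a">Keyboard</div>
  <div class="product-item-9bc21">Mouse</div>
  <div class="sidebar-ad-44de0">Ad</div>
</div>
"""

tree = html.fromstring(SAMPLE)

# contains() matches on the stable part of the class name
items = tree.xpath("//div[contains(@class, 'product-item')]")
names = [el.text_content().strip() for el in items]
print(names)
```

Both product entries are selected even though their suffixes differ, while the sidebar element is left out.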

How do proxy IPs and XPath work together?

Pairing ipipgo's proxy service with XPath is like giving your crawler an invisibility cloak. Say you want to scrape prices from an e-commerce site: once its anti-scraping system notices frequent visits from you, it blocks your IP outright. With ipipgo's dynamic residential proxies, each request goes out through a different exit IP, and combined with precise XPath targeting the success rate can double.

A real case: a customer scraping with a fixed IP was blocked within three days. After switching to ipipgo's rotating proxies, the job ran for two weeks straight without incident, and crawl accuracy jumped from 48% to 92%.
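
If your plan hands out several proxy endpoints rather than a single rotating gateway, per-request rotation can be sketched on the client side. This is only an illustration: the hostnames and credentials below are placeholders, not real ipipgo endpoints, and the helper name rotating_proxies is made up.

```python
from itertools import cycle

# Placeholder endpoints, for illustration only (not real ipipgo hosts)
PROXY_URLS = [
    "http://username:password@proxy-1.example.com:9020",
    "http://username:password@proxy-2.example.com:9020",
    "http://username:password@proxy-3.example.com:9020",
]


def rotating_proxies(urls):
    """Yield a requests-style proxies dict, moving to the next URL each time."""
    for url in cycle(urls):
        yield {"http": url, "https": url}


pool = rotating_proxies(PROXY_URLS)
first = next(pool)
second = next(pool)
print(first["http"] != second["http"])
```

With requests installed, each call would then take the next entry, for example requests.get(target_url, proxies=next(pool), timeout=10), where target_url is whatever page you are scraping.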

Three class-name pitfalls to avoid

1. Class attributes with spaces: e.g. class="btn active". To require both classes, write contains(@class,'btn') and contains(@class,'active').

2. Dynamically generated class names: for something like class="ui-component-12345", match the fixed part, e.g. //*[contains(@class,'ui-component-')]

3. Multiple matches: validate the expression in your browser's developer tools first, so the XPath does not match more elements than you intend.
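
The three pitfalls can be reproduced offline with lxml. The sketch below uses invented markup, and the has_class helper is a hypothetical name. It relies on the common XPath 1.0 idiom of padding the class attribute with spaces so that a class is compared as a whole token rather than as a substring.

```python
from lxml import html

# Invented markup for illustration
SAMPLE = """
<div>
  <button class="btn active">Buy</button>
  <button class="btn">Cancel</button>
  <button class="btn-group inactive">More</button>
  <div class="ui-component-12345">Widget</div>
</div>
"""
tree = html.fromstring(SAMPLE)


def has_class(name):
    """XPath 1.0 predicate that matches a whole class token, not a substring."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


# Pitfall 1: plain contains() is a substring test, so 'inactive' also counts as 'active'
loose = tree.xpath("//button[contains(@class, 'btn') and contains(@class, 'active')]")
strict = tree.xpath(f"//button[{has_class('btn')} and {has_class('active')}]")

# Pitfall 2: match the fixed prefix of a generated class name
dynamic = tree.xpath("//*[contains(@class, 'ui-component-')]")

# Pitfall 3: check how many elements matched before trusting the result
loose_texts = [el.text for el in loose]
strict_texts = [el.text for el in strict]
print(loose_texts, strict_texts, len(dynamic))
```

The loose version also picks up the "btn-group inactive" button, which is why checking the match count is worth the extra line.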

A hands-on configuration walkthrough

Take Python with an ipipgo proxy as an example:


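import requests
from lxml import html

# Replace username and password with your own ipipgo credentials
proxies = {
    'http': 'http://username:password@gateway.ipipgo.com:9020',
    # If the gateway does not accept TLS connections to the proxy itself,
    # use an http:// URL here as well
    'https': 'https://username:password@gateway.ipipgo.com:9020'
}

# 'target url' is a placeholder for the page you want to scrape
resp = requests.get('target url', proxies=proxies, timeout=10)
tree = html.fromstring(resp.content)
# Here's the key step: take the text node that follows the price symbol
price = tree.xpath('//span[contains(@class, "price-symbol")]/following-sibling::text()')[0]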
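
Indexing the result with [0] raises IndexError when the XPath matches nothing, which can happen when the page layout shifts or a block page comes back instead of the product page. A slightly more defensive sketch follows, run on made-up HTML so that it works offline. The markup and the helper name first_text are assumptions for illustration.

```python
from lxml import html

# Made-up markup: the price is a text node right after the symbol span
SAMPLE = (
    '<div><span class="price-symbol-a1b2">$</span> 19.99 '
    '<span class="unit">USD</span></div>'
)


def first_text(tree, xpath_expr):
    """Return the first non-empty text result of an XPath, or None if there is none."""
    for node in tree.xpath(xpath_expr):
        text = str(node).strip()
        if text:
            return text
    return None


tree = html.fromstring(SAMPLE)
price = first_text(
    tree, '//span[contains(@class, "price-symbol")]/following-sibling::text()'
)
missing = first_text(
    tree, '//span[contains(@class, "no-such-class")]/following-sibling::text()'
)
print(price, missing)
```

Returning None lets the caller decide whether to retry, log the page, or skip the item, instead of crashing mid-crawl.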

Five Questions You're Sure to Ask

Q: What should I do if the class name changes every day?
A: Look for the pattern in how it changes. If there really is none, use ipipgo's JS rendering proxy service, which can handle dynamically loaded content.

Q: What if my XPath matches more than one element?
A: Add layers of positioning: first locate the outer div by its fixed features, then work inwards.

Q: Why are ipipgo's proxies not easily blocked?
A: They use a pool of real residential IPs, each with real user behavior characteristics, which makes them far more reliable than datacenter IPs.

Q: What if XPath is inefficient?
A: Use CSS selectors for most lookups and reserve contains() for the key positions. ipipgo's exclusive high-speed proxies also give a speed boost.

Q: What should I do if I encounter a CAPTCHA?
A: ipipgo's proxy IPs come with cookie management, which together with randomized request headers can significantly reduce how often CAPTCHAs are triggered.
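
The layered positioning suggested above for multiple matches might look like the following sketch. The markup and class names are invented: the idea is to locate the outer container first, then run a relative XPath inside it.

```python
from lxml import html

# Invented markup: two blocks that both contain a price span
SAMPLE = """
<div>
  <div class="product-list-8821">
    <span class="price-77aa">19.99</span>
  </div>
  <div class="recommend-box-1203">
    <span class="price-31cd">5.00</span>
  </div>
</div>
"""
tree = html.fromstring(SAMPLE)

# Flat query: matches the price in both blocks
flat = tree.xpath("//span[contains(@class, 'price')]")

# Layered query: find the outer container, then search only inside it
containers = tree.xpath("//div[contains(@class, 'product-list')]")
scoped = containers[0].xpath(".//span[contains(@class, 'price')]")
scoped_texts = [el.text for el in scoped]
print(len(flat), scoped_texts)
```

The leading dot in ".//span" keeps the second query relative to the container; without it, the search would start from the document root again.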

Why do you recommend ipipgo?

Let the test data speak: we compared three proxy service providers by running the same XPath script against the same platform.

Service provider    Success rate    Blocking rate
ipipgo              95%             2%
Company A           78%             15%
Company B           82%             22%

Special mention goes to their class name whitelisting feature: it lets you preset common class name rules so the scraper adapts automatically to different website structures, which is unique among similar products.

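Our products are supported only for use in overseas network environments (except the TikTok dedicated line). Any actions users carry out with IPIPGO do not represent the will or views of IPIPGO, and IPIPGO bears no legal responsibility for them.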
