
Hands-On: Using XPath to Scrape Proxy IPs
Anyone who does data scraping knows that XPath is like a mining shovel: whether you can dig out exactly the data you want depends on how well you wield it. Today we'll walk through how to use XPath to locate proxy information in a web page and, along the way, how ipipgo's proxy service can make the job go more smoothly.
I. An XPath locating mnemonic
Remember this mnemonic: "Keep your eyes on the tag attributes, and don't let go of the text content." For example, to grab the IP addresses in this HTML:
```html
<div class="proxy-list">
  <span>192.168.1.1:8080</span>
  <span>10.0.0.2:8888</span>
</div>
```
One XPath expression gets them all in one go: `//div[@class='proxy-list']/span/text()`. It matches on the class attribute value and goes straight to the text content.
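To see the expression in action, here is a minimal sketch using lxml. It assumes the markup is a `div` with class `proxy-list` wrapping one `span` per address; the `SAMPLE_HTML` string and the `extract_proxies` helper are illustrative names, not part of any library.

```python
from lxml import etree

# Sample markup assumed for illustration: one <span> per proxy inside div.proxy-list.
SAMPLE_HTML = """
<div class="proxy-list">
  <span>192.168.1.1:8080</span>
  <span>10.0.0.2:8888</span>
</div>
"""

def extract_proxies(html_text):
    """Return the stripped text of every <span> directly under div.proxy-list."""
    tree = etree.HTML(html_text)
    texts = tree.xpath("//div[@class='proxy-list']/span/text()")
    # Drop empty strings and surrounding whitespace, just in case.
    return [t.strip() for t in texts if t.strip()]

proxies_found = extract_proxies(SAMPLE_HTML)
print(proxies_found)  # ['192.168.1.1:8080', '10.0.0.2:8888']
```

Note that `@class='proxy-list'` is an exact string comparison, so a `div` whose class attribute carries extra values would not match.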
II. Proxy setup tips to avoid getting blocked
Straight to the good stuff, a configuration template (Python example):
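```python
import requests
from lxml import etree

# Replace username, password, and port with the values from your ipipgo account.
proxies = {
    'http': 'http://username:password@gateway.ipipgo.com:port',
    'https': 'http://username:password@gateway.ipipgo.com:port',
}

# Fetch the target page through the proxy; the timeout avoids hanging on a dead proxy.
response = requests.get('destination URL', proxies=proxies, timeout=10)

# Parse the HTML and run your XPath expression against it.
html = etree.HTML(response.text)
ip_list = html.xpath('// your XPath expression')
```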
This is worth saying three times: be sure to use dynamic residential IPs! A static IP gets blacklisted by the site within minutes. ipipgo's dynamic residential package costs a little over 7 yuan per GB and lasts a long time, which is cheaper than a cup of milk tea.
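One practical snag with a template like this is that a username or password containing characters such as `@` or `:` breaks the `user:password@host:port` structure of the proxy URL. The sketch below percent-encodes the credentials before building the dict. The `build_proxies` helper, the demo credentials, and port `8000` are all placeholders made up for illustration, not real ipipgo values.

```python
from urllib.parse import quote

def build_proxies(username, password, host, port):
    """Build a requests-style proxies dict, percent-encoding the credentials."""
    # quote(..., safe="") also escapes '@', ':' and '/', which would otherwise
    # be mistaken for URL delimiters.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # The same HTTP proxy endpoint is used for both http and https traffic.
    return {"http": proxy_url, "https": proxy_url}

# Placeholder credentials and port, not real account values.
proxies = build_proxies("demo_user", "p@ss:word", "gateway.ipipgo.com", 8000)
print(proxies["http"])  # http://demo_user:p%40ss%3Aword@gateway.ipipgo.com:8000
```

The resulting dict can be passed as `proxies=proxies` to `requests.get`, exactly like the hand-written one in the template.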
III. A guide to common pitfalls
| Symptom | Fix |
|---|---|
| The XPath does not match the right element | Copy the XPath from the browser's developer tools |
| The proxy won't connect | Check whether your local IP has been added to the whitelist |
| Crawling is slow | Switch to ipipgo's TK dedicated package |
IV. How to choose a package
The differences between ipipgo's three packages are worth getting straight:
- Dynamic residential (standard): suitable for beginners getting some practice, dirt cheap at $7.67/GB.
- Dynamic residential (business): comes with exclusive access; a must for big projects.
- Static residential: the one to pick for nurturing accounts; 35 bucks a month for a fixed IP.
Q&A First-Aid Kit
Q: What should I do if my XPath returns an empty list?
A: Eight times out of ten the page structure has changed. Use the contains() function for a fuzzy match, for example `//div[contains(@class,'proxy')]`.
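The difference between an exact match and a `contains()` match is easy to check with lxml. The sketch below assumes a hypothetical redesign in which the class is no longer exactly `proxy-list`; the `CHANGED_HTML` markup is invented for illustration.

```python
from lxml import etree

# Hypothetical markup after a redesign: the class is no longer exactly 'proxy-list'.
CHANGED_HTML = """
<div class="proxy-list-v2 table">
  <span>192.168.1.1:8080</span>
  <span>10.0.0.2:8888</span>
</div>
"""

tree = etree.HTML(CHANGED_HTML)

# Exact match on the old class value finds nothing.
exact = tree.xpath("//div[@class='proxy-list']/span/text()")

# contains() matches any class attribute that includes the substring 'proxy'.
fuzzy = tree.xpath("//div[contains(@class,'proxy')]/span/text()")

print(exact)  # []
print(fuzzy)  # ['192.168.1.1:8080', '10.0.0.2:8888']
```

Because `contains()` is a plain substring test, it will also match unrelated classes that happen to include `proxy`, so it is worth checking the results before trusting them.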
Q: My proxy IP got blocked right after I started using it. What now?
A: Switch to ipipgo's cross-border line. Their IP pool is refreshed with more than 200,000 IPs per day, which is more often than most people change socks.
Q: What should I do if I need to run several crawlers at the same time?
A: Create multiple API links in the ipipgo backend and give each crawler its own channel. Don't keep shearing the same sheep.
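One way to keep crawlers on separate channels is to assign each one its own proxies dict up front. This is only a sketch: how an ipipgo API link maps to a proxy URL is an assumption here, and the `assign_channels` helper, crawler names, credentials, and ports are all hypothetical placeholders.

```python
def assign_channels(crawler_names, proxy_urls):
    """Pair each crawler with its own proxy URL; fail if there are too few."""
    if len(proxy_urls) < len(crawler_names):
        raise ValueError("need at least one proxy channel per crawler")
    return {
        name: {"http": url, "https": url}
        for name, url in zip(crawler_names, proxy_urls)
    }

# Hypothetical crawler names and channel URLs (placeholders, not real endpoints).
channels = assign_channels(
    ["crawler_a", "crawler_b"],
    [
        "http://user_a:pass_a@gateway.ipipgo.com:8001",
        "http://user_b:pass_b@gateway.ipipgo.com:8002",
    ],
)
for name, proxies in channels.items():
    print(name, proxies["http"])
```

Each crawler would then pass its own entry, for example `channels["crawler_a"]`, as the `proxies=` argument to `requests.get`, so no two crawlers share a channel.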
One last word: XPath locating is not black magic; try it a few more times and you'll get a feel for it. On the proxy side, just copy the homework and go with ipipgo. Their SOCKS5 support is a real treat, and setting it up is child's play. If anything is unclear, go straight to their technical support, who reply faster than a food delivery rider.

