
Hands-On JSON Data Processing for Proxy IPs with Python
Anyone who writes web crawlers has probably run into this: you finally track down a proxy IP provider, and the data it returns is in a messy format. That is where JSON parsing comes in, and handling it with a tool as handy as Python can save you from pulling out a lot of hair.
JSON Basics Without Getting Lost
For example, suppose you get data like this from the ipipgo API:
{
    "proxy_list": [
        {"ip": "192.168.1.1", "port": 8080, "type": "socks5"},
        {"ip": "10.0.0.2", "port": 3128, "type": "http"}
    ]
}
It is easy to take apart with Python's built-in json library:
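import json

# The JSON string shown above
raw_data = '{"proxy_list": [{"ip": "192.168.1.1", "port": 8080, "type": "socks5"}, {"ip": "10.0.0.2", "port": 3128, "type": "http"}]}'
parsed = json.loads(raw_data)
for proxy in parsed['proxy_list']:
    print(f"available proxy: {proxy['ip']}:{proxy['port']}")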
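Since provider responses can be messy, it may be worth guarding the parse step. The sketch below wraps json.loads in a small helper. The name parse_proxy_list is hypothetical, and the code assumes the response is a JSON object with a proxy_list array like the sample above.

```python
import json


def parse_proxy_list(raw_data):
    """Return the list under 'proxy_list', or [] if the payload is unusable.

    Hypothetical helper: assumes the response is a JSON object with a
    'proxy_list' array, as in the sample above.
    """
    try:
        parsed = json.loads(raw_data)
    except json.JSONDecodeError as exc:
        # Malformed JSON: report where parsing failed instead of crashing.
        print(f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return []
    if not isinstance(parsed, dict):
        return []
    proxy_list = parsed.get("proxy_list", [])
    return proxy_list if isinstance(proxy_list, list) else []


sample = '{"proxy_list": [{"ip": "192.168.1.1", "port": 8080, "type": "socks5"}]}'
for proxy in parse_proxy_list(sample):
    print(f"available proxy: {proxy['ip']}:{proxy['port']}")

# A truncated response yields an empty list rather than an exception.
print(parse_proxy_list('{"proxy_list": ['))
```

Returning an empty list keeps the calling loop simple. If you would rather fail loudly, re-raise the exception instead.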
Proxy IPs in Practice
Here's the kicker! When using the requests library with proxies, many people get stuck on the parameter format:
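import requests

# Placeholders: replace user, password, ip, port and 'target site' with real values
proxies = {
    "http": "http://user:password@ip:port",
    "https": "http://user:password@ip:port"
}
# Example using ipipgo's TK dedicated-line proxy
resp = requests.get('target site', proxies=proxies, timeout=10)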
Special reminder: if you run into SSL certificate errors, adding the verify=False parameter is a temporary workaround, but remember to configure proper certificates in the production environment.
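To avoid hand-writing the proxies dictionary each time, one option is to build it from a parsed entry. This is a minimal sketch: build_proxies is a hypothetical helper, and it assumes each entry carries the ip, port and type fields from the sample response. Credentials are percent-encoded so that characters such as @ do not break the URL.

```python
from urllib.parse import quote


def build_proxies(proxy, user=None, password=None):
    """Build the `proxies` mapping that requests expects from one entry.

    Hypothetical helper; `proxy` is a dict with 'ip', 'port' and 'type'
    keys, as in the sample response earlier in this article.
    """
    scheme = proxy.get("type", "http")
    auth = ""
    if user and password:
        # Percent-encode credentials so special characters stay intact.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    url = f"{scheme}://{auth}{proxy['ip']}:{proxy['port']}"
    # The same proxy URL is used for both plain HTTP and HTTPS traffic.
    return {"http": url, "https": url}


entry = {"ip": "10.0.0.2", "port": 3128, "type": "http"}
print(build_proxies(entry, user="user", password="password"))
```

The result can be passed straight to requests.get(..., proxies=...). Note that socks5 URLs only work in requests when the optional SOCKS support is installed (pip install requests[socks]).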
How to Choose an ipipgo Package
There's actually a trick to choosing their packages:
- For data collection, choose Dynamic Residential (Standard): $7+ for 1 GB of traffic is affordable enough.
- For enterprise-level business, go straight to the Dynamic Residential (Business) package, which is more stable.
- If you need a fixed IP, choose Static Residential at 35 bucks a month, no two ways about it.
A Guide to Avoiding Common Pitfalls
Q: What should I do if I get a KeyError when parsing JSON?
A: In most cases a field name is misspelled. First use print(parsed.keys()) to look at the data structure.
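As a rough illustration of that advice, the sketch below first lists the top-level keys and then reads fields with dict.get, which returns a default instead of raising KeyError. The describe helper and the sample data are assumptions made for this example.

```python
def describe(proxy):
    """Format one entry, tolerating missing fields (hypothetical helper)."""
    ip = proxy.get("ip", "?")
    port = proxy.get("port", "?")
    proxy_type = proxy.get("type", "unknown")
    return f"{ip}:{port} ({proxy_type})"


# Sample data in which the second entry lacks the 'type' field.
parsed = {
    "proxy_list": [
        {"ip": "192.168.1.1", "port": 8080, "type": "socks5"},
        {"ip": "10.0.0.2", "port": 3128},
    ]
}

# Inspect the top-level keys to confirm how the fields are spelled.
print(list(parsed.keys()))

for proxy in parsed.get("proxy_list", []):
    print(describe(proxy))
```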
Q: What should I do if I can't connect to the proxy IP?
A: Check the whitelist settings first. With ipipgo's API, proxies take 3-5 minutes to take effect after extraction.
Q: How do I switch between multiple proxies automatically?
A: Build a pool from ipipgo's proxy list, then use a loop plus random selection to rotate through it.
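One way to sketch the loop-plus-random-selection idea is shown below. The function name fetch_with_rotation and the fake_fetch stand-in are hypothetical; in real use, fetch would wrap a requests call that goes through the chosen proxy.

```python
import random


def fetch_with_rotation(fetch, proxy_pool, max_attempts=3):
    """Try `fetch(proxy)` with randomly chosen proxies until one succeeds.

    `fetch` is any callable that takes a proxy entry and either returns a
    result or raises an exception.
    """
    remaining = list(proxy_pool)  # copy so the caller's pool is untouched
    last_error = None
    for _ in range(min(max_attempts, len(remaining))):
        proxy = random.choice(remaining)
        remaining.remove(proxy)  # do not retry a proxy that just failed
        try:
            return fetch(proxy)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all attempted proxies failed") from last_error


pool = [
    {"ip": "192.168.1.1", "port": 8080, "type": "socks5"},
    {"ip": "10.0.0.2", "port": 3128, "type": "http"},
]


def fake_fetch(proxy):
    # Stand-in for a real request: pretend the first proxy is dead.
    if proxy["ip"] == "192.168.1.1":
        raise ConnectionError("proxy unreachable")
    return f"fetched via {proxy['ip']}:{proxy['port']}"


print(fetch_with_rotation(fake_fetch, pool))
```

Removing a proxy from the candidate list after a failure keeps the loop from wasting attempts on the same dead entry.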
Advanced Tips and Tricks
Try this performance optimization when dealing with a large number of proxies:
from multiprocessing import Pool

def check_proxy(proxy):
    # Logic for checking the availability of a proxy
    pass

if __name__ == '__main__':
    proxy_list = []  # fill with the proxies parsed from the API response
    with Pool(4) as p:
        results = p.map(check_proxy, proxy_list)
Checking whether proxies are alive with multiple processes is considerably faster than doing it in a single thread. Remember to set up automatic replenishment in the ipipgo backend so that the proxy pool is always full.
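As one possible way to fill in the check_proxy placeholder, the sketch below treats a proxy as alive if a TCP connection to its ip and port succeeds within a timeout. This is an assumption made for illustration: a successful connection only shows that the port is reachable, not that the proxy forwards traffic correctly. The filter_alive helper is hypothetical.

```python
import socket
from multiprocessing import Pool


def check_proxy(proxy, timeout=3.0):
    """Return (proxy, True) if a TCP connection to ip:port succeeds."""
    try:
        with socket.create_connection((proxy["ip"], proxy["port"]), timeout=timeout):
            return proxy, True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return proxy, False


def filter_alive(proxy_list, workers=4):
    """Check all proxies in parallel and keep the reachable ones."""
    with Pool(workers) as pool:
        results = pool.map(check_proxy, proxy_list)
    return [proxy for proxy, alive in results if alive]
```

Call filter_alive from inside the if __name__ == '__main__': guard, as in the skeleton above, so that worker processes do not re-run the script's top-level code.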
Finally, a little-known fact: ipipgo's cross-border lines support the socks5 protocol, which is more stable than the http protocol in some special scenarios. If you keep running into CAPTCHAs, try switching the protocol type; you may be pleasantly surprised.

