
When Proxy IPs Meet JSON Data, Python Handles It Right!
Anyone who has done data collection knows that proxy IP services almost always return their data as JSON. Today we'll skip the fluff and go straight to the practical part: how to handle it all with Python. Take ipipgo's API responses as an example. The proxy IP information they return has a very standardized structure, which makes processing especially smooth.
I. Unpacking proxy IP information like a courier package
Getting response data from a proxy IP provider is like receiving a courier package. Here is a typical response structure from ipipgo:
{
  "status": "success",
  "data": [
    {
      "ip": "123.123.123.123",
      "port": 8000,
      "expire_time": "2024-03-01 12:00:00"
    },
    {
      "ip": "124.124.124.124",
      "port": 8001,
      "expire_time": "2024-03-01 12:30:00"
    }
  ]
}
To handle this structure, remember three steps: check the status → extract the data → loop over the entries. Here's the code:
import json
import requests

response = requests.get('https://api.ipipgo.com/get_proxies')
result = json.loads(response.text)

# Step 1: confirm the status before touching the data
if result.get('status') == 'success':
    # Steps 2 and 3: extract the data list and loop over it
    for proxy in result['data']:
        print(f"Available proxy: {proxy['ip']}:{proxy['port']}")
        print(f"Expires at: {proxy['expire_time']}")
else:
    print("No luck today, try again another way")
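To try the three steps without hitting the network, here is a minimal sketch that runs them against the sample response above, embedded as a string. The function name `extract_proxies` is hypothetical, and it assumes the field names shown in the sample (`status`, `data`, `ip`, `port`).

```python
import json

# Sample payload copied from the structure shown above
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": [
    {"ip": "123.123.123.123", "port": 8000, "expire_time": "2024-03-01 12:00:00"},
    {"ip": "124.124.124.124", "port": 8001, "expire_time": "2024-03-01 12:30:00"}
  ]
}
"""


def extract_proxies(raw_text):
    """Hypothetical helper: status check -> data extraction -> loop.

    Returns a list of "ip:port" strings, or an empty list if the
    status is not "success".
    """
    result = json.loads(raw_text)
    # Step 1: confirm the status
    if result.get("status") != "success":
        return []
    # Step 2: extract the data list (default to empty if it is missing)
    entries = result.get("data", [])
    # Step 3: loop over the entries and build host:port strings
    return [f"{entry['ip']}:{entry['port']}" for entry in entries]


for address in extract_proxies(SAMPLE_RESPONSE):
    print(f"Available proxy: {address}")
```

Returning an empty list on a failed status, rather than printing, leaves the caller free to decide whether to retry.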
II. A neat trick: configuring request parameters dynamically
Sometimes you need to generate request parameters dynamically for different business scenarios. For example, to batch-test the availability of proxy IPs, you can do this:
import json
import requests

proxy_list = []
# Get 10 proxy IPs from ipipgo
params = {
    "count": 10,
    "protocol": "http",
    "region": "east China"
}
response = requests.get('https://api.ipipgo.com/generate', params=params)
proxies = json.loads(response.text)['proxies']
for p in proxies:
    # An HTTP proxy is addressed with http:// under both keys;
    # requests tunnels HTTPS traffic through it with CONNECT
    proxy_config = {
        "http": f"http://{p['ip']}:{p['port']}",
        "https": f"http://{p['ip']}:{p['port']}"
    }
    proxy_list.append(proxy_config)
This produces a list of proxy configurations that you can pass straight to requests to rotate through and test for stability.
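Rotation itself can be as simple as cycling through the list. The sketch below assumes entries shaped like the `proxies` items above (with `ip` and `port` keys); `build_proxy_configs` is a hypothetical helper, and `itertools.cycle` is just one simple way to rotate. The resulting dicts are in the format that the `proxies=` argument of `requests.get` accepts.

```python
import itertools


def build_proxy_configs(entries):
    """Hypothetical helper: turn {"ip", "port"} entries into requests-style proxy dicts."""
    configs = []
    for p in entries:
        address = f"http://{p['ip']}:{p['port']}"
        # Same HTTP proxy URL for both schemes; HTTPS is tunnelled through it
        configs.append({"http": address, "https": address})
    return configs


entries = [
    {"ip": "123.123.123.123", "port": 8000},
    {"ip": "124.124.124.124", "port": 8001},
]
proxy_list = build_proxy_configs(entries)

# itertools.cycle yields the configs round-robin, indefinitely
rotation = itertools.cycle(proxy_list)
for _ in range(3):
    config = next(rotation)
    # In real use: requests.get(url, proxies=config, timeout=5)
    print(config["http"])
```

Round-robin spreads requests evenly; for stability testing you would record which configs fail and drop them from `proxy_list` before building the next cycle.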
III. Handle exceptions as carefully as you'd check a water meter
The most common problem when handling JSON is malformed data. Here's a trick:
import json
import requests

response = requests.get('https://api.ipipgo.com/get_proxies')
try:
    # json.loads raises json.JSONDecodeError on malformed input
    data = json.loads(response.text)
except json.JSONDecodeError as e:
    print(f"There was a parsing error! Location: line {e.lineno}, column {e.colno}")
    print("Suggested checks: 1. whether the Content-Type header is application/json "
          "2. whether the response body is incomplete")
    # Here you can call ipipgo's exception reporting interface
    requests.post('https://api.ipipgo.com/error_report', data=response.text)
Handling it this way keeps the program from crashing and helps the provider improve its service quality: the best of both worlds.
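The same idea can be wrapped in a reusable function that never raises. This is a minimal sketch under a few assumptions: `safe_parse` is a hypothetical name, it takes the raw body and the Content-Type header as plain strings (so it can be tested without a network call), and the "error at the very end means a truncated body" check is a rough heuristic, not a rule.

```python
import json


def safe_parse(body, content_type=""):
    """Hypothetical helper: parse a JSON body, returning (data, error_message)."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as e:
        notes = [f"parse error at line {e.lineno}, column {e.colno}: {e.msg}"]
        # Check 1: does the Content-Type header claim JSON?
        if "application/json" not in content_type.lower():
            notes.append(f"Content-Type is {content_type!r}, not application/json")
        # Check 2 (heuristic): an error at the very end suggests a truncated body
        if e.pos >= len(body.rstrip()) - 1:
            notes.append("error is at the end of the body; response may be incomplete")
        return None, "; ".join(notes)


data, error = safe_parse('{"status": "success", "data": [', "text/html")
if error:
    print(error)
```

Returning the diagnostics as a string makes it easy to log them, or to forward them together with the raw body to an error-reporting endpoint.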
Q&A Time: Clearing the minefield of frequently asked questions
Q: What should I do if a proxy IP I obtained suddenly stops working?
A: First check the expire_time field; ipipgo's proxies refresh every hour by default. We recommend setting up a scheduled task that fetches a new IP 15 minutes before the old one expires.
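Deciding when to refresh comes down to comparing `expire_time` against the current time plus a safety margin. A minimal sketch, assuming `expire_time` uses the `YYYY-MM-DD HH:MM:SS` format from the sample response and the same timezone as the machine running the check; `needs_refresh` is a hypothetical name, and the 15-minute default matches the suggestion above.

```python
from datetime import datetime, timedelta

EXPIRE_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches "2024-03-01 12:00:00"


def needs_refresh(expire_time, now=None, margin=timedelta(minutes=15)):
    """Return True once we are within `margin` of the proxy's expiry time."""
    expires_at = datetime.strptime(expire_time, EXPIRE_FORMAT)
    now = now or datetime.now()
    return now >= expires_at - margin


check_time = datetime(2024, 3, 1, 11, 50)
print(needs_refresh("2024-03-01 12:00:00", now=check_time))  # True: 10 minutes left
print(needs_refresh("2024-03-01 12:30:00", now=check_time))  # False: 40 minutes left
```

A scheduled task (cron, or a simple loop with `time.sleep`) can call this for each proxy and fetch replacements for any that return True.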
Q: What should I do if there are strange special characters in the returned JSON?
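A: Eight times out of ten it's an encoding problem. Try handling it like this:
# utf-8-sig decodes as UTF-8 and strips a leading byte-order mark (BOM) if present
response.encoding = 'utf-8-sig'
data = json.loads(response.text)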
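One common source of such characters is a UTF-8 byte-order mark (BOM) at the start of the body, which is exactly what `utf-8-sig` strips. The snippet below reproduces that case with a hand-made byte string; it assumes the stray characters are a BOM, which won't cover every encoding problem.

```python
import json

# Hand-made body that starts with a UTF-8 BOM (bytes EF BB BF)
raw = b'\xef\xbb\xbf{"status": "success"}'

# Plain utf-8 keeps the BOM as U+FEFF, and json.loads rejects it
text_utf8 = raw.decode("utf-8")
try:
    json.loads(text_utf8)
except json.JSONDecodeError as e:
    print(f"utf-8 failed: {e.msg}")

# utf-8-sig drops the BOM, so parsing succeeds
data = json.loads(raw.decode("utf-8-sig"))
print(data["status"])
```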
Q: What should I do if I need to process data from multiple proxy service providers at the same time?
A: We recommend normalizing everything to one format, for example by wrapping ipipgo's response data in a conversion layer:
def format_proxy(data):
    # Map ipipgo's fields onto a provider-neutral structure
    return {
        "host": data['ip'],
        "port": str(data['port']),
        "source": "ipipgo"
    }
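With more than one provider, each one gets its own converter, and a small dispatch table picks the right one. In this sketch `format_ipipgo` mirrors `format_proxy` above, while `other_provider` and its `host`/`port_number` fields are purely hypothetical stand-ins for a second provider's format.

```python
def format_ipipgo(data):
    # Same mapping as format_proxy above
    return {"host": data["ip"], "port": str(data["port"]), "source": "ipipgo"}


def format_other_provider(data):
    # Hypothetical second provider that uses different field names
    return {"host": data["host"], "port": str(data["port_number"]), "source": "other_provider"}


# Dispatch table: provider name -> converter function
FORMATTERS = {
    "ipipgo": format_ipipgo,
    "other_provider": format_other_provider,
}


def normalize(provider, entries):
    """Convert one provider's raw entries into the unified format."""
    formatter = FORMATTERS[provider]
    return [formatter(entry) for entry in entries]


unified = (
    normalize("ipipgo", [{"ip": "123.123.123.123", "port": 8000}])
    + normalize("other_provider", [{"host": "10.0.0.1", "port_number": 3128}])
)
for proxy in unified:
    print(f"{proxy['source']}: {proxy['host']}:{proxy['port']}")
```

Adding a provider then means writing one function and one dictionary entry; everything downstream only ever sees `host`, `port` and `source`.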
One last piece of honest advice: if you use a proxy service long-term, go straight for an ipipgo plan and save yourself the hassle. Their API responds quickly and their technical support is reliable, unlike some providers who disappear when you need them. Most importantly, their IP pool is refreshed frequently, so you rarely run into large batches of dead IPs.

