
A hands-on guide to digging treasure out of proxy IP data
Anyone who has worked in data collection knows that handling the JSON returned by a proxy IP API is like unpacking a parcel: the key is knowing how to open the box correctly. Today we will use the data returned by ipipgo's API as an example to teach a few ultra-practical dictionary manipulation techniques.
Basic version: grabbing single-layer data
Suppose we get response data like this from ipipgo:
{
"proxy_list": [
{"ip": "202.123.45.6", "port": 8866, "expire_time": "2024-03-20"},
{"ip": "203.88.102.33", "port": 5432, "expire_time": "2024-03-21"}
]
}
To get the port number of the first proxy, a newcomer might write:
port = data['proxy_list'][0]['port']
But experienced developers add a safety net:
port = (data.get('proxy_list') or [{}])[0].get('port', 8080)
This guards against both KeyError and IndexError, the two big potholes here: .get() covers a missing key, and or [{}] covers an empty list. It works especially well when dealing with dynamically changing proxy pools.
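The same defensive idea can be wrapped in a small helper so it reads clearly at every call site. This is only a minimal sketch: the function name first_port and the fallback value 8080 are illustrative assumptions, not part of ipipgo's API.

```python
def first_port(data, default=8080):
    """Return the port of the first proxy, or `default` if anything is missing."""
    # `or []` covers a missing key as well as a null or empty list
    proxies = data.get("proxy_list") or []
    if not proxies:
        return default
    # .get() covers an entry that has no "port" field
    return proxies[0].get("port", default)


sample = {
    "proxy_list": [
        {"ip": "202.123.45.6", "port": 8866, "expire_time": "2024-03-20"},
        {"ip": "203.88.102.33", "port": 5432, "expire_time": "2024-03-21"},
    ]
}

print(first_port(sample))              # 8866
print(first_port({"proxy_list": []}))  # 8080
print(first_port({}))                  # 8080
```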
Advanced play: unpacking multi-layer nested data
Suppose you come across proxy data with geographic information like this:
{
"node": {
"location": {
"city_code": "SH",
"isp": "telecom"
},
"ip_address": "203.88.102.33:8866"
}
}
Using chained get is the steadiest approach:
city = data.get('node', {}).get('location', {}).get('city_code')
This is much cleaner than writing if checks one layer at a time, especially when dealing with geographically labeled proxies like ipipgo's, where it lets you quickly locate resources in a specific region.
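One caveat with chained get: if an intermediate value is present but null (for example "location": null), .get('location', {}) returns None and the next .get() raises AttributeError. A small helper avoids that. The sketch below is one possible way to do it; deep_get is a hypothetical name, and splitting ip_address on the colon assumes the host:port layout shown in the sample.

```python
def deep_get(data, *keys, default=None):
    """Walk nested dictionaries key by key; return `default` on the first miss."""
    current = data
    for key in keys:
        # Stop as soon as a level is not a dict or lacks the key
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


sample = {
    "node": {
        "location": {"city_code": "SH", "isp": "telecom"},
        "ip_address": "203.88.102.33:8866",
    }
}

city = deep_get(sample, "node", "location", "city_code")
address = deep_get(sample, "node", "ip_address", default="")
# partition() never raises; host and port are both strings here
host, _, port = address.partition(":")

print(city, host, port)  # SH 203.88.102.33 8866
```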
Dynamic Key Name Handling Tips
When you are not sure of the exact field name, for example:
{
"proxy_2024": {
"daily_quota": 5000
}
}
You can use dictionary traversal to find the target:
for key in data:
    if key.startswith('proxy'):
        print(f"Today's remaining quota: {data[key]['daily_quota']}")
This trick works well when dealing with different versions of API responses, especially for services like ipipgo that update their interfaces regularly.
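If the prefix lookup is needed in more than one place, it can be pulled into a generator. This is a rough sketch under assumptions: find_by_prefix is a hypothetical helper, and the extra "status" key in the sample is made up to show that non-matching keys are skipped.

```python
def find_by_prefix(data, prefix):
    """Yield (key, value) pairs whose key starts with `prefix`."""
    for key, value in data.items():
        if key.startswith(prefix):
            yield key, value


# "status" is an invented field, included only to show filtering
sample = {"proxy_2024": {"daily_quota": 5000}, "status": "ok"}

for key, value in find_by_prefix(sample, "proxy"):
    print(f"{key}: remaining quota {value.get('daily_quota')}")
# proxy_2024: remaining quota 5000
```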
Practical Q&A: three common questions
Q: What should I do if I always get an error when fetching data?
A: Eight times out of ten the cause is missing exception handling. Wrap the fetch operation in try-except, or use .get() with a default value.
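The try-except route might look like the sketch below. It assumes the response body arrives as a JSON string shaped like the proxy_list sample earlier; parse_first_proxy is a hypothetical name, and returning None on failure is just one possible policy.

```python
import json


def parse_first_proxy(raw_text):
    """Return "ip:port" for the first proxy in a JSON response, or None on failure."""
    try:
        data = json.loads(raw_text)
        first = data["proxy_list"][0]
        return f"{first['ip']}:{first['port']}"
    except json.JSONDecodeError:
        # The body was not valid JSON (an HTML error page, for instance)
        return None
    except (KeyError, IndexError, TypeError):
        # A field is missing, the list is empty, or the structure is unexpected
        return None


good = '{"proxy_list": [{"ip": "202.123.45.6", "port": 8866}]}'
print(parse_first_proxy(good))                  # 202.123.45.6:8866
print(parse_first_proxy('{"proxy_list": []}'))  # None
print(parse_first_proxy("not json"))            # None
```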
Q: What should I do if the proxy IP list changes frequently?
A: ipipgo's API returns the latest available proxies on every call, so iterate over the list instead of relying on a fixed index, for example:
for proxy in data.get('proxy_list', []):
    print(f"{proxy['ip']}:{proxy['port']}")
Q: What should I do if I want to get more than one field at the same time?
A: A dictionary comprehension is the easiest way, for example mapping each IP to its port:
{item['ip']: item['port'] for item in data['proxy_list']}
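For pulling several fields at once, a comprehension with a small filter keeps incomplete entries out. The following is a sketch only: the third entry, which lacks a port, is invented to show the filter at work.

```python
sample = {
    "proxy_list": [
        {"ip": "202.123.45.6", "port": 8866, "expire_time": "2024-03-20"},
        {"ip": "203.88.102.33", "port": 5432, "expire_time": "2024-03-21"},
        {"ip": "198.51.100.7"},  # hypothetical incomplete entry
    ]
}

# Map each IP to its port, skipping entries that lack either field
ports_by_ip = {
    item["ip"]: item["port"]
    for item in sample.get("proxy_list", [])
    if "ip" in item and "port" in item
}

# Or keep several fields together as tuples; missing fields become None
rows = [
    (item.get("ip"), item.get("port"), item.get("expire_time"))
    for item in sample.get("proxy_list", [])
]

print(ports_by_ip)  # {'202.123.45.6': 8866, '203.88.102.33': 5432}
print(rows[0])      # ('202.123.45.6', 8866, '2024-03-20')
```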
Pitfalls to avoid
1. Pay attention to time zone conversion when handling time fields; ipipgo's data uses UTC by default.
2. Keep an eye on the case of field names, such as expireTime versus expire_time, and don't mix them up.
3. When persisting data with json.dumps(), remember to set ensure_ascii=False so that non-ASCII characters are written as-is instead of being escaped.
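The three points above can be sketched together as follows. Several things here are assumptions for illustration: the alias table in normalize_keys, the date-only YYYY-MM-DD format (taken from the sample data earlier), the UTC+8 display offset, and the invented "note" field containing Chinese text.

```python
import json
from datetime import datetime, timedelta, timezone


def normalize_keys(item):
    """Map known camelCase variants onto the snake_case names used here."""
    aliases = {"expireTime": "expire_time"}  # assumed alias table
    return {aliases.get(key, key): value for key, value in item.items()}


def parse_expiry_utc(text):
    """Parse a YYYY-MM-DD string and mark it explicitly as UTC."""
    return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)


# "note" is a made-up field, used only to show non-ASCII output
item = normalize_keys(
    {"ip": "202.123.45.6", "expireTime": "2024-03-20", "note": "上海电信"}
)

expiry_utc = parse_expiry_utc(item["expire_time"])
# Convert to UTC+8 for display; the offset is an illustrative choice
expiry_local = expiry_utc.astimezone(timezone(timedelta(hours=8)))
print(expiry_local.isoformat())  # 2024-03-20T08:00:00+08:00

# With ensure_ascii=False the Chinese text stays readable in the output
print(json.dumps(item, ensure_ascii=False))
```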
Finally, a word about our ipipgo service, which specializes in solving the IP problems that come up in data collection. New users get a free 5GB traffic pack on registration, and the service supports a variety of data output formats. Combined with the tips covered today, working with proxy IP data becomes as simple as drinking water!

