
What is a JSON file? And why do I need it for proxy IPs?
Anyone who's done data collection has run into JSON files: the things look like Russian nesting dolls of dictionaries and lists. For example, the data returned by the proxy IP provider ipipgo looks something like this:
{
  "proxy_list": [
    {"ip": "123.45.67.89", "port": 8866, "city": "Shanghai"},
    {"ip": "98.76.54.32", "port": 1314, "city": "Guangzhou"}
  ],
  "expire_time": "2024-12-31"
}
Python makes handling this kind of structured data particularly easy, easier than eating a steamed bun. The catch: many websites' anti-scraping mechanisms block IPs that visit too frequently, and that's when you need ipipgo's **dynamic proxy IP pool** to take turns changing vests, i.e. rotate IPs.
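To make the rotation idea concrete, here's a minimal sketch of cycling through a pool, assuming you already hold a list of proxy dicts (the IPs are the illustrative ones from the sample above, not live proxies):

```python
from itertools import cycle

# Rotate endlessly through a pool; each next() call hands out the next proxy.
# The entries below reuse the illustrative IPs from the sample JSON.
proxy_pool = cycle([
    {"ip": "123.45.67.89", "port": 8866},
    {"ip": "98.76.54.32", "port": 1314},
])

for _ in range(3):
    proxy = next(proxy_pool)
    print(f"{proxy['ip']}:{proxy['port']}")  # wraps back to the first IP on the third call
```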
Loading a local JSON file by hand
Let's start with the simplest scenario: loading a proxy IP configuration file stored locally. Say you've downloaded the proxy list from the ipipgo dashboard and saved it as ipipgo_proxies.json:
import json

# Open the downloaded proxy list; utf-8 handles Chinese city names safely
with open('ipipgo_proxies.json', 'r', encoding='utf-8') as f:
    proxy_data = json.load(f)

# Walk the nested structure and print each proxy as ip:port
for proxy in proxy_data['proxy_list']:
    print(f"Available proxy: {proxy['ip']}:{proxy['port']}")
Pay attention to the **file encoding**: standardize on utf-8 for a quiet life. JSON files sometimes contain Chinese city names, and with the wrong encoding the errors come out so garbled your own mother wouldn't recognize them.
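If someone hands you a file that wasn't saved as utf-8 (GBK exports from Chinese-locale Windows are a common offender), a small fallback loader like this sketch helps; `load_proxies` is a hypothetical helper, not part of any library:

```python
import json

# Hypothetical helper: try utf-8 first, then fall back to GBK
# (a common encoding for files exported on Chinese-locale Windows).
def load_proxies(path):
    for enc in ('utf-8', 'gbk'):
        try:
            with open(path, 'r', encoding=enc) as f:
                return json.load(f)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"{path} is neither utf-8 nor gbk encoded")
```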
The slick trick: obtaining proxy IPs dynamically
In practice, you're more likely to pull the latest proxy IPs straight from ipipgo's API, which means dealing with **JSON data returned by a web request**. Here's an example of a crawler that switches IPs automatically:
import requests
import json
import random

def get_ipipgo_proxies():
    api_url = "https://api.ipipgo.com/proxy-pool"
    resp = requests.get(api_url)
    return json.loads(resp.text)

while True:
    proxies = get_ipipgo_proxies()
    # Randomly pick an available IP address from the pool
    current_proxy = random.choice(proxies['proxy_list'])
    print(f"Proxy being used: {current_proxy['ip']}")
    try:
        # Write your crawler logic here ('target website' is a placeholder)
        response = requests.get('target website', proxies={
            "http": f"http://{current_proxy['ip']}:{current_proxy['port']}",
            "https": f"http://{current_proxy['ip']}:{current_proxy['port']}"
        }, timeout=10)
        print("Capture successful!")
        break
    except requests.RequestException:
        print("This IP is banned, switching to the next one...")
Beginner FAQ
Q: I'm getting a json.decoder.JSONDecodeError. What now?
A: 80% of the time the response body isn't valid JSON, which can mean the proxy IP service is down. If you use ipipgo, their interface comes with a **99.9% availability guarantee**, so this basically won't happen.
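A quick way to diagnose it, sketched below: look at the raw body before parsing. The endpoint URL is the same illustrative one from the example above, not a documented API:

```python
import requests

# Inspect the raw response when JSONDecodeError strikes.
resp = requests.get("https://api.ipipgo.com/proxy-pool", timeout=10)
try:
    data = resp.json()
except ValueError:  # invalid JSON raises a ValueError subclass
    print("Not JSON; first 200 chars of the body:")
    print(resp.text[:200])
```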
Q: How do I handle a proxy IP's expiry time?
A: See the expire_time field in the sample above. ipipgo's proxies **auto-refresh every 5 minutes** by default, so you don't have to handle expiry manually.
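If you'd rather check expiry yourself, here's a minimal sketch assuming the "YYYY-MM-DD" format from the sample response; `proxies_expired` is just an illustrative name:

```python
from datetime import datetime

# Minimal sketch: has the proxy list from the sample JSON expired?
# Assumes expire_time uses the "YYYY-MM-DD" format shown earlier.
def proxies_expired(proxy_data):
    expire = datetime.strptime(proxy_data['expire_time'], '%Y-%m-%d')
    return datetime.now() > expire
```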
| Proxy type | Response speed | Recommended scenario |
|---|---|---|
| Free proxies | Snail-paced | Practice and testing |
| ipipgo premium proxies | Lightning-fast | Commercial-grade data collection |
Pitfall-avoidance guide: the key points
1. When dealing with nested JSON, print it first with `json.dumps(data, indent=2)` to see the structure instead of hacking away blind (see the sketch after this list).
2. Remember to add exception handling when fetching proxies from ipipgo; network hiccups can make a request fail.
3. If you run into high-frequency access limits, combine proxy IPs with **request-header masquerading**.
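Here's a short sketch tying the three tips together; the proxy values are the illustrative ones from earlier, and httpbin.org is just a harmless test endpoint to swap for your real target:

```python
import json
import requests

proxy = {"ip": "123.45.67.89", "port": 8866}  # illustrative values

# Tip 1: pretty-print nested JSON before writing access code
print(json.dumps(proxy, indent=2, ensure_ascii=False))

# Tips 2 & 3: exception handling plus a masqueraded request header
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
try:
    resp = requests.get(
        "https://httpbin.org/ip",  # test endpoint; swap in your target
        headers=headers,
        proxies={"http": f"http://{proxy['ip']}:{proxy['port']}",
                 "https": f"http://{proxy['ip']}:{proxy['port']}"},
        timeout=10,
    )
    print(resp.status_code)
except requests.RequestException as e:
    print("Request failed, switch to the next proxy:", e)
```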
Finally: pair ipipgo's proxy service with a bit of JSON parsing and data collection becomes a breeze. New users get **1 GB of free traffic**, enough for half a month of testing, so go take a look at the official website.

