
Working with JSON files in Python, plus proxy IP practice
I have recently been helping a friend with a data collection project and noticed that many beginners get stuck on JSON file handling. Things get especially messy when JSON handling has to be combined with proxy IPs, where all sorts of odd bugs turn up. Today I'll walk you through the pitfalls I ran into, and along the way recommend the ipipgo proxy service, which our team has used for two years.
I. Basic JSON file operations
Let's start with how to handle JSON files in Python. The point is not only to read the file, but to know how to use it together with proxy IPs. For example, suppose we have a configuration file that holds the proxy IPs:
import json

# Read the proxy IP configuration file
with open('ip_config.json', 'r', encoding='utf-8') as f:
    ip_pool = json.load(f)
print(f"Current number of available IPs: {len(ip_pool['ips'])}")
Pay attention to encoding issues here. They often cause errors, especially with files exported from Windows. If you hit a decoding error, try switching to encoding='gbk'.
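To make the encoding advice concrete, here is a minimal sketch that tries UTF-8 first and falls back to GBK. The helper name load_ip_pool, the sample entry, and the extra "note" field are my own assumptions for illustration; only the ips/ip/port/user/pwd field names come from the code in this article.

```python
import json
import os
import tempfile


def load_ip_pool(path, encodings=("utf-8", "gbk")):
    """Try each encoding in turn and return the parsed JSON document."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return json.load(f)
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong encoding, try the next one
    raise last_error


# Hypothetical sample config; "note" is only there to carry non-ASCII text.
sample = {
    "ips": [
        {"ip": "203.0.113.10", "port": 8000, "user": "demo",
         "pwd": "secret", "note": "测试节点"}
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ip_config.json")
    # Simulate a file exported on Windows with GBK encoding
    with open(path, "w", encoding="gbk") as f:
        json.dump(sample, f, ensure_ascii=False)
    ip_pool = load_ip_pool(path)

print(len(ip_pool["ips"]))
```

UTF-8 goes first on purpose: GBK will often decode UTF-8 bytes without raising and silently produce garbage, while UTF-8 usually rejects GBK bytes outright.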
II. Proxy IP configuration in practice
Once you have the proxy IPs, the key is how to use them in requests. I recommend the Session object from the requests library, which is more efficient than configuring the proxy on every single request:
import requests
from random import choice

def get_proxy_session():
    session = requests.Session()
    proxy = choice(ip_pool['ips'])  # randomly pick an IP
    session.proxies = {
        "http": f"http://{proxy['user']}:{proxy['pwd']}@{proxy['ip']}:{proxy['port']}",
        "https": f"http://{proxy['user']}:{proxy['pwd']}@{proxy['ip']}:{proxy['port']}"
    }
    return session

# Test proxy connectivity
try:
    session = get_proxy_session()
    resp = session.get('http://httpbin.org/ip', timeout=5)
    print("Current proxy IP:", resp.json()['origin'])
except Exception as e:
    print("Proxy connection failed:", str(e))
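One thing worth guarding against when building the proxy URL is a username or password that contains characters such as @, : or /, which would break the URL. Percent-encoding is the standard way to escape them. The sketch below assumes the same entry fields as the configuration file; the helper names build_proxy_url and build_proxies and the sample entry are hypothetical.

```python
from urllib.parse import quote


def build_proxy_url(proxy):
    """Build an http:// proxy URL from one config entry, escaping credentials."""
    user = quote(str(proxy["user"]), safe="")
    pwd = quote(str(proxy["pwd"]), safe="")
    return f"http://{user}:{pwd}@{proxy['ip']}:{proxy['port']}"


def build_proxies(proxy):
    """Return a mapping that can be assigned to session.proxies."""
    url = build_proxy_url(proxy)
    return {"http": url, "https": url}


# Hypothetical entry with a password that would otherwise break the URL
entry = {"ip": "203.0.113.10", "port": 8000, "user": "demo", "pwd": "p@ss:word/1"}
proxies = build_proxies(entry)
print(proxies["http"])  # http://demo:p%40ss%3Aword%2F1@203.0.113.10:8000
```

The result can be assigned directly, for example session.proxies = build_proxies(entry).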
III. Three tools for exception handling
The biggest headache in practice is the variety of unexpected situations. Here are three common pitfalls:
1. Rotating proxies on failure
I recommend the retrying library for automatic retries, which is much easier than writing loops by hand:
from retrying import retry

@retry(stop_max_attempt_number=3)
def safe_request(url):
    # Each attempt builds a new session, so a retry picks a fresh random proxy
    session = get_proxy_session()
    return session.get(url, timeout=8)
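For comparison, here is a rough sketch of what rotation looks like as a hand-written loop using only the standard library. It is not how the retrying library works internally. The function request_with_rotation, the fetch callback, and the fake pool are all hypothetical names for illustration.

```python
import random


def request_with_rotation(fetch, proxies, max_attempts=3):
    """Call fetch(proxy) with a different proxy per attempt until one succeeds."""
    # Sample without replacement so the same dead proxy is never tried twice
    candidates = random.sample(proxies, k=min(max_attempts, len(proxies)))
    last_error = None
    for proxy in candidates:
        try:
            return fetch(proxy)
        except Exception as exc:
            last_error = exc  # remember the failure and move to the next proxy
    raise RuntimeError("all proxy attempts failed") from last_error


# Stand-in for a real request: only one proxy in the pool is "alive"
def fake_fetch(proxy):
    if proxy["ip"] != "203.0.113.12":
        raise ConnectionError(f"{proxy['ip']} is down")
    return f"ok via {proxy['ip']}"


pool = [{"ip": "203.0.113.10"}, {"ip": "203.0.113.11"}, {"ip": "203.0.113.12"}]
result = request_with_rotation(fake_fetch, pool)
print(result)  # ok via 203.0.113.12
```

The one design difference is that this loop never repeats a proxy within a single call, whereas a purely random pick on each retry can land on the same failed IP again.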
2. JSON parsing errors
Sometimes the data returned by the server is not valid JSON. You can catch json.JSONDecodeError to handle it:
try:
    data = resp.json()
except json.JSONDecodeError:
    print("The returned data is not in standard JSON format.")
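The same idea can be wrapped in a small helper that works on raw response text and reports where parsing failed. This is a minimal sketch with the standard library only. The name parse_json and the sample payloads are assumptions, with the HTML string standing in for the kind of block page a site might return instead of JSON.

```python
import json


def parse_json(text, default=None):
    """Return the decoded JSON value, or `default` if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno, colno and msg are attributes of json.JSONDecodeError
        print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return default


good = parse_json('{"price": {"current": 19.9}}')
bad = parse_json("<html>Access denied</html>", default={})
print(good, bad)
```

Since json.JSONDecodeError is a subclass of ValueError, catching ValueError is a slightly broader alternative if you are not sure which JSON backend raised the error.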
3. Connection timeout settings
Many beginners forget to set the timeout parameter, which can leave the program hanging. Depending on the business scenario, I recommend setting the connect timeout and the read timeout separately.
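In requests, separate control means passing timeout as a (connect, read) tuple instead of a single number. The sketch below shows the shape of the call. The wrapper get_with_timeouts and its default values are my own choices, and RecordingSession is only a stand-in so the example runs without network access.

```python
def get_with_timeouts(session, url, connect_timeout=3.0, read_timeout=10.0):
    """GET a URL with separate connect and read timeouts, both in seconds."""
    # requests accepts timeout as a (connect, read) tuple
    return session.get(url, timeout=(connect_timeout, read_timeout))


class RecordingSession:
    """Stand-in for requests.Session that only records what it was asked."""

    def __init__(self):
        self.calls = []

    def get(self, url, **kwargs):
        self.calls.append((url, kwargs))
        return "fake response"


demo = RecordingSession()
get_with_timeouts(demo, "https://api.example.com/products/1", read_timeout=8.0)
print(demo.calls[0][1])  # {'timeout': (3.0, 8.0)}
```

In real use, pass the session returned by get_proxy_session() instead of the stand-in. A short connect timeout fails fast on dead proxies, while a longer read timeout gives slow pages room to respond.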
IV. A real-world example
As an example, suppose we are collecting e-commerce prices and the target website has strict anti-scraping measures:
def crawl_product_price(product_id):
    url = f"https://api.example.com/products/{product_id}"
    try:
        resp = safe_request(url).json()
        return resp['price']['current']
    except KeyError:
        print("Failed to retrieve the price field.")
        return None

# Using ipipgo's exclusive IP pool
print("Using ipipgo's stable proxy service...")
Here, ipipgo's exclusive IP pool improved our success rate by more than 60% compared with shared IPs. In our own measurements their IP survival rate was 98%, which is more reliable than the other providers we used before.
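In practice the response does not always have the expected shape: the price field may be missing, null, or the whole payload may be a list. A slightly more defensive extraction step could look like the sketch below. The helper name extract_current_price and the sample payloads are hypothetical; the price/current layout is the one assumed by crawl_product_price.

```python
def extract_current_price(payload):
    """Return payload['price']['current'], or None if the shape is unexpected."""
    try:
        return payload["price"]["current"]
    except (KeyError, TypeError):
        # KeyError: a field is missing; TypeError: a level is None, a list, etc.
        return None


samples = [
    {"price": {"current": 19.9}},   # expected shape
    {"price": None},                # field present but null
    {"name": "no price here"},      # field missing
    [],                             # not even a dict
]
prices = [extract_current_price(s) for s in samples]
print(prices)  # [19.9, None, None, None]
```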
V. Frequently asked questions
Q: Why do requests slow down after using a proxy IP?
A: This is normal. A good proxy service keeps latency within 800 ms. With ipipgo's high-speed channel, it can be reduced to about 200 ms.
Q: What should I do if all the proxy IPs suddenly fail?
A: First check your account permissions, then contact ipipgo's technical support. Their dashboard shows IP availability in real time, and support responds quickly.
Q: How do I handle websites that require a login?
A: I recommend ipipgo's session-keeping IPs. Keeping the same IP for the whole login session avoids the disconnections that frequent IP changes can cause.
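On the client side, the matching habit is to stop picking a random proxy per request for logged-in accounts and pin each account to one proxy instead. The sketch below is only an illustration of that idea. It says nothing about how ipipgo's session-keeping feature works, and the class StickyProxyPool and its methods are hypothetical.

```python
import random


class StickyProxyPool:
    """Pin each account to one proxy so its login session keeps the same IP."""

    def __init__(self, proxies, rng=None):
        self._proxies = list(proxies)
        self._assigned = {}
        self._rng = rng or random.Random()

    def proxy_for(self, account):
        """Return the proxy pinned to this account, choosing one on first use."""
        if account not in self._assigned:
            self._assigned[account] = self._rng.choice(self._proxies)
        return self._assigned[account]

    def release(self, account):
        """Forget the pinned proxy, e.g. after logout or when it stops working."""
        self._assigned.pop(account, None)


sticky = StickyProxyPool([{"ip": "203.0.113.10"}, {"ip": "203.0.113.11"}])
first = sticky.proxy_for("alice")
repeats = [sticky.proxy_for("alice") for _ in range(20)]
print(all(p is first for p in repeats))  # True
```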
Finally, a bit of trivia: JSON files can actually carry comments! The standard does not support them, but the json5 library can parse them. For production environments, though, I recommend sticking to standard JSON.

