
# Hands-on: wrangling JSON data with Python

Recently, a lot of friends who do data scraping have asked me: why does processing JSON files with Python always get stuck? It's like cooking without any seasoning. Today, let's talk about how a secret weapon, proxy IPs, can make JSON processing go more smoothly. First, the scenario: say you want to pull bulk product information from a website. The JSON the server returns holds the good stuff, but hammering the endpoint directly easily triggers anti-scraping measures. That's when you need proxy IPs to step in.
```python
import json

import requests

# An example using ipipgo's proxy service
proxy_config = {
    "http": "http://username:password@gateway.ipipgo.com:9020",
    "https": "http://username:password@gateway.ipipgo.com:9020",
}

response = requests.get('https://api.example.com/products', proxies=proxy_config)
data = json.loads(response.text)
print(data['product_list'][0]['price'])
```
## Cheat sheet: common JSON parsing pitfalls

Here are a few typical mistakes beginners make:

| Pitfall | Fix |
|---|---|
| Encoding mix-ups produce garbled text | Set response.encoding = 'utf-8' up front |
| Getting lost in nested dictionaries | Use the .get() method with default values to avoid errors |
| Memory blows up when loading large files | Stream-parse with the ijson library instead |
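To make the first two rows concrete, here's a minimal stdlib-only sketch; the payload and field names are made up for illustration:

```python
import json

# A UTF-8 payload as raw bytes, like what you'd get from response.content
raw = '{"name": "商品A", "detail": {"price": 99}}'.encode('utf-8')

# Pitfall 1: decoding with the wrong codec garbles non-ASCII text;
# decode explicitly as UTF-8 before parsing
data = json.loads(raw.decode('utf-8'))

# Pitfall 2: chained ['a']['b'] indexing raises KeyError on a missing key;
# .get() with a default degrades gracefully instead
price = data.get('detail', {}).get('price', 0)
stock = data.get('detail', {}).get('stock', 'unknown')
print(price, stock)  # → 99 unknown
```

For the third row (huge files), the idea is the same as iterating a file line by line: ijson yields items one at a time instead of building the whole structure in memory.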
## The right way to use proxy IPs

Those of you who have used ipipgo know its proxies have a nice trick: they support **on-demand switching**. For example, when working through paginated data:
```python
import requests
from itertools import cycle

# Prepare multiple ipipgo proxy nodes and rotate through them
proxy_pool = cycle([
    "http://user:pass@node1.ipipgo.com:9020",
    "http://user:pass@node2.ipipgo.com:9020",
])

for page in range(1, 10):
    current_proxy = next(proxy_pool)
    response = requests.get(f'https://api.example.com?page={page}',
                            proxies={"http": current_proxy})
```
## Q&A

Q: Why does my JSON parsing always raise KeyError?

A: 80% of the time the field name is wrong. First check the real field names with data.keys(). For fields that change dynamically, write the lookup as .get('field_name', default_value).
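Here's what that debugging flow looks like in practice; the response body and field names below are made up for illustration:

```python
import json

# Hypothetical response where the field is 'goods', not 'product_list'
data = json.loads('{"goods": [{"price": 42}], "total": 1}')

# Step 1: inspect the real top-level field names before indexing
print(list(data.keys()))  # → ['goods', 'total']

# Step 2: for fields that may be absent, .get() with a default avoids KeyError
items = data.get('product_list', data.get('goods', []))
print(items[0]['price'])  # → 42
```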
Q: Does ipipgo's proxy need to authenticate on every request?

A: ipipgo supports session persistence, so after the first authentication you can reuse the connection, depending on your plan. The enterprise plan includes session persistence by default.
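On the client side, the way to take advantage of this is a requests.Session, which keeps a connection pool so repeated calls reuse the authenticated connection. A minimal sketch; the gateway URL and credentials are placeholders:

```python
import requests

# A Session pools connections, so repeated requests through the same proxy
# reuse the established, authenticated connection instead of re-handshaking
session = requests.Session()
session.proxies = {
    "http": "http://username:password@gateway.ipipgo.com:9020",
    "https": "http://username:password@gateway.ipipgo.com:9020",
}

# Every call on this session now goes through the proxy with the same settings:
# resp = session.get('https://api.example.com/products')
```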
Q: How do I deal with the odd time formats some APIs return?

A: Use the parser module from the dateutil library; it's much more flexible than datetime:

```python
from dateutil import parser

timestamp = parser.parse("2023-12-25T08:30:00+08:00")
```
## Leveling up: the three-pronged approach to exception handling

What separates a veteran from a novice is exception handling. For requests, it's worth wrapping the call with three layers of except:
```python
import json

import requests

try:
    resp = requests.get(url, proxies=proxy_config, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    price = data['product_list'][0]['price']
except requests.exceptions.ProxyError:
    # This is where ipipgo's automatic IP-rotation mechanism kicks in
    pass
except json.JSONDecodeError:
    print("The response is not valid JSON!")
except KeyError as e:
    print(f"Field does not exist: {e}")
```
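Combined with the proxy pool from earlier, the ProxyError branch can drive an automatic retry. Here's a stdlib-only sketch of the rotation logic; fetch is a stand-in for the real requests.get call, and ConnectionError stands in for requests.exceptions.ProxyError:

```python
from itertools import cycle

proxy_pool = cycle([
    "http://user:pass@node1.ipipgo.com:9020",
    "http://user:pass@node2.ipipgo.com:9020",
])

def fetch_with_rotation(fetch, max_retries=3):
    """Try the request with up to max_retries proxies, rotating on failure."""
    last_error = None
    for _ in range(max_retries):
        proxy = next(proxy_pool)
        try:
            return fetch(proxy)
        except ConnectionError as e:  # stand-in for requests' ProxyError
            last_error = e            # this node failed; rotate to the next one
    raise last_error

# Demo with a fake fetch that only succeeds through node2
def fake_fetch(proxy):
    if "node2" not in proxy:
        raise ConnectionError(f"proxy refused: {proxy}")
    return {"status": "ok", "via": proxy}

print(fetch_with_rotation(fake_fetch)["via"])
```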
One last note: pick a proxy service with smart routing, like ipipgo. It recently added a **dynamic port mapping** feature: you fetch the latest proxy list through the API instead of hard-coding IP addresses, which is far more reliable. Next time JSON parsing gets stuck, remember to first check whether your IP has been restricted, and switch to a different channel.

