
How to Convert JSON to CSV with Python
If you work with data, you have certainly run into the hassle of converting back and forth between JSON and CSV. For those of us doing data collection in particular, the data returned when fetching proxy IPs is JSON nine times out of ten, yet CSV is far more convenient for reports and analysis. Today we'll walk through writing a conversion script in Python, and along the way look at how ipipgo proxy IPs can make data collection more efficient.
Get Your Tools Ready
Install these two essential libraries first:
pip install pandas requests
Note: if you need to handle proxy IP data from different regions, it is recommended to pair this with ipipgo's API. Their proxy pool covers 200+ countries, which helps you avoid getting IPs banned during collection.
Basic Conversion Script
import json
import csv

# Load the JSON file
with open('proxy_data.json') as f:
    data = json.load(f)

# Assume the proxy IP data is formatted like this:
# [{"ip": "1.1.1.1", "port": 8080, "country": "US"}, ...]

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["IP address", "port", "country"])
    for item in data:
        writer.writerow([item['ip'], item['port'], item['country']])
This basic script turns simple proxy IP data into a table. In practice, though, the proxy IP information we get from ipipgo may be more complex, containing nested data such as response time and protocol type.
Advanced Processing Techniques
What do you do when you encounter nested JSON? For example:
{
    "proxy_list": [
        {
            "ip": "1.1.1.1",
            "auth": {"username": "ipipgo_user", "password": "123456"}
        }
    ]
}
At this point it has to be handled recursively:
def flatten_json(data):
    # Recursively flatten nested dicts, joining parent and child keys with "_"
    out = {}
    for key in data:
        if isinstance(data[key], dict):
            flattened = flatten_json(data[key])
            for subkey in flattened:
                out[f"{key}_{subkey}"] = flattened[subkey]
        else:
            out[key] = data[key]
    return out
This function turns nested field names into flat forms like geo_country and auth_username, which are easy to lay out in a CSV.
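As a quick self-contained check (the helper is repeated here so the snippet runs on its own, and the output file name is illustrative), you can flatten the sample record from earlier and write it out with csv.DictWriter:

```python
import csv

def flatten_json(data):
    # Recursively flatten nested dicts, joining parent and child keys with "_"
    out = {}
    for key in data:
        if isinstance(data[key], dict):
            for subkey, value in flatten_json(data[key]).items():
                out[f"{key}_{subkey}"] = value
        else:
            out[key] = data[key]
    return out

record = {"ip": "1.1.1.1",
          "auth": {"username": "ipipgo_user", "password": "123456"}}
flat = flatten_json(record)
print(flat)  # {'ip': '1.1.1.1', 'auth_username': 'ipipgo_user', 'auth_password': '123456'}

# Write the flattened row out; the flattened keys become the CSV header
with open('nested_output.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=list(flat))
    writer.writeheader()
    writer.writerow(flat)
```

Because the flattened dict's keys double as the field names, the header always matches the data.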
Q&A Time
Q: Why do I need a proxy IP for data conversion?
A: When you need to batch-process proxy IP data from different regions, a service like ipipgo keeps data acquisition stable. Especially when dealing with massive volumes of data, their dynamic residential proxies can effectively avoid blocking.
Q: What is the most common pitfall of JSON to CSV conversion?
A: Eighty percent of the time, it's an encoding problem! Remember to specify encoding='utf-8-sig' when opening the file, otherwise non-ASCII text such as Chinese may come out garbled.
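A minimal demonstration of the point (the file name is illustrative):

```python
import csv

# Sample row containing non-ASCII text
rows = [{"ip": "1.1.1.1", "country": "中国"}]

# utf-8-sig prepends a BOM so spreadsheet tools like Excel detect the encoding
with open('demo.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=["ip", "country"])
    writer.writeheader()
    writer.writerows(rows)
```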
Q: How do I integrate ipipgo's proxy IP into the script?
A: They provide ready-made SDKs; just add the proxy settings to your request:
proxies = {
    "http": "http://username:password@gateway.ipipgo.com:port",
    "https": "http://username:password@gateway.ipipgo.com:port"
}
This will allow you to switch IPs automatically during data collection.
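The exact SDK interface depends on your plan; as a plain-requests sketch (the gateway address, credentials, and port below are placeholders, not real account details), you can attach the proxy dict to a session so that every request on it is routed through the proxy:

```python
import requests

# Placeholder credentials and port; substitute your real ipipgo account details
proxies = {
    "http": "http://username:password@gateway.ipipgo.com:8000",
    "https": "http://username:password@gateway.ipipgo.com:8000",
}

session = requests.Session()
session.proxies.update(proxies)  # all requests on this session now use the proxy

# response = session.get("https://api.myip.com", timeout=10)  # uncomment to verify the exit IP
```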
Complete Working Script
import pandas as pd
from ipipgo_sdk import ProxyClient  # ipipgo official SDK

# Get the latest proxy IP list
client = ProxyClient(api_key="your key")
proxy_data = client.get_proxies(country="US", protocol="socks5")

# Core conversion code
df = pd.json_normalize(proxy_data['list'])
df.to_csv('us_socks5_proxies.csv', index=False, encoding='utf-8-sig')
This script uses the pandas json_normalize method, which automatically expands nested structures. Combined with ipipgo's SDK, you can go from fetching proxy IPs to generating a CSV in one step.
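If you don't have the SDK at hand, you can see json_normalize at work on a hand-built payload shaped like the nested example earlier (the values are made up for illustration):

```python
import pandas as pd

# Hand-built payload mimicking a proxy API response (no SDK needed)
proxy_data = {
    "list": [
        {"ip": "1.1.1.1", "port": 8080,
         "auth": {"username": "ipipgo_user", "password": "123456"}},
        {"ip": "2.2.2.2", "port": 1080,
         "auth": {"username": "ipipgo_user", "password": "123456"}},
    ]
}

df = pd.json_normalize(proxy_data["list"])
# Nested keys are expanded into dot-separated column names
print(df.columns.tolist())  # ['ip', 'port', 'auth.username', 'auth.password']
```

Note that json_normalize joins nested keys with a dot by default; pass sep="_" if you prefer underscore-style names like the flatten_json helper produces.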
Efficiency Optimization Tips
Remember these two tricks when working with millions of records:
1. Use generators instead of lists to reduce the memory footprint
2. Enable ipipgo's Intelligent Routing feature to automatically select the fastest API node
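For the first trick, a generator-based converter streams records one at a time instead of loading the whole file into memory. This sketch assumes the input is in JSON Lines format (one JSON object per line); the file names and helper names are illustrative:

```python
import csv
import json

def iter_records(path):
    """Yield one record per line from a JSON Lines file,
    so only a single record is in memory at a time."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def convert(src, dst, fields=("ip", "port", "country")):
    with open(dst, 'w', newline='', encoding='utf-8-sig') as out:
        writer = csv.DictWriter(out, fieldnames=list(fields))
        writer.writeheader()
        for record in iter_records(src):
            # Missing fields become empty cells rather than raising KeyError
            writer.writerow({k: record.get(k, "") for k in fields})
```

Because the generator yields records lazily, memory use stays flat no matter how large the input file grows.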
One final reminder: periodically check the field order of your CSV files. Proxy IP information from different regions may contain different fields, so it is a good idea to preview the data structure with pd.read_json() before processing.

