
Hands-on: Handling JSON Strings with Proxy IPs
Lately, a lot of people doing data collection have told me they keep getting stuck at the JSON-processing step when working with proxy IPs. Today let's look at how to use ipipgo's proxy service to handle all kinds of awkward JSON strings with ease.
First, how to handle IP addresses in JSON
Take a real scenario: your crawler receives data that looks like this:
```json
{
  "ip": "192.168.1.1",
  "port": "8080",
  "expiry": "2024-12-31"
}
```
You can then use the ipipgo API to replace the IP fields directly:
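```python
import json
from ipipgo import get_proxy  # This is the key part!

# The record your crawler received
data = json.loads('{"ip": "192.168.1.1", "port": "8080", "expiry": "2024-12-31"}')

proxy = get_proxy()  # Automatically fetches the latest proxy IP
data['ip'] = proxy['ip']
data['port'] = proxy['port']
```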
Note: ipipgo's API returns standard JSON, so you don't have to fiddle with parsing it yourself!
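The swap itself can be tried without the SDK. Here is a minimal standard-library sketch; `swap_proxy_fields` and the stand-in proxy record are hypothetical, and the record assumes `get_proxy()` returns a dict with `ip` and `port` keys.

```python
import json

def swap_proxy_fields(raw: str, proxy: dict) -> str:
    """Parse a JSON string, replace its ip/port with the proxy's, and re-serialize.

    `proxy` is assumed to be a dict with 'ip' and 'port' keys (a hypothetical
    stand-in for whatever get_proxy() returns).
    """
    data = json.loads(raw)
    data['ip'] = proxy['ip']
    data['port'] = proxy['port']
    return json.dumps(data)

# Hypothetical proxy record standing in for get_proxy()
new_proxy = {'ip': '203.0.113.7', 'port': '3128'}
raw = '{"ip": "192.168.1.1", "port": "8080", "expiry": "2024-12-31"}'
print(swap_proxy_fields(raw, new_proxy))
# {"ip": "203.0.113.7", "port": "3128", "expiry": "2024-12-31"}
```

Untouched fields such as `expiry` pass through unchanged, and key order is preserved on re-serialization.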
Second, don't panic when you run into an odd format
Some sites combine the IP and port into a single field, like `"proxy": "1.1.1.1:8888"`. Here's a trick:
Let the ipipgo client generate the standard format for you:

```python
from ipipgo import format_proxy

bad_format = "1.1.1.1:8888"
clean_proxy = format_proxy(bad_format)  # returns {'ip': '1.1.1.1', 'port': '8888'}
```
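If you'd rather not depend on the SDK for this step, the same split can be done in plain Python. This is a sketch, not ipipgo's `format_proxy`; `split_host_port` is a hypothetical helper name, and it keeps the port as a string to match the output shape above. It assumes plain IPv4 `ip:port` strings.

```python
def split_host_port(value: str) -> dict:
    """Split an 'ip:port' string into {'ip': ..., 'port': ...}.

    rpartition treats only the last ':' as the separator. Bracketed IPv6
    addresses are not handled by this sketch.
    """
    ip, sep, port = value.strip().rpartition(':')
    if not sep or not ip or not port:
        raise ValueError(f"expected 'ip:port', got {value!r}")
    return {'ip': ip, 'port': port}

print(split_host_port("1.1.1.1:8888"))  # {'ip': '1.1.1.1', 'port': '8888'}
```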
Third, how to rotate dynamic IPs
When you need to switch proxies frequently, remember this winning combination:
```python
import json
import random

import ipipgo

def refresh_proxy():
    proxies = ipipgo.get_batch(10)  # Fetch 10 IPs at a time
    return random.choice(proxies)   # Pick one at random

while True:
    current_proxy = refresh_proxy()
    # Stuff the proxy into your JSON request headers
    headers = {'X-Proxy': json.dumps(current_proxy)}
    # ... send your request with these headers, then loop to rotate ...
```
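A custom header like `X-Proxy` only carries the proxy details as data. With the `requests` library, traffic actually goes through a proxy when you pass it in the `proxies` argument. Here is a minimal sketch, assuming the proxy record has `ip` and `port` keys and is an unauthenticated HTTP proxy; `build_request_kwargs` is a hypothetical helper.

```python
import json

def build_request_kwargs(proxy: dict, timeout: float = 10.0) -> dict:
    """Build keyword arguments for requests.get()/post() from a proxy record.

    Assumes an unauthenticated HTTP proxy described by 'ip' and 'port' keys.
    """
    proxy_url = f"http://{proxy['ip']}:{proxy['port']}"
    return {
        # requests picks the proxy URL by the target URL's scheme
        'proxies': {'http': proxy_url, 'https': proxy_url},
        # Optional: also keep the proxy details in a header, as in the loop above
        'headers': {'X-Proxy': json.dumps(proxy)},
        'timeout': timeout,
    }

kwargs = build_request_kwargs({'ip': '203.0.113.7', 'port': 3128})
# Usage (needs `pip install requests` and a reachable proxy):
# import requests
# resp = requests.get("https://example.com", **kwargs)
```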
Fourth, a pitfall guide (a must-read for beginners)
Common failure points:
1. The port number arrives as a string ("8080" instead of 8080)
2. The IP field has extra spaces (" 192.168.1.1 ")
3. Expiry dates come in inconsistent formats
Use this all-purpose cleaning function:
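```python
import ipipgo
import pandas as pd

def clean_proxy_data(raw_json):
    try:
        # str() first, so this works whether the port is "8080" or 8080
        raw_json['port'] = int(str(raw_json['port']).strip())
        raw_json['ip'] = raw_json['ip'].strip()
        # Automatically normalize the date format to YYYY-MM-DD
        raw_json['expiry'] = pd.to_datetime(raw_json['expiry']).strftime('%Y-%m-%d')
        return raw_json
    except (KeyError, AttributeError, TypeError, ValueError):
        # On any problem, switch straight to a new IP
        return ipipgo.get_fresh_proxy()
```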
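If you'd rather not pull in pandas just to tidy dates, a small standard-library alternative is to try a list of known formats in turn. The format list below is an assumption; extend it with whatever your sources actually send. `normalize_expiry` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed input formats; add the ones your data sources actually use
KNOWN_FORMATS = ('%Y-%m-%d', '%Y/%m/%d', '%d.%m.%Y', '%Y%m%d')

def normalize_expiry(value: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue  # Not this format; try the next one
    raise ValueError(f"unrecognized date format: {value!r}")

print(normalize_expiry("2024/12/31"))  # 2024-12-31
```

Unlike `pd.to_datetime`, this rejects anything outside the listed formats instead of guessing, which makes bad records easier to spot.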
Q&A time
Q: What should I do if I keep running into JSON parsing errors?
A: First pre-test the proxy with ipipgo's `validate_proxy` interface before plugging it into your business code.
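Another cheap way to catch parsing errors early is to check each raw proxy string locally before it reaches business code. This sketch uses only the standard library and is not ipipgo's `validate_proxy`; `parse_proxy_json` and the required-key set are hypothetical.

```python
import json

REQUIRED_KEYS = {'ip', 'port'}  # Assumed minimum fields for a usable proxy

def parse_proxy_json(raw: str):
    """Return the parsed dict, or None if the string is not usable proxy JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # JSONDecodeError reports where parsing failed
        print(f"bad JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return None
    if not isinstance(data, dict):
        print("expected a JSON object")
        return None
    missing = REQUIRED_KEYS - set(data)
    if missing:
        print(f"missing fields: {sorted(missing)}")
        return None
    return data
```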
Q: What if I need to handle multiple IP pools at the same time?
A: Use their Dynamic Residential (Enterprise Edition) package, which supports multi-channel concurrent processing. At a little over 9 yuan per 1 GB of traffic, it's enough to run small and medium-sized projects.
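On the client side, one simple way to spread requests across several pools is round-robin. This is a sketch with hypothetical pool names and addresses; in practice each list would be filled from the corresponding channel.

```python
from itertools import cycle, islice

def round_robin(pools: dict[str, list[str]]):
    """Endlessly yield (pool_name, proxy) pairs, alternating between pools
    and cycling through each pool's own proxies. Empty pools are skipped."""
    iters = {name: cycle(proxies) for name, proxies in pools.items() if proxies}
    for name in cycle(list(iters)):
        yield name, next(iters[name])

# Hypothetical pools; real ones would come from your provider
pools = {
    'pool_a': ['198.51.100.1:8000', '198.51.100.2:8000'],
    'pool_b': ['203.0.113.9:9000'],
}
for name, proxy in islice(round_robin(pools), 4):
    print(name, proxy)
```

Round-robin only balances which pool each request draws from; running the requests concurrently (for example with `concurrent.futures.ThreadPoolExecutor`) is a separate step.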
Q: Everything worked in testing, but it crashes in production?
A: Remember to add `"X-Proxy-Source": "ipipgo"` to your JSON request headers; this triggers special server-side optimizations.
Which package gives the best value for money?
| Use Case | Recommended Package | Monthly Cost |
|---|---|---|
| Personal testing | Dynamic Residential (Standard) | ≈$15 |
| Enterprise data collection | Dynamic Residential (Business) | ≈$200 |
| Long-term fixed IP needs | Static Residential | 35 RMB per IP |
One last secret: add a `"retry": 3` field to your JSON and ipipgo's API will automatically retry 3 times for you. This isn't even in the official documentation!
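If you'd rather not rely on an undocumented field, a client-side retry loop gives similar behavior under your own control. This is a minimal sketch; `with_retries` is a hypothetical helper, and which exceptions are worth retrying depends on your HTTP client.

```python
import time

def with_retries(func, retries: int = 3, delay: float = 1.0, retry_on=(Exception,)):
    """Call func(); if it raises one of `retry_on`, retry up to `retries` more times.

    Sleeps `delay` seconds before each retry, doubling the delay each time.
    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # Out of retries: surface the real error
            time.sleep(delay)
            delay *= 2
```

With `requests`, a call might look like `with_retries(lambda: requests.get(url, timeout=10), retry_on=(requests.RequestException,))`.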

