
Hands-On: Converting Proxy IP Data to a CSV File
Anyone who does data collection knows that proxy IPs need to be stored and analyzed after they have been used. Many tools, however, export them in messy formats. Today we will use Python to do the whole job: pack the proxy IP data into a clean CSV table that you can take away and use directly.
Prepare your toolkit before collecting
First you need a proxy IP service on hand. One recommendation is ipipgo's Dynamic Residential (Standard) package: at a little over 7 dollars per 1 GB of traffic, it is not expensive. Their API is particularly simple to call, and the data it returns looks like this:
    {
        "ip": "123.123.123.123",
        "port": 8888,
        "expire_time": "2024-01-01 12:00",
        "location": "United States Texas"
    }
Check that the fields are complete. Some providers return records with fields missing, which makes later processing painful.
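As a rough illustration of that completeness check, the sketch below lists which expected fields are absent or empty in a record. The field names come from the sample record above; the helper name `missing_fields` and the `incomplete` record are made up for this example.

```python
# Field names taken from the sample record above.
REQUIRED_FIELDS = ("ip", "port", "expire_time", "location")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or empty in one proxy record."""
    return [key for key in required if record.get(key) in (None, "")]


complete = {
    "ip": "123.123.123.123",
    "port": 8888,
    "expire_time": "2024-01-01 12:00",
    "location": "United States Texas",
}
# Hypothetical record with two fields missing.
incomplete = {"ip": "123.123.123.123", "port": 8888}

print(missing_fields(complete))    # []
print(missing_fields(incomplete))  # ['expire_time', 'location']
```

Running a check like this on the first few records before writing any conversion code shows quickly whether a provider's data is usable.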
Three Steps to Hands-On Collection
Let's write a simple script in Python. Remember to install the two libraries it needs, requests and pandas:
    import requests
    import pandas as pd

    # Fetch data from the ipipgo API (replace this with your real API URL)
    api_url = "https://api.ipipgo.com/get_proxy"
    resp = requests.get(api_url, timeout=10)
    resp.raise_for_status()  # stop early on an HTTP error
    raw_data = resp.json()

    # Key step: flatten the data and tidy it up
    clean_data = []
    for item in raw_data['proxies']:
        clean_data.append({
            'ip address': item['ip'],
            'port number': str(item['port']),  # store as a string to avoid type errors
            'expiration_time': item['expire_time'],
            # Keep only the country by dropping the last word (the region).
            # This assumes a "Country Region" layout like the sample record.
            'location': item['location'].rsplit(' ', 1)[0],
        })

    # Time for the magic trick: write the CSV
    df = pd.DataFrame(clean_data)
    df.to_csv('Proxy IP List.csv', index=False, encoding='utf-8-sig')
After you run the script, Proxy IP List.csv appears in the current directory. Open it in Excel and it looks like this:
| ip address | port number | expiration_time | location |
|---|---|---|---|
| 123.123.123.123 | 8888 | 2024-01-01 12:00 | United States |
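To try the conversion without calling the API, here is a minimal offline sketch. It assumes a payload with a `proxies` list shaped like the sample record; `sample_payload` and `to_rows` are hypothetical names, and the location is kept whole for simplicity. It writes the CSV to an in-memory buffer and reads it back to confirm the columns.

```python
import io

import pandas as pd

# Hypothetical payload shaped like the script expects: a "proxies" list
# whose items look like the sample record shown earlier.
sample_payload = {
    "proxies": [
        {
            "ip": "123.123.123.123",
            "port": 8888,
            "expire_time": "2024-01-01 12:00",
            "location": "United States Texas",
        }
    ]
}


def to_rows(payload):
    """Flatten the payload into one plain dict per proxy."""
    return [
        {
            "ip address": item["ip"],
            "port number": str(item["port"]),
            "expiration_time": item["expire_time"],
            "location": item["location"],
        }
        for item in payload["proxies"]
    ]


df = pd.DataFrame(to_rows(sample_payload))

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer, dtype=str)  # dtype=str keeps ports as text

print(list(round_trip.columns))
print(round_trip.loc[0, "port number"])
```

Passing `dtype=str` when reading matters because pandas would otherwise parse the port column back into integers.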
Pitfalls to watch out for
Pitfall 1: If the data contains nested dictionaries, use the pandas json_normalize function to expand them instead of forcing them apart by hand.
Pitfall 2: If the CSV opens as garbled text, change the encoding parameter to utf-8-sig. It adds a byte order mark that Excel uses to recognize UTF-8.
Pitfall 3: ipipgo's static residential IPs have a long validity period, which suits business scenarios that need long-term monitoring.
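Pitfalls 1 and 2 can be sketched together. The nested `geo` field below is invented purely to show the flattening and is not something the ipipgo API is known to return. The file is written to a temporary directory so the example leaves nothing behind in your working folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical nested records: "geo" is a made-up field used only to
# show how nested dictionaries are flattened.
nested = [
    {
        "ip": "123.123.123.123",
        "port": 8888,
        "geo": {"country": "United States", "region": "Texas"},
    },
]

# Pitfall 1: json_normalize expands nested dicts into dotted column names.
df = pd.json_normalize(nested)
print(list(df.columns))  # ['ip', 'port', 'geo.country', 'geo.region']

# Pitfall 2: utf-8-sig writes a byte order mark at the start of the file.
path = os.path.join(tempfile.mkdtemp(), "nested_proxies.csv")
df.to_csv(path, index=False, encoding="utf-8-sig")

# Read the first three bytes back to confirm the mark is there.
with open(path, "rb") as handle:
    first_bytes = handle.read(3)
print(first_bytes == b"\xef\xbb\xbf")  # True
```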
Frequently Asked Questions
Q: Why is the exported CSV missing a few columns of data?
A: Check that the fields returned by the API correspond exactly to the dictionary keys used in the code. It helps to print the raw data first and look at its format.
Q: Which package is cost-effective for enterprise-level collection needs?
A: For large data volumes, go straight to ipipgo's Dynamic Residential (Business) package. It costs a little over $9 per 1 GB of traffic and includes request prioritization.
Q: What should I do if my code reports an SSL certificate error?
A: Add verify=False to requests.get. This turns off certificate verification, so it is not recommended for production environments.
Why ipipgo?
Real experience from using it ourselves:
1. I was shocked that someone replied to my support ticket at 3:00 a.m.
2. I once needed an IP from a small, obscure country, and customer service really did arrange it.
3. It is very considerate that you are not cut off when you go over your traffic.
4. Packages for different services can be mixed and matched, with no forced bundling.
One last reminder: when cleaning the data, remember to deduplicate with pandas drop_duplicates() so that duplicate IPs do not waste resources. Converting to CSV is simple, but getting the details right saves a lot of trouble later. For those in cross-border e-commerce especially, choosing the right proxy IP provider can really double a crawler's efficiency.
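A minimal sketch of that deduplication step, using made-up rows and the column names from the script above. Treating "same IP and same port" as a duplicate is an assumption; adjust the `subset` to whatever defines a unique proxy in your data.

```python
import pandas as pd

# Made-up rows for illustration; the first and third are the same proxy.
df = pd.DataFrame(
    [
        {"ip address": "123.123.123.123", "port number": "8888"},
        {"ip address": "124.124.124.124", "port number": "8080"},
        {"ip address": "123.123.123.123", "port number": "8888"},
    ]
)

# Treat rows with the same IP and port as duplicates and keep the first.
deduped = df.drop_duplicates(subset=["ip address", "port number"], keep="first")
deduped = deduped.reset_index(drop=True)

print(len(df), "->", len(deduped))  # 3 -> 2
```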

