
Hands-on API automation with curl batch files
A lot of people doing data crawling have been asking me the same question lately: how do you fire off hundreds of API requests without getting your IP blocked? It's really not that hard; the key is knowing a few batch-processing tricks. Today we'll use the everyday curl command, together with the ipipgo proxy service, to process API requests like a factory assembly line.
Get ready for your stuff.
First you have to have three things on hand:
1. A command-line environment with curl installed (on Windows use PowerShell; on a Mac just open Terminal)
2. A list of API addresses prepared in advance (saved as a .txt file, one URL per line)
3. An ipipgo dynamic proxy pool (their SOCKS5 protocol with username/password authentication is recommended)
Hands-On in Four Steps
Let's take a weather forecast API as an example. Suppose we want to query the weather for 50 cities in bulk:
Step 1: Build the request file
Create a new weather_apis.txt with contents like this:
http://api.weather.com/beijing
http://api.weather.com/shanghai
... (other cities)
Step 2: Write the Loop Script
Run this command in the terminal:
while read -r url; do curl -x socks5://user:pass@proxy.ipipgo.net:24000 "$url"; done < weather_apis.txt
Be careful to replace user:pass with the real username and password from your ipipgo dashboard; the port number also depends on your specific plan.
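Expanded into a readable script, the loop above looks like this. The proxy host, port, and credentials are placeholders from the example; in this sketch an echo stands in for the real curl call (shown commented out) so the loop itself runs anywhere, with no proxy account needed:

```shell
#!/bin/bash
# Build a small sample URL list (stand-in for the full 50-city file).
cat > weather_apis.txt <<'EOF'
http://api.weather.com/beijing
http://api.weather.com/shanghai
EOF

# Read one URL per line and request it through the proxy.
while read -r url; do
  # curl -x socks5://user:pass@proxy.ipipgo.net:24000 "$url"   # real request (placeholder proxy)
  echo "would fetch: $url"
done < weather_apis.txt
```

Note the `-r` on `read`: without it, backslashes in a URL would be mangled.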
Step 3: Processing of results
If you want to save the returned data, add an output parameter:
curl -x ... -o "output_$(date +%s).json"
Written this way, each result goes into its own timestamped file, so results don't overwrite each other.
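One caveat: `$(date +%s)` only has one-second resolution, so a fast loop can still produce colliding filenames. A simple counter in the name avoids that. The sketch below writes stand-in JSON instead of running the real curl call (shown commented out, with the article's placeholder proxy):

```shell
#!/bin/bash
# Write each result to a unique file: counter plus timestamp.
outdir=$(mktemp -d)
i=0
for city in beijing shanghai guangzhou; do
  i=$((i + 1))
  # curl -x socks5://user:pass@proxy.ipipgo.net:24000 \
  #      -o "$outdir/output_${i}_$(date +%s).json" "http://api.weather.com/$city"
  echo "{\"city\": \"$city\"}" > "$outdir/output_${i}.json"   # stand-in for the curl call
done
files=$(ls "$outdir" | wc -l)
echo "wrote $files files"
```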
Step 4: Anomaly Monitoring
Experienced users also add an error-retry mechanism:
curl --retry 3 --retry-delay 5 ...
This means a failed request is automatically retried up to 3 times, with a 5-second interval between attempts; it works especially well for unstable APIs.
Common pitfalls QA
Q: Why am I still getting blocked even though I'm using a proxy?
A: Check whether the same proxy IP is being reused. In the ipipgo dashboard you can enable the "new IP per request" option; turn that switch on!
Q: How do I control the frequency of requests?
A: Add a sleep command to the loop, for example pausing for 1 second after every 10 requests:
if (( $count % 10 == 0 )); then sleep 1; fi
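That snippet assumes a `count` variable incremented inside the loop. Here is the complete pacing loop; as before, an echo stands in for the real curl call (commented out, placeholder proxy) so the throttling logic can be run and checked on its own:

```shell
#!/bin/bash
# Pause 1 second after every 10 requests to keep the request rate down.
urls=$(mktemp)
for city in beijing shanghai guangzhou shenzhen chengdu wuhan xian \
            nanjing hangzhou tianjin chongqing suzhou; do
  echo "http://api.weather.com/$city"
done > "$urls"

count=0
while read -r url; do
  # curl -x socks5://user:pass@proxy.ipipgo.net:24000 "$url"   # real request
  echo "GET $url"
  count=$((count + 1))
  if (( count % 10 == 0 )); then
    sleep 1        # throttle: one-second pause every 10 requests
  fi
done < "$urls"
echo "done: $count requests"
```

Because the loop uses a redirect rather than a pipe, `count` survives after the loop ends, which also makes it easy to log a final total.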
Q: What should I do if the returned data is garbled?
A: The most common cause is a compressed response. Tell curl to handle the compression for you:
curl --compressed -H "Accept-Encoding: gzip" ...
Performance Optimization Tips
If you need to handle thousands of requests and a single process is too slow, you can use the xargs command to run requests in parallel:
cat apis.txt | xargs -P 8 -I {} curl -x ... {}
The -P 8 means running up to 8 processes at the same time; adjust it to your machine's capacity. Remember to raise the "concurrency" quota in the ipipgo console as well, otherwise you'll be rate-limited.
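A runnable sketch of the xargs fan-out, again with an echo standing in for the curl call so it needs no proxy account (swap the echo for the real curl line with your ipipgo credentials):

```shell
#!/bin/bash
# Fan out over a URL list with xargs: -P sets the number of parallel
# processes, -I {} substitutes each URL into the command.
urls=$(mktemp)
for city in beijing shanghai guangzhou shenzhen chengdu wuhan xian nanjing; do
  echo "http://api.weather.com/$city"
done > "$urls"

results=$(mktemp)
# Real version: xargs -P 4 -I {} curl -x socks5://user:pass@proxy.ipipgo.net:24000 {} < "$urls"
xargs -P 4 -I {} echo "GET {}" < "$urls" > "$results"
echo "completed $(wc -l < "$results") requests"
```

One design note: with `-o` output files, parallel workers each write their own file, so there is no shared-output contention to worry about.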
A final reminder: bulk requests should always respect the target website's terms of service; don't hammer other people's servers. Using ipipgo's rotating IPs not only avoids bans; their IP pool is refreshed frequently enough to basically guarantee a fresh IP for every request.

