
How to use a proxy IP to scrape data
Recently, several people have asked me why they keep getting blocked when scraping data from their own computer. I ran into the same problem three years ago. At the time I was doing price monitoring for e-commerce, and after three consecutive days of monitoring, my IP was blacklisted outright. I later found that rotating proxy IPs solves the problem well, and today I'll walk you through how to do it.
What is a proxy IP? Why use it?
Simply put, a proxy IP is like a cloak of invisibility: the website sees the proxy's address rather than your real one. For example, if your local IP is 123.45.67.89 and you go through a proxy, the site sees the proxy server's IP instead. This has two advantages:
1. Avoiding blocks: when the website detects abnormal access, it blocks the proxy IP instead of your real IP.
2. Getting past access restrictions: some sites are only open to certain regions and can be accessed normally through a proxy located in that region.
Curl Proxy Command Basics
Let's start with the most basic proxy setup format, using the ipipgo proxy service as an example:
curl -x http://username:password@proxy.ipipgo.com:8000 http://target.com
Note a few key points here:
- Write the proxy type correctly (http or https).
- Avoid special characters in the username and password; characters such as @ or : break the proxy URL unless they are percent-encoded.
- The port number depends on what your service provider gives you (ipipgo commonly uses ports 8000-9000).
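If your credentials do contain special characters, one option is to percent-encode them before placing them in the proxy URL. The following is a minimal sketch under stated assumptions: `urlencode` is a hypothetical helper (not part of curl or ipipgo), it handles ASCII input only, and the sample username and password are made up.

```shell
# Hypothetical helper: percent-encode a string for use inside a URL.
# Keeps RFC 3986 unreserved characters as-is; ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. @ becomes %40
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Example (not executed here; credentials are invented):
#   user=$(urlencode 'user@corp')
#   pass=$(urlencode 'p@ss:w/rd')
#   curl -x "http://$user:$pass@proxy.ipipgo.com:8000" http://target.com
```

curl can also take proxy credentials separately with `-U user:password`, which keeps them out of the proxy URL altogether.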
A real-world scraping example
Let's take crawling e-commerce product information as an example, assuming that we want to crawl 100 pages in a row:
for i in {1..100}
do
  # Rotate across ports 8000-8049 and save each page to its own file
  curl -x "http://user2024:Pass2024@proxy.ipipgo.com:$((8000 + $i % 50))" \
    -H "User-Agent: Mozilla/5.0" \
    -o "product_$i.html" \
    "https://mall.com/product/$i"
  sleep 3
done
Three things make this script work:
1. Port rotation with $((8000 + $i % 50)), which cycles through ports 8000-8049 (ipipgo supports 50 concurrent ports)
2. A browser User-Agent header, so the requests look more realistic
3. A 3-second pause between requests to avoid triggering the anti-scraping mechanism
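The port arithmetic is easy to misread, so here is a small sketch that isolates it. The function names `proxy_port` and `fetch_page` are hypothetical, the credentials and hosts are the placeholders from the script above, and the 30-second cap is an arbitrary choice rather than a recommendation from ipipgo.

```shell
# Map a page number to one of the 50 ports (8000-8049).
# % binds tighter than +, so this is 8000 + (page % 50).
proxy_port() {
  echo $((8000 + $1 % 50))
}

# Fetch one product page through the port chosen for it.
# --fail makes curl exit non-zero on HTTP errors (status >= 400);
# --max-time caps the whole transfer at 30 seconds.
fetch_page() {
  page=$1
  curl --fail --silent --show-error --max-time 30 \
    -x "http://user2024:Pass2024@proxy.ipipgo.com:$(proxy_port "$page")" \
    -H "User-Agent: Mozilla/5.0" \
    -o "product_$page.html" \
    "https://mall.com/product/$page"
}

# Example (not executed here):
#   for i in 1 2 3; do
#     fetch_page "$i" || echo "page $i failed" >&2
#     sleep 3
#   done
```

Page 1 maps to port 8001, page 49 to 8049, and page 50 wraps back around to 8000.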
Common pitfalls and how to avoid them
| Error message | Solution |
|---|---|
| 407 Proxy Authentication Required | Check your username and password, we recommend using ipipgo's key generator tool. |
| SSL certificate problem | Add the -k parameter to skip certificate validation (this disables verification, so use it for testing only) |
| Connection timed out | Switch to one of ipipgo's alternate server nodes |
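In a script you can map curl's results back to the table above. The sketch below assumes the usual curl exit codes (60 for a certificate problem, 7 or 28 for a failed or timed-out connection) and reads the HTTP status through `-w '%{http_code}'`. `explain_failure` is a hypothetical helper, and depending on the curl version and target scheme a proxy 407 may surface through a different exit code or variable, so treat the mapping as a starting point.

```shell
# Hypothetical helper: turn a curl exit code and HTTP status into a hint.
explain_failure() {
  exit_code=$1
  http_code=$2
  if [ "$http_code" = "407" ]; then
    echo "proxy auth failed: check username and password"
    return
  fi
  case $exit_code in
    0)    echo "ok" ;;
    60)   echo "certificate check failed" ;;
    7|28) echo "connection failed or timed out: try an alternate node" ;;
    *)    echo "unclassified curl error $exit_code" ;;
  esac
}

# Example (not executed here; proxy URL is the placeholder used earlier):
#   http_code=$(curl -s -o page.html -w '%{http_code}' \
#     -x http://username:password@proxy.ipipgo.com:8000 http://target.com)
#   explain_failure "$?" "$http_code"
```

The assignment `http_code=$(curl ...)` leaves curl's exit status in `$?`, which is why both values can be passed on the next line.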
Frequently asked questions
Q: What can I do about slow proxy IPs?
A: Choosing a quality service provider matters; ipipgo's dedicated lines, for example, can reach 50M bandwidth. Also note:
- Use a proxy in the same region as the target site where possible (a domestic proxy for domestic sites)
- Reduce SSL encryption overhead (skip the https proxy type unless you need it)
Q: Do I need to change my IP frequently?
A: That depends on the target site's anti-scraping strategy. As a general guide:
- Ordinary sites: change the IP every 5-10 minutes
- Sites with strict anti-scraping: change the IP on every request (ipipgo supports switching on demand)
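A time-based rotation like the 5-10 minute rule can be sketched by deriving the port from the clock. This assumes, as the earlier script does, that different ports in the 8000-8049 range give you different exit IPs, which depends on your provider; `port_for_time` is a hypothetical helper.

```shell
# Hypothetical helper: pick a port that stays fixed for `interval` seconds,
# then moves to the next one, wrapping around after 50 ports.
#   $1 = current time in seconds since the epoch
#   $2 = rotation interval in seconds
port_for_time() {
  now=$1
  interval=$2
  echo $((8000 + (now / interval) % 50))
}

# Example (not executed here): rotate every 5 minutes
#   port=$(port_for_time "$(date +%s)" 300)
#   curl -x "http://username:password@proxy.ipipgo.com:$port" http://target.com
```

For per-request rotation, the loop counter already plays this role, as in the port rotation script earlier in the article.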
Q: How do I check if the proxy is in effect?
A: First use this command to check the local IP:
curl https://ip.ipipgo.com/myip
Then run the same command through the proxy and compare the two results to see whether the displayed IP has changed.
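The comparison can be scripted. This sketch assumes the myip endpoint returns the address in a stable text form, so that a plain string comparison is enough; `proxy_in_effect` is a hypothetical helper and the proxy URL is the placeholder from the basics section.

```shell
# Hypothetical helper: succeed only if both lookups returned something
# and the two addresses differ.
proxy_in_effect() {
  direct_ip=$1
  proxied_ip=$2
  [ -n "$direct_ip" ] && [ -n "$proxied_ip" ] && [ "$direct_ip" != "$proxied_ip" ]
}

# Example (not executed here):
#   direct=$(curl -s https://ip.ipipgo.com/myip)
#   proxied=$(curl -s -x http://username:password@proxy.ipipgo.com:8000 \
#     https://ip.ipipgo.com/myip)
#   if proxy_in_effect "$direct" "$proxied"; then
#     echo "proxy is active"
#   else
#     echo "proxy is NOT active"
#   fi
```

Requiring both values to be non-empty avoids reporting success when one of the lookups simply failed.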
Advanced tips
To be stealthier, you can combine these techniques:
- Randomize the request interval (sleep $((RANDOM%5+1)))
- Mix data center IPs and residential IPs (ipipgo offers both types)
- Modify request headers dynamically (for example, with the fake-useragent library)
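Two of these tips can be sketched directly in the shell. fake-useragent is a Python library, so this example swaps in a small hard-coded list instead; the User-Agent strings are illustrative, `random_delay` and `pick_user_agent` are hypothetical helpers, and `$RANDOM` assumes bash.

```shell
# Random pause of 1 to 5 seconds, same expression as in the tip above.
random_delay() {
  echo $((RANDOM % 5 + 1))
}

# Pick one of three illustrative User-Agent strings by index.
pick_user_agent() {
  case $(($1 % 3)) in
    0) echo "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" ;;
    1) echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" ;;
    2) echo "Mozilla/5.0 (X11; Linux x86_64)" ;;
  esac
}

# Example (not executed here):
#   curl -x http://username:password@proxy.ipipgo.com:8000 \
#     -H "User-Agent: $(pick_user_agent "$RANDOM")" http://target.com
#   sleep "$(random_delay)"
```

Passing the index in explicitly keeps `pick_user_agent` deterministic, which makes it easy to check; the caller supplies the randomness.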
A final reminder for beginners: ipipgo currently gives new users 1G of traffic, which is enough to practice with. If you hit technical problems, contact their customer service directly; they respond much faster than their competitors. And don't use free proxies. When I tested them, 8 out of 10 did not work, and aside from the latency, they may also leak your data.

