
A hands-on guide to collecting data with PHP cURL and proxy IPs
The thing we fear most in data collection is running into a website's anti-scraping mechanism. Last week a friend in e-commerce came to me: the collection script he had written with PHP cURL had suddenly stopped working, and the site had blocked his IP for three days. This problem is not hard to solve, so today I'll use his case to show you how to deal with anti-scraping measures using the ipipgo proxy IP service.
// Basic curl example (this will be blocked sooner or later)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://target-website.com"); // placeholder for the target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
Why do I have to use a proxy IP?
Many sites run a traffic fingerprinting system, which works like the security gate at a supermarket. Hitting a site repeatedly from one IP is like the same person walking in and out of the supermarket 20 times in half an hour: who else would the security guards be watching? ipipgo's proxy pool has 8 million+ dynamic IPs, which is like having countless disguises ready, so the site cannot tell which requests belong together.
| Scenario | Without a proxy | With ipipgo proxies |
|---|---|---|
| Requests per day | ≤ 500 | ≥ 50,000 |
| Probability of an IP block | 80% or higher | < 3% |
Real-world makeover: putting IP armor on curl
Take the script that just got blocked and rework it in three key steps:
// Step 1: get a proxy from ipipgo (replace YOUR_API_KEY with your own key)
$proxy = trim(file_get_contents("https://api.ipipgo.com/getproxy?key=YOUR_API_KEY"));
// Step 2: configure the curl proxy parameters
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
$output = curl_exec($ch); // run the request again, this time through the proxy
// Step 3 (important): remember to add an error retry
if (curl_errno($ch)) {
    // Report the failed IP; do not overwrite $proxy with the response
    file_get_contents("https://api.ipipgo.com/report?proxy=" . urlencode($proxy));
    // Fetch a new proxy and continue...
}
Pitfall to avoid: don't take the shortcut of hard-coding a proxy IP in your script; always fetch proxies dynamically. ipipgo's API supports filtering IPs by region and carrier, which is useful for cross-border collection.
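Here is a minimal sketch of how the "report and retry" step could be turned into a loop. The function names fetchWithRetry and curlThroughProxy are my own, not part of ipipgo or cURL, and the two callables are stand-ins for the getproxy and report calls from the snippet above. Treat it as one possible shape under those assumptions, not as the official way to do it.

```php
<?php
/**
 * Try a URL through up to $maxAttempts different proxies.
 * $getProxy and $reportProxy are hypothetical wrappers around the proxy API;
 * $doRequest can be swapped out (for example in tests) and defaults to cURL.
 */
function fetchWithRetry(
    string $url,
    callable $getProxy,
    callable $reportProxy,
    int $maxAttempts = 3,
    ?callable $doRequest = null
): ?string {
    $doRequest = $doRequest ?? 'curlThroughProxy';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $proxy = $getProxy();              // fresh proxy for every attempt
        $body = $doRequest($url, $proxy);
        if ($body !== null) {
            return $body;                  // success, stop retrying
        }
        $reportProxy($proxy);              // tell the provider this proxy failed
    }
    return null;                           // every attempt failed
}

/** Perform one GET through an HTTP proxy; returns null on any cURL error. */
function curlThroughProxy(string $url, string $proxy): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    $failed = curl_errno($ch) !== 0 || $body === false;
    return $failed ? null : $body;
}
```

To wire it up, pass closures that call the getproxy and report URLs from the snippet above. The response format of those endpoints is not covered in this article, so trim and validate whatever comes back before handing it to cURL.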
Tips for improving collection efficiency
1. When collecting with multiple threads, give each thread its own proxy; don't let several requests share the same IP address.
2. Randomize the interval between requests; don't hit the site on a fixed schedule like an alarm clock.
3. Don't fight a CAPTCHA when you hit one; switch to a new IP through ipipgo and try again.
4. Clear cookies regularly so the site cannot track your behavior across requests.
// Random delay of 1-8 seconds; usleep() takes microseconds, so the fractional part is not truncated
usleep((rand(1, 5) * 1000 + mt_rand(0, 3000)) * 1000);
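To apply tip 2 across a whole run rather than a single request, something like the following could work. The names randomDelayMicros and crawlPolitely are hypothetical helpers of mine, and the 1-8 second range is simply carried over from the one-liner above; adjust it to the site you are collecting from.

```php
<?php
/** Pick a random pause between $minMs and $maxMs milliseconds, returned in microseconds. */
function randomDelayMicros(int $minMs = 1000, int $maxMs = 8000): int
{
    if ($minMs < 0 || $maxMs < $minMs) {
        throw new InvalidArgumentException('Expected 0 <= minMs <= maxMs');
    }
    return random_int($minMs, $maxMs) * 1000;
}

/**
 * Fetch each URL in turn with a random pause between requests.
 * $fetch does the actual request; $pause defaults to usleep and can be
 * replaced with a stub so the loop can be exercised without waiting.
 */
function crawlPolitely(array $urls, callable $fetch, ?callable $pause = null): array
{
    $pause = $pause ?? 'usleep';
    $results = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first) {
            $pause(randomDelayMicros());   // no pause before the very first request
        }
        $first = false;
        $results[$url] = $fetch($url);
    }
    return $results;
}
```

Passing the fetch step in as a callable keeps the pacing logic separate from the cURL and proxy details, so either side can change without touching the other.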
Frequently Asked Questions
Q: What should I do if my proxy IP suddenly fails?
A: Add a reporting step to your curl error handling. When ipipgo's system receives the feedback, it automatically removes the problem IP from rotation.
Q: How can I tell if a proxy is in effect?
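A: Print curl_getinfo($ch, CURLINFO_PRIMARY_IP) after curl_exec. It reports the IP that curl actually connected to, so with a working proxy it should show the proxy's address, and it should change whenever you switch proxies.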
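As a rough sketch, the check can be wrapped in a small helper. This assumes the proxy string is an IP and port such as "203.0.113.10:8080" (optionally with an http:// prefix); I have not verified what format ipipgo returns, and if the proxy is given as a hostname the comparison below will not match because CURLINFO_PRIMARY_IP is a resolved address. The function names are hypothetical.

```php
<?php
/** Extract the host part from "ip:port" or "scheme://ip:port" (assumed formats). */
function proxyHost(string $proxy): string
{
    $withScheme = strpos($proxy, '://') === false ? 'http://' . $proxy : $proxy;
    $host = parse_url($withScheme, PHP_URL_HOST);
    return is_string($host) ? $host : '';
}

/** True when the IP curl connected to is the proxy's own address. */
function connectedViaProxy(string $primaryIp, string $proxy): bool
{
    return $primaryIp !== '' && $primaryIp === proxyHost($proxy);
}

/** Make one request through the proxy and check where curl really connected. */
function checkProxy(string $url, string $proxy): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_CONNECTTIMEOUT => 10,
    ]);
    curl_exec($ch);
    $primaryIp = (string) curl_getinfo($ch, CURLINFO_PRIMARY_IP);
    return connectedViaProxy($primaryIp, $proxy);
}
```

A false result does not always mean the proxy is broken; it can also mean the request failed before connecting, so it is worth logging curl_error($ch) alongside it.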
Q: How many proxy IPs are needed per day?
A: It depends on your volume. As a rule of thumb, keeping each IP to 200-300 requests per hour is relatively safe. ipipgo's plans range from daily rentals to monthly packages, and new users get 5,000 test IPs.
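For a back-of-the-envelope estimate, the rule of thumb above can be turned into a small calculation. This sketch assumes requests are spread evenly over the hours you run, and it uses the conservative end of the range (200 requests per IP per hour) as the default; proxiesNeeded is a hypothetical helper, not something ipipgo provides.

```php
<?php
/**
 * Estimate how many proxy IPs must be in use at once so that no single IP
 * exceeds $perIpPerHour requests, assuming an even spread over $activeHours.
 */
function proxiesNeeded(int $requestsPerDay, int $activeHours = 24, int $perIpPerHour = 200): int
{
    if ($requestsPerDay < 0 || $activeHours < 1 || $perIpPerHour < 1) {
        throw new InvalidArgumentException('Arguments must be positive');
    }
    $capacityPerIp = $activeHours * $perIpPerHour;   // requests one IP can absorb per day
    // Integer ceiling division avoids floating-point rounding surprises
    return intdiv($requestsPerDay + $capacityPerIp - 1, $capacityPerIp);
}

echo proxiesNeeded(50000), "\n";          // 50,000 requests over 24 hours
echo proxiesNeeded(50000, 8), "\n";       // same volume squeezed into 8 hours
```

Because dynamic IPs rotate, the number you consume over a day will be higher than this concurrent figure; treat the result as a lower bound.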
Finally, a reminder: respect each site's robots.txt when collecting data. The point of using ipipgo's proxy service is not to cause harm but to make legitimate collection run more smoothly. I once helped a customer build a price comparison system; after switching to dynamic proxies, the data collection success rate jumped from 47% straight to 98%, an effect you could see immediately.

