
How to scrape data with PHP without getting your IP blocked!
Anyone who does data collection knows the biggest headache is the target site suddenly banning your IP. Last month I helped a client scrape prices from an e-commerce platform, and after just two days of running we started getting 403 responses. At that point you have to bring out the big gun: proxy IPs.
Basic setup
First, make sure your PHP environment has the curl extension installed. One pitfall to watch for: some servers don't enable curl by default, so you have to open php.ini and remove the semicolon in front of extension=curl.
```php
// Stop early if the curl extension is not loaded
if (!function_exists('curl_init')) {
    die('Go get the curl extension turned on!');
}
```
Unprotected scraping code
First, let's look at what unprotected code looks like:
```php
$url = 'https://target-site.com/data';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
```
Code like this is almost guaranteed to get blocked within half an hour, especially when the collection frequency is high. Last week a buddy who wrote it this way burned through 6 server IPs in half an hour and got so angry he threw his keyboard.
Put a bulletproof vest on your code
Here's the key part! To route curl through an ipipgo proxy, change the code like this:
```php
$targetUrl = 'https://target-site.com/data';
$proxy = 'proxy.ipipgo.com:9021'; // fill in the channel provided by ipipgo here
$auth = 'username:password';      // authentication credentials generated in the backend
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $targetUrl);
curl_setopt($ch, CURLOPT_PROXY, $proxy);          // route the request through the proxy
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $auth);    // proxy login, not the target site's login
curl_setopt($ch, CURLOPT_TIMEOUT, 15);            // give up after 15 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// ... leave the rest of the settings untouched
$result = curl_exec($ch);
curl_close($ch);
```
Note three key points:
1. Include the port number in the proxy address; don't leave it out.
2. The authentication credentials are not a website account; they are the ones generated in the ipipgo backend.
3. Set a short timeout; 15 seconds is enough for most scenarios.
A practical guide to avoiding pitfalls
Real problems I ran into recently while helping a client with a deployment:
| Symptom | Fix |
|---|---|
| Blank page returned | Check whether the proxy address includes the protocol prefix (http:// or https://) |
| Frequent timeouts | Switch to a different line region in the ipipgo console |
| Unstable speed | Enable automatic IP switching and set the interval to 30 seconds |
Tips from a veteran
1. For large-volume collection, I recommend ipipgo's dynamic residential proxies; in my own testing they handled an average of 100,000 requests a day without a hitch!
2. Don't use free proxies for important projects. Last time someone went for the cheap option, and all they ended up collecting was ad code.
3. Set a User-Agent to disguise your requests as a browser, but don't pick one that is too common, since that is easy to detect.
Frequently Asked Questions
Q: What should I do if my proxy IP suddenly fails?
A: Enable "Failover" in the ipipgo backend, and the system will switch you to a new IP within seconds.
Q: How can I tell if a proxy is in effect?
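A: Add curl_getinfo($ch, CURLINFO_PRIMARY_IP) after the request to see which IP curl actually connected to. When a proxy is configured, that is the proxy server's address rather than the exit IP the target site sees, so to check the exit IP, request an IP-echo service through the proxy and compare the result with your server's own IP.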
Q: How do I handle high-concurrency collection?
A: Use ipipgo's API to fetch a proxy pool dynamically, assign each thread its own IP, and remember to throttle the request rate.
Finally, a lesson learned the hard way: once I didn't check whether the proxies were working, and everything I collected turned out to be wrong data. Later I found that ipipgo provides an online testing tool, and now I run a test script before every start, which saves me a lot of worry.

