
A hands-on guide to web crawling with PHP
What does a crawler fear most? Getting its IP blocked after grabbing just two pages! Today we'll walk through the golden combination of cURL plus proxy IPs, which keeps your data collection rock-steady. We'll use ipipgo's proxy service as the example, since their dynamic proxy pool works really well.
Don't rush into installing the cURL extension
PHP usually ships with cURL these days, but there's no guarantee it hasn't been left out or disabled. Open your php.ini file and look for this line:
;extension=curl
Delete the semicolon in front of it, then restart the web server or PHP-FPM so the change takes effect. Still can't get it to work? Go straight to your server administrator.
// Check if cURL is available
if (!function_exists('curl_init')) {
    die('Hurry up and install the cURL extension!');
}
Basic collection in four steps
Remember this universal template:
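$ch = curl_init();                               // 1. create a handle
curl_setopt($ch, CURLOPT_URL, "Target URL");     // 2. set options ("Target URL" is a placeholder)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  //    return the body as a string instead of printing it
$output = curl_exec($ch);                        // 3. run the request; returns false on failure
curl_close($ch);                                 // 4. release the handle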
Watch out for this pitfall: always set a timeout before calling curl_exec, or a hung request will stall your whole script:
curl_setopt($ch, CURLOPT_TIMEOUT, 15); // abort if the request has not finished within 15 seconds
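Putting the template and the timeout together, a minimal sketch of a reusable fetch helper might look like this. The names buildFetchOptions and fetchPage, the 5-second connect timeout, and the choice to follow redirects are assumptions for illustration, not part of the template above.

```php
<?php
// Hypothetical helper: gathers the template's options into one array.
function buildFetchOptions(string $url, int $timeout = 15): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_TIMEOUT        => $timeout, // cap on the whole request, in seconds
        CURLOPT_CONNECTTIMEOUT => 5,        // assumed cap on the connection phase
        CURLOPT_FOLLOWLOCATION => true,     // assumed: follow redirects
    ];
}

// Hypothetical helper: returns the body, or throws with cURL's own error message.
function fetchPage(string $url, int $timeout = 15): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildFetchOptions($url, $timeout));
    $output = curl_exec($ch);
    if ($output === false) {
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $message");
    }
    curl_close($ch);
    return $output;
}
```

Calling fetchPage('https://example.com/') then either returns the page body as a string or throws, so a failed request is not silently treated as an empty page.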
The right way to use a proxy IP
Here is the ipipgo configuration example:
curl_setopt($ch, CURLOPT_PROXY, 'gateway.ipipgo.com:9021');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'account:password');
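The same two settings can be wrapped in a small helper so they can be merged into any request. This is only a sketch: the helper name proxyOptions is hypothetical, the HTTP proxy type is an assumption, and account and password remain placeholders for your own credentials.

```php
<?php
// Hypothetical helper: returns the cURL options that route a request through a proxy.
function proxyOptions(string $gateway, string $user, string $password): array
{
    return [
        CURLOPT_PROXY        => $gateway,                // host:port of the proxy gateway
        CURLOPT_PROXYUSERPWD => $user . ':' . $password, // credentials in user:password form
        CURLOPT_PROXYTYPE    => CURLPROXY_HTTP,          // assumed; CURLPROXY_SOCKS5 is the SOCKS5 variant
    ];
}

// Merge with the basic options; the result can be passed to curl_setopt_array($ch, $options).
$options = [
    CURLOPT_URL            => 'https://example.com/',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 15,
] + proxyOptions('gateway.ipipgo.com:9021', 'account', 'password');
```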
Their proxy pool has three main advantages:
| Feature | Details |
| --- | --- |
| Automatic IP switching | New IP per request |
| Success guarantee | 99% measured availability |
| Multi-protocol support | HTTP, HTTPS, and SOCKS5 |
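Three tricks for handling collection errors
1. When you get a 403, change the IP address; ipipgo's automatic rotation feature can do this for you.
2. Convert garbled data to UTF-8, naming the page's actual source encoding as the third argument (GBK here is only an example): mb_convert_encoding($data, 'UTF-8', 'GBK')
3. Clear cookies regularly: curl_setopt($ch, CURLOPT_COOKIESESSION, true) starts a new cookie session, so session cookies carried over from an earlier session are ignored.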
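The three tips above can be sketched as small helpers. The names shouldSwitchIp, toUtf8, and freshCookieOptions are hypothetical, and the empty CURLOPT_COOKIEFILE setting is an addition to the tip; it requires the mbstring extension for the encoding step.

```php
<?php
// Tip 1 (hypothetical helper): does this status code mean the IP is blocked?
function shouldSwitchIp(int $httpCode): bool
{
    return $httpCode === 403;
}

// Tip 2 (hypothetical helper): convert a page body from its source encoding to UTF-8.
function toUtf8(string $data, string $from): string
{
    return mb_convert_encoding($data, 'UTF-8', $from);
}

// Tip 3 (hypothetical helper): options for a handle that starts a new cookie session.
function freshCookieOptions(): array
{
    return [
        CURLOPT_COOKIEFILE    => '',   // empty string turns on cURL's in-memory cookie engine
        CURLOPT_COOKIESESSION => true, // ignore session cookies left over from an earlier session
    ];
}
```

After curl_exec, the status code for tip 1 comes from curl_getinfo($ch, CURLINFO_HTTP_CODE).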
Lessons from the field
Recently I helped a customer scrape e-commerce price data, and a single IP could not last more than 10 minutes. After switching to ipipgo's proxy pool, collection ran for 8 hours straight without a pause. Their API also lets you check usage in real time, which saves a lot of worry.
Frequently Asked Questions
Q: What should I do if the proxy suddenly fails?
A: Use ipipgo's standby node feature: configure two proxy addresses so that requests switch over automatically.
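If you would rather handle the switch yourself, a client-side fallback can be sketched as below. This is not ipipgo's standby node feature, only an illustration of the idea; the function names are hypothetical, and any second proxy address you pass in is your own.

```php
<?php
// Hypothetical helper: try each proxy in order and return the first successful body.
// $attempt is any callable that takes a proxy address and returns the body, or false on failure.
function fetchWithFailover(array $proxies, callable $attempt)
{
    foreach ($proxies as $proxy) {
        $body = $attempt($proxy);
        if ($body !== false) {
            return $body;
        }
    }
    return false; // every proxy failed
}

// Hypothetical helper: builds a cURL-based attempt for one URL.
function makeCurlAttempt(string $url, string $userPwd): callable
{
    return function (string $proxy) use ($url, $userPwd) {
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 15,
            CURLOPT_PROXY          => $proxy,
            CURLOPT_PROXYUSERPWD   => $userPwd,
        ]);
        $body = curl_exec($ch); // false when the request fails
        curl_close($ch);
        return $body;
    };
}
```

Passing the attempt in as a callable keeps the fallback loop independent of cURL, so it can be exercised with a stub before any real proxy is involved.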
Q: What should I do if the collection speed slows down?
A: Check whether a delay setting is switched on. Combining concurrent fetching with proxy IPs is recommended.
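Concurrent fetching in PHP is usually done with the curl_multi functions. The following is a minimal sketch under stated assumptions: the name fetchAll and its parameters are hypothetical, and proxy settings are expected to arrive through $extraOptions.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel; bodies are keyed like the input array.
function fetchAll(array $urls, array $extraOptions = [], int $timeout = 15): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init();
        // $extraOptions (for example proxy settings) take precedence over the defaults.
        curl_setopt_array($ch, $extraOptions + [
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // empty or null when the transfer failed
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

For example, fetchAll($urls, [CURLOPT_PROXY => 'gateway.ipipgo.com:9021', CURLOPT_PROXYUSERPWD => 'account:password']) sends every request through the gateway from the configuration example.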
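Q: How can I tell if a proxy is in effect?
A: Add a debug line after curl_exec: curl_getinfo($ch, CURLINFO_PRIMARY_IP), and look at the returned IP. It is the address cURL actually connected to, so with the proxy in effect it should be the proxy gateway rather than the target site.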
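To see the exit IP that a target site sees, one option is to request an IP echo service with and without the proxy and compare the answers. This sketch assumes https://httpbin.org/ip, which replies with JSON containing an origin field; the helper names extractOrigin and exitIp are hypothetical.

```php
<?php
// Hypothetical helper: pull the caller's IP out of an httpbin-style JSON body.
function extractOrigin(string $json): ?string
{
    $data = json_decode($json, true);
    return is_array($data) && isset($data['origin']) && is_string($data['origin'])
        ? $data['origin']
        : null;
}

// Hypothetical helper: returns the IP the echo service sees, or null on failure.
function exitIp(?string $proxy = null, ?string $userPwd = null): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => 'https://httpbin.org/ip', // assumed echo service
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ]);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
        if ($userPwd !== null) {
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $userPwd);
        }
    }
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : extractOrigin($body);
}
```

If exitIp() and exitIp('gateway.ipipgo.com:9021', 'account:password') return different addresses, the proxy is in effect.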
Lastly, a word of advice: don't use free proxies! The last time I tried free IPs, 8 out of 10 were dead. You are better off buying ipipgo's monthly package, and new users get a 30% discount on their first month.

