
How to Scrape Web Pages with PHP Without Getting Your IP Blocked
Folks, do you often get your IP blocked by a site while scraping data? Today let's talk about how to solve this headache with proxy IPs. Using our own ipipgo service as the example, I'll walk you through doing it in PHP step by step.
Why do you need a proxy IP for scraping?
Here's an analogy. Suppose you go to the supermarket ten times to buy snacks and show the same membership card every time. The cashier is bound to get suspicious. Websites with anti-scraping protection work the same way: frequent visits from the same IP get you flagged. That is when you need a proxy IP. It is like using a different membership card on every trip to the supermarket.
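```php
<?php
// Replace 目标网站.com ("target website") with the site you want to scrape.

// Direct request from your own IP (easily blocked)
$html = file_get_contents('http://目标网站.com');

// Request routed through a proxy IP (much harder to block)
$context = stream_context_create([
    'http' => [
        'proxy'           => 'tcp://ipipgo-proxy.com:8080', // proxy host:port
        'request_fulluri' => true, // send the full URI, as most HTTP proxies expect
    ],
]);
$html = file_get_contents('http://目标网站.com', false, $context);
```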
The three essentials for scraping through proxies in PHP
Here is a configuration checklist you can follow:
| Component | Purpose | Recommended approach |
|---|---|---|
| IP pool | Provide multiple IP addresses | ipipgo Dynamic Residential Proxy |
| Request-header spoofing | Mimic a real browser visit | Randomly generated User-Agent |
| Request interval | Avoid high-frequency requests that trigger risk controls | sleep(rand(1,3)) |
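The three components in the table can be combined into a few small helpers. Below is a minimal sketch that assumes you already have a list of proxy addresses in `host:port` form. The addresses (from the 203.0.113.0/24 documentation range) and User-Agent strings are placeholders, not real ipipgo endpoints. The function names `pickProxy`, `pickUserAgent`, `politePause` and `buildCurlOptions` are hypothetical.

```php
<?php
// Hypothetical IP pool: replace with the addresses your proxy provider gives you.
const PROXY_POOL = [
    '203.0.113.10:8080',
    '203.0.113.11:8080',
    '203.0.113.12:8080',
];

// A few realistic desktop User-Agent strings to rotate through.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

// IP pool: pick a random proxy for each request.
function pickProxy(array $pool = PROXY_POOL): string
{
    return $pool[array_rand($pool)];
}

// Header spoofing: pick a random User-Agent for each request.
function pickUserAgent(array $agents = USER_AGENTS): string
{
    return $agents[array_rand($agents)];
}

// Request interval: sleep a random number of seconds (same idea as sleep(rand(1,3))).
function politePause(int $min = 1, int $max = 3): int
{
    $seconds = random_int($min, $max);
    sleep($seconds);
    return $seconds;
}

// Build cURL options that apply a random proxy and User-Agent to one request.
function buildCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => pickProxy(),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_HTTPHEADER     => ['User-Agent: ' . pickUserAgent()],
    ];
}
```

A typical loop would call `politePause()` between requests and pass `buildCurlOptions($url)` to `curl_setopt_array()`. That way each request goes out with a different IP and browser signature.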
Real-world example: scraping e-commerce prices
Recently a friend who runs a price-comparison site came to me. He said his PHP scraper kept getting blocked. I set him up with an ipipgo-based solution, and it has now been running stably for two months. The key code looks like this:
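```php
<?php
// Get the latest proxy IP from ipipgo (the response is expected to be JSON with "ip" and "port")
$proxy = json_decode(file_get_contents('https://api.ipipgo.com/getproxy'));
$targetUrl = 'http://目标网站.com'; // placeholder: the product page to scrape

$options = [
    CURLOPT_URL            => $targetUrl,
    CURLOPT_PROXY          => $proxy->ip,
    CURLOPT_PROXYPORT      => $proxy->port,
    CURLOPT_RETURNTRANSFER => true, // return the page body instead of printing it
    CURLOPT_TIMEOUT        => 30,
    CURLOPT_HTTPHEADER     => [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    ],
];

$ch = curl_init();
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
if ($data === false) {
    echo 'Request failed: ' . curl_error($ch) . PHP_EOL;
}
curl_close($ch);
```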
Frequently Asked Questions
Q: What should I do if my proxy IP is not working?
A: This is exactly why we recommend ipipgo's dynamic IP service. Its IP pool automatically rotates to a fresh batch every 5 minutes, which is far more stable than fly-by-night free proxies.
Q: What if crawling is too slow?
A: Try concurrent requests, but keep the pace under control. ipipgo's enterprise plan supports multi-threaded dedicated channels, which can speed things up by more than 3x.
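One way to run concurrent requests while still controlling the pace is PHP's `curl_multi_*` family. The idea is to send a small batch of requests in parallel, then pause before starting the next batch. This is a minimal sketch. The helper names `fetchBatch` and `fetchPaced`, the default batch size of 5 and the 1-3 second pause are illustrative choices, not ipipgo settings.

```php
<?php
// Fetch several URLs in parallel with curl_multi; returns [key => body, or '' on failure].
function fetchBatch(array $urls, ?string $proxy = null, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        if ($proxy !== null) {
            curl_setopt($ch, CURLOPT_PROXY, $proxy); // "host:port"
        }
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for socket activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Split URLs into batches of $concurrency and sleep between batches to keep the pace down.
function fetchPaced(array $urls, int $concurrency = 5, int $minDelay = 1, int $maxDelay = 3, ?string $proxy = null): array
{
    $results = [];
    foreach (array_chunk($urls, $concurrency, true) as $i => $batch) {
        if ($i > 0) {
            sleep(random_int($minDelay, $maxDelay));
        }
        $results += fetchBatch($batch, $proxy); // keys are preserved, so + merges safely
    }
    return $results;
}
```

For example, `fetchPaced($productUrls, 5, 1, 3, '203.0.113.10:8080')` fetches five pages at a time through one (placeholder) proxy and sleeps 1-3 seconds between batches. Raising the batch size speeds things up but also raises the request rate the target site sees.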
Q: What should I do when I run into a CAPTCHA?
A: CAPTCHAs are an advanced layer of protection. We suggest adding an automatic recognition module to your code, or contacting ipipgo's technical support for a customized solution.
Pitfalls to avoid
The most common pitfall for newcomers is poor proxy IP quality. Some free proxies look like they work, but in practice 8 out of 10 are dead. In my own tests, ipipgo's commercial-grade proxies reached a success rate of up to 98%, while free proxies didn't even reach 30%.
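To check proxy quality yourself, you can send a few test requests through each proxy and count how many succeed. This is a minimal sketch under stated assumptions: a request "succeeds" if it returns an HTTP 2xx status, the default test URL is `http://example.com/`, and the helper names `probeProxy`, `successRate` and `rateProxies` are hypothetical. Your own numbers will depend on the proxies and the target site.

```php
<?php
// Send one test request through $proxy ("host:port"); true if an HTTP 2xx came back.
function probeProxy(string $proxy, string $testUrl = 'http://example.com/', int $timeout = 10): bool
{
    $ch = curl_init($testUrl);
    curl_setopt_array($ch, [
        CURLOPT_PROXY          => $proxy,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    return $body !== false && $code >= 200 && $code < 300;
}

// Fraction of successful outcomes, e.g. [true, false, true, true] gives 0.75.
function successRate(array $outcomes): float
{
    if ($outcomes === []) {
        return 0.0;
    }
    return count(array_filter($outcomes)) / count($outcomes);
}

// Probe every proxy $attempts times and return [proxy => success rate].
function rateProxies(array $proxies, int $attempts = 5): array
{
    $rates = [];
    foreach ($proxies as $proxy) {
        $outcomes = [];
        for ($i = 0; $i < $attempts; $i++) {
            $outcomes[] = probeProxy($proxy);
        }
        $rates[$proxy] = successRate($outcomes);
    }
    return $rates;
}
```

Running `rateProxies()` over a free proxy list and a paid one gives a side-by-side comparison of the two kinds of proxy.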
One last tip: add an exception-retry mechanism. When a request fails, the script automatically switches to the next IP and tries again. ipipgo's API returns a list of IPs with availability ratings; try the highly rated ones first and you will waste far fewer attempts.
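Here is a minimal sketch of that retry loop. It assumes the IP list has already been decoded into arrays with `ip`, `port` and a numeric `score` field. These field names are hypothetical, so check them against the actual ipipgo API response. The fetch step is passed in as a callable, so the rotation logic can be tried without network access. `sortByScore`, `fetchWithRetry` and `curlFetch` are illustrative names.

```php
<?php
// Sort proxies so the highest availability score is tried first.
// NOTE: the 'ip', 'port' and 'score' keys are hypothetical field names.
function sortByScore(array $proxies): array
{
    usort($proxies, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
    return $proxies;
}

// Try each proxy in score order until $fetch returns a string; null if every IP fails.
// $fetch receives ($url, "ip:port") and returns the page body, or false on failure.
function fetchWithRetry(string $url, array $proxies, callable $fetch): ?string
{
    foreach (sortByScore($proxies) as $proxy) {
        $result = $fetch($url, $proxy['ip'] . ':' . $proxy['port']);
        if (is_string($result)) {
            return $result; // success: stop rotating
        }
        // failure: switch to the next IP and try again
    }
    return null;
}

// A real fetcher built on cURL; treats non-2xx responses (e.g. a block page) as failures.
// @return string|false
function curlFetch(string $url, string $proxy)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_PROXY          => $proxy,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    return ($body !== false && $code >= 200 && $code < 300) ? $body : false;
}
```

In practice you would call `fetchWithRetry($url, $proxyList, 'curlFetch')`. In tests you can pass a fake closure instead, to check that the highest-rated IPs are tried first.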

