
Hands-on: scraping data with PHP without getting your IP blocked
Anyone who writes crawlers knows that website anti-scraping mechanisms keep getting stricter. Last week a colleague wrote a collection script in PHP, and its IP was blocked after only half an hour of running. This is where proxy IP rotation comes to the rescue. Today we'll look at how to use ipipgo's proxy service to keep a PHP script alive.
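// Basic proxy settings: route one HTTP request through a single fixed proxy
$proxy = '123.123.123.123:8888'; // placeholder proxy address (host:port)
$context = stream_context_create([
    'http' => [
        'proxy' => "tcp://$proxy",
        'request_fulluri' => true, // send the full URL in the request line, which most HTTP proxies expect
    ],
]);
// Replace 'destination url' with the page to fetch; file_get_contents() returns false on failure
$content = file_get_contents('destination url', false, $context);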
Adding smart IP switching to a PHP crawler
Setting up a single fixed proxy is not enough; you need a dynamic IP pool. Here we recommend using ipipgo's API to fetch a large number of proxies; their IP survival rate can reach more than 95%. The process has three steps:
- Sign up for an ipipgo account to receive 500 test IPs
- Call their API to get the latest list of proxies
- Randomly select an IP for each request
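// Example: fetch the ipipgo proxy pool
$api_url = "https://api.ipipgo.com/get?format=json&key=YOUR_API_KEY"; // replace YOUR_API_KEY with your own key
$ip_list = json_decode(file_get_contents($api_url), true);
if (empty($ip_list['data'])) {
    die('The API returned no proxies');
}
// Pick a random proxy
$rand_proxy = $ip_list['data'][array_rand($ip_list['data'])];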
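Picking one random proxy still leaves the request stuck if that proxy happens to be dead. The following is a minimal sketch of retrying through different proxies, not ipipgo's own client. It assumes each pool entry is a plain "host:port" string (the actual API response format is not shown here), and the function names `fetchViaProxy` and `fetchWithRotation` are made up for illustration.

```php
<?php
// Sketch: retry a request through different proxies until one succeeds.
// Assumption: every pool entry is a "host:port" string.

/** Default fetcher: one GET through the given proxy; returns false on failure. */
function fetchViaProxy(string $url, string $proxy, int $timeout = 10)
{
    $context = stream_context_create([
        'http' => [
            'proxy'           => "tcp://$proxy",
            'request_fulluri' => true,
            'timeout'         => $timeout, // seconds
        ],
    ]);
    return @file_get_contents($url, false, $context);
}

/**
 * Try up to $maxAttempts distinct proxies in random order.
 * $fetcher is injectable so the rotation logic can be exercised offline.
 * Returns ['proxy' => ..., 'body' => ...] on success, or null if all attempts fail.
 */
function fetchWithRotation(string $url, array $pool, int $maxAttempts = 3, ?callable $fetcher = null)
{
    $fetcher = $fetcher ?? 'fetchViaProxy';
    $candidates = array_values(array_unique($pool));
    shuffle($candidates);

    foreach (array_slice($candidates, 0, $maxAttempts) as $proxy) {
        $body = $fetcher($url, $proxy);
        if ($body !== false) {
            return ['proxy' => $proxy, 'body' => $body];
        }
    }
    return null; // every attempted proxy failed
}
```

A call such as `fetchWithRotation('destination url', $ip_list['data'])` would then replace the single `file_get_contents()` call, provided the entries in `$ip_list['data']` really are "host:port" strings.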
What to do if you run into a CAPTCHA? Try this.
Even with a proxy, some sites will still show a CAPTCHA. This is when you need to control your request frequency. Recommended settings:
| Type of website | Recommended interval | Concurrency |
|---|---|---|
| General information site | 3-5 seconds | 5 |
| E-commerce platform | 10-15 seconds | 2 |
| Social media | 20-30 seconds | 1 |
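One way to apply these intervals is to sleep for a random time inside the recommended range before each request, so the timing does not look mechanical. This is only a sketch: the keys `general`, `ecommerce`, and `social` are hypothetical labels for the three table rows, and the concurrency column is not handled here.

```php
<?php
// Sketch: random delay within the recommended interval for a site type.
// The array keys are hypothetical labels for the rows of the table.
const CRAWL_INTERVALS = [
    'general'   => [3, 5],    // seconds: [min, max]
    'ecommerce' => [10, 15],
    'social'    => [20, 30],
];

/** Return a random delay in seconds within the interval for $siteType. */
function nextDelaySeconds(string $siteType): float
{
    if (!array_key_exists($siteType, CRAWL_INTERVALS)) {
        throw new InvalidArgumentException("Unknown site type: $siteType");
    }
    [$min, $max] = CRAWL_INTERVALS[$siteType];
    // Random point in [min, max] with millisecond resolution.
    return random_int($min * 1000, $max * 1000) / 1000;
}

/** Block the script for one randomized interval. */
function politePause(string $siteType): void
{
    usleep((int) round(nextDelaySeconds($siteType) * 1000000));
}
```

Calling `politePause('ecommerce')` between two requests would pause the script for somewhere between 10 and 15 seconds.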
Combined with ipipgo's pay-as-you-go plan, you can set up an automatic IP switching policy. In our tests, their proxies responded about 40% faster than ordinary proxies, and the success rate when dealing with CAPTCHAs improved noticeably.
FAQ first aid kit
Q: What should I do if my proxy IP suddenly fails?
A: We recommend ipipgo's smart detection feature. Their API returns IPs tagged with a survival time, and you should ping each proxy before using it.
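A cheap way to "ping" a proxy from PHP is to attempt a TCP connection to its port with a short timeout. The sketch below assumes the proxy is given as a "host:port" string; `isProxyReachable` is a made-up helper name. Note that a successful connect only shows the port is open, not that the proxy will actually relay traffic.

```php
<?php
// Sketch: quick reachability check for a "host:port" proxy via a TCP connect.
function isProxyReachable(string $proxy, float $timeout = 3.0): bool
{
    $parts = explode(':', $proxy);
    if (count($parts) !== 2 || $parts[0] === '' || !ctype_digit($parts[1])) {
        return false; // not in host:port form
    }
    // Suppress the warning fsockopen() emits on failure; we only need the result.
    $socket = @fsockopen($parts[0], (int) $parts[1], $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}
```

Proxies that fail this check can be skipped before any real request is sent through them.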
Q: How do I fix a slow crawl speed?
A: Check where the proxy server is located and pick a node in the same region as the target website. ipipgo has nodes in more than 30 countries, so remember to choose one that is geographically close.
Q: HTTPS requests fail through the proxy?
A: Add an ssl section to the stream context, or switch to cURL:
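$ch = curl_init('destination url'); // replace with the HTTPS page to fetch
curl_setopt($ch, CURLOPT_PROXY, $proxy); // "host:port", as in the basic example
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification; only do this for testing
$content = curl_exec($ch); // false on failure
curl_close($ch);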
Upgraded approach: maintaining the IP pool automatically
For long-running crawlers, we recommend building an IP health check mechanism. Use ipipgo's API with a scheduled task to update the IP pool every hour. Here is the logic of a script we built ourselves:
- Pull a new IP list every 60 minutes
- Drop proxies that time out
- Record the success rate of each IP
- Prioritize IPs with a high success rate
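The four rules above could be sketched as a small in-memory class. This is an illustration under assumptions, not the script mentioned above: the class name `ProxyPool` and its methods are hypothetical, proxies are "host:port" strings, and the current time is passed in explicitly so the logic can be checked without waiting an hour.

```php
<?php
// Sketch of the four pool-maintenance rules. All names here are hypothetical.
final class ProxyPool
{
    private $stats = [];        // proxy => ['ok' => int, 'fail' => int]
    private $lastRefresh = 0;   // Unix timestamp of the last refresh
    private $refreshSeconds;

    public function __construct(int $refreshSeconds = 3600)
    {
        $this->refreshSeconds = $refreshSeconds;
    }

    /** Rule 1: true once the refresh interval (60 minutes by default) has elapsed. */
    public function needsRefresh(int $now): bool
    {
        return $now - $this->lastRefresh >= $this->refreshSeconds;
    }

    /** Replace the pool with a fresh list, keeping stats for proxies still present. */
    public function refresh(array $proxies, int $now): void
    {
        $fresh = [];
        foreach ($proxies as $proxy) {
            $fresh[$proxy] = $this->stats[$proxy] ?? ['ok' => 0, 'fail' => 0];
        }
        $this->stats = $fresh;
        $this->lastRefresh = $now;
    }

    /** Rules 2 and 3: a timeout removes the proxy; other outcomes are counted. */
    public function record(string $proxy, bool $success, bool $timedOut = false): void
    {
        if (!isset($this->stats[$proxy])) {
            return; // unknown or already removed
        }
        if ($timedOut) {
            unset($this->stats[$proxy]);
            return;
        }
        $this->stats[$proxy][$success ? 'ok' : 'fail']++;
    }

    /** Success rate in [0, 1]; proxies with no recorded requests count as 1.0. */
    public function successRate(string $proxy): float
    {
        $s = $this->stats[$proxy] ?? ['ok' => 0, 'fail' => 0];
        $total = $s['ok'] + $s['fail'];
        return $total === 0 ? 1.0 : $s['ok'] / $total;
    }

    /** Rule 4: the proxy with the highest success rate, or null if the pool is empty. */
    public function best(): ?string
    {
        $best = null;
        $bestRate = -1.0;
        foreach (array_keys($this->stats) as $proxy) {
            $rate = $this->successRate((string) $proxy);
            if ($rate > $bestRate) {
                $best = (string) $proxy;
                $bestRate = $rate;
            }
        }
        return $best;
    }
}
```

In a crawler loop, you would call `needsRefresh(time())` before each batch, feed the latest API list to `refresh()`, and call `record()` after every request. Treating untested proxies as 100% successful is a deliberate choice here so that new IPs get tried; a stricter policy is equally reasonable.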
With this setup, one of our projects ran for 7 days without being blocked; ipipgo's stability is really top-notch. They are currently giving new users a 500-IP trial, so anyone working on crawlers can give it a try.

