
First, why do websites keep banning you? Start by understanding their anti-crawling routines
The biggest headache for crawlers is request rate limiting. Say you are scraping product data from a certain well-known e-commerce site, and after 30 consecutive requests the connection gets cut. Don't smash your keyboard yet: the site is identifying machine behavior by tracking your IP.
Here's an example. Your router has a public IP, which works like the delivery address on a shipping label. When the web server sees that one address sending 50 requests per minute, it concludes outright that no human is behind it. At that point, adding a sleep delay to your code may still not save you from a ban.
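To make the "50 requests per minute" rule concrete, here is a minimal sketch of how a server might count requests per IP in a sliding 60-second window. The class name `SlidingWindowLimiter` and the threshold of 50 are illustrative assumptions based on the example above, not any real site's anti-crawling code.

```php
<?php
// Hypothetical server-side limiter: flags an IP once it has sent
// $threshold requests within a sliding $windowSeconds window.
final class SlidingWindowLimiter
{
    /** @var array<string, float[]> request timestamps per IP */
    private array $log = [];

    public function __construct(
        private int $threshold = 50,
        private float $windowSeconds = 60.0
    ) {}

    /** Records a request at time $now and returns true if the IP now looks like a machine. */
    public function isMachine(string $ip, float $now): bool
    {
        $cutoff = $now - $this->windowSeconds;
        // Keep only timestamps that are still inside the window.
        $recent = array_filter(
            $this->log[$ip] ?? [],
            fn (float $t): bool => $t > $cutoff
        );
        $recent[] = $now;
        $this->log[$ip] = array_values($recent);
        return count($this->log[$ip]) >= $this->threshold;
    }
}
```

If one IP sends a request every second (that is, the crawler "sleeps" a full second between requests), the 50th call returns true, because all 50 requests still land inside one 60-second window. This is why a short sleep alone may not prevent a ban.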
Second, how does a proxy IP unlock access?
The principle is simple: one large IP pool, many apparent users. Suppose you use a proxy service such as ipipgo that switches to a different random IP for each request. The website's access log then looks like this:
| Request # | Source IP | Time interval |
|---|---|---|
| 1 | 221.192.136.12 | 3 s |
| 2 | 120.244.62.18 | 5 s |
| 3 | 183.128.240.66 | 2 s |
The server therefore believes multiple real users are visiting, which sidesteps per-IP rate detection. The key is to choose a provider like ipipgo whose IP pool is large enough that the same IP is rarely reused.
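A minimal sketch of "a different random IP for each request": pick a random proxy from the pool but never the one used immediately before. The function name `pickNextProxy` is hypothetical, and the `:8080` ports on the table's IPs are placeholder assumptions.

```php
<?php
// Hypothetical helper: pick a random proxy from the pool, never the same
// one twice in a row, so consecutive requests come from different IPs.
function pickNextProxy(array $pool, ?string $previous): string
{
    if ($pool === []) {
        throw new InvalidArgumentException('Proxy pool is empty');
    }
    // Exclude the previous proxy when there is anything else to choose from.
    $candidates = array_values(array_filter(
        $pool,
        fn (string $p): bool => $p !== $previous
    ));
    if ($candidates === []) {
        $candidates = array_values($pool); // pool of one: reuse is unavoidable
    }
    return $candidates[random_int(0, count($candidates) - 1)];
}

// Placeholder pool built from the IPs in the table above (ports assumed).
$pool = ['221.192.136.12:8080', '120.244.62.18:8080', '183.128.240.66:8080'];
```

Calling `$previous = pickNextProxy($pool, $previous);` before each request produces a log like the table: every request comes from a different address than the one before it. The larger the pool, the longer it takes for any single IP to reappear.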
Third, a hands-on guide to using proxies in PHP
Here's the core code first, followed by notes on the key parameters:
```php
<?php
$proxy = '221.192.136.12:8080'; // proxy address (IP:port) from ipipgo

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://目标网站.com"); // target site
curl_setopt($ch, CURLOPT_PROXY, $proxy);               // route the request through the proxy
curl_setopt($ch, CURLOPT_TIMEOUT, 15);                 // give up after 15 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);        // return the body instead of printing it
$output = curl_exec($ch);

if (curl_errno($ch)) {
    echo 'Error code: ' . curl_errno($ch) . '. Consider switching to another proxy IP.';
}
curl_close($ch);
```
Key parameters:
- CURLOPT_PROXY must use the format IP:Port
- Keep the timeout at 15 seconds or less; longer timeouts hurt throughput
- Always handle error codes, especially 28 (timeout) and 7 (could not connect, e.g. connection refused)
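The format rule and the error-code rule above can be turned into two small checks. This is a sketch with hypothetical function names. `isValidProxy` only accepts the plain IPv4 `IP:Port` form the article recommends; curl itself also accepts other forms (such as a scheme prefix), which this check deliberately rejects. The codes 28 and 7 correspond to libcurl's `CURLE_OPERATION_TIMEDOUT` and `CURLE_COULDNT_CONNECT`.

```php
<?php
// Hypothetical check: true for plain "IPv4:port" strings such as "221.192.136.12:8080".
function isValidProxy(string $proxy): bool
{
    $parts = explode(':', $proxy);
    if (count($parts) !== 2) {
        return false;
    }
    [$ip, $port] = $parts;
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    if (preg_match('/^\d{1,5}$/', $port) !== 1) {
        return false;
    }
    $portNumber = (int) $port;
    return $portNumber >= 1 && $portNumber <= 65535;
}

// Hypothetical check: errors that point at a bad proxy rather than a bad page.
// 28 = CURLE_OPERATION_TIMEDOUT, 7 = CURLE_COULDNT_CONNECT.
function shouldSwitchProxy(int $curlErrno): bool
{
    return in_array($curlErrno, [28, 7], true);
}
```

In the snippet from this section, you could validate `$proxy` before calling `curl_setopt($ch, CURLOPT_PROXY, $proxy)`, and after `curl_exec()` use `shouldSwitchProxy(curl_errno($ch))` to decide whether to retry with a different IP.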
Fourth, what are ipipgo's real-world advantages?
After trying seven or eight proxy services, I settled on ipipgo for these main reasons:
1. Reliable availability: in my tests, 95%+ of IPs connected normally
2. Fast response: an average latency of about 800 ms, far better than services that take 3 seconds to respond
3. Dedicated channels: enterprise users can get a separate IP pool
4. Transparent pricing: no hidden charges, unlike some platforms
Special mention goes to their IP warm-up mechanism: newly added IPs are first tested for availability with low-frequency requests, so they don't trigger risk controls the moment they go into service.
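The warm-up idea can be imitated on the client side. The sketch below is an assumption of how such a scheme could work, not ipipgo's actual implementation: a new IP joins the active pool only after passing a set number of consecutive probes, and any failure restarts its warm-up. The probe is injected as a closure (hypothetical; in practice it might send one lightweight request through the IP), and the class name and the default of 3 passes are illustrative.

```php
<?php
// Hypothetical client-side warm-up: a new IP joins the active pool only
// after passing $requiredPasses consecutive low-frequency probes.
final class WarmUpPool
{
    /** @var array<string, int> consecutive passes for IPs still warming up */
    private array $warming = [];
    /** @var string[] IPs that finished warm-up */
    private array $active = [];

    /** @param \Closure(string): bool $probe returns true if the IP answered */
    public function __construct(private \Closure $probe, private int $requiredPasses = 3) {}

    public function add(string $ip): void
    {
        if (!in_array($ip, $this->active, true)) {
            $this->warming[$ip] = 0;
        }
    }

    /** Runs one probe per warming IP; call this on a slow timer, e.g. once a minute. */
    public function probeRound(): void
    {
        foreach (array_keys($this->warming) as $ip) {
            if (($this->probe)($ip)) {
                $this->warming[$ip]++;
                if ($this->warming[$ip] >= $this->requiredPasses) {
                    unset($this->warming[$ip]);
                    $this->active[] = $ip; // promoted: ready for real traffic
                }
            } else {
                $this->warming[$ip] = 0; // a failure restarts the warm-up
            }
        }
    }

    /** @return string[] */
    public function active(): array
    {
        return $this->active;
    }
}
```

Keeping the probe rounds infrequent is the point of the design: a fresh IP sends only a trickle of traffic until it has proven stable.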
Fifth, a must-read pitfall guide for beginners
Q: My proxy IPs stop working after a while?
A: That's normal! Switch to a random IP for each request: use ipipgo's API to fetch a dynamic IP pool, then cycle through it with a simple array rotation in your code.
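"Array polling" here means round-robin rotation. A minimal sketch, assuming the proxy list has already been fetched into a PHP array; the class `ProxyRotator` and its `remove()` method for dropping dead IPs are hypothetical additions for illustration.

```php
<?php
// Hypothetical round-robin rotator: each call hands out the next proxy in
// the list and wraps around at the end.
final class ProxyRotator
{
    private int $cursor = 0;

    /** @param string[] $proxies e.g. the list returned by the proxy API */
    public function __construct(private array $proxies)
    {
        $this->proxies = array_values($proxies);
    }

    public function next(): string
    {
        if ($this->proxies === []) {
            throw new RuntimeException('Proxy pool is exhausted; fetch a new batch');
        }
        $proxy = $this->proxies[$this->cursor];
        $this->cursor = ($this->cursor + 1) % count($this->proxies);
        return $proxy;
    }

    /** Drops a proxy that keeps failing so it is not handed out again. */
    public function remove(string $proxy): void
    {
        $this->proxies = array_values(array_filter(
            $this->proxies,
            fn (string $p): bool => $p !== $proxy
        ));
        $count = count($this->proxies);
        $this->cursor = $count === 0 ? 0 : $this->cursor % $count;
    }
}
```

With `new ProxyRotator(['221.192.136.12:8080', '120.244.62.18:8080'])`, successive `next()` calls return the first proxy, the second, then the first again. Round-robin spreads load evenly; random selection is harder for a site to pattern-match, so which one fits better depends on the target.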
Q: I set up a proxy but still get blocked?
A: Check three things: 1. whether your request headers look like a real browser; 2. whether requests from a single IP are spaced too closely; 3. whether you are triggering human verification (CAPTCHA).
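Checks 1 and 2 can be handled in code; check 3 usually needs manual inspection. The sketch below assumes hypothetical names (`browserHeaders`, `PerIpThrottle`), an example Chrome-style User-Agent string, and a 3-second minimum gap per IP chosen to roughly match the intervals in the earlier table. None of these values come from a specific site's rules.

```php
<?php
// Check 1 (hypothetical helper): headers resembling a desktop browser.
// Pass the result to curl_setopt($ch, CURLOPT_HTTPHEADER, browserHeaders()).
function browserHeaders(): array
{
    return [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9',
    ];
}

// Check 2 (hypothetical helper): remembers when each IP was last used and
// reports how many seconds to wait so no IP fires faster than $minInterval.
final class PerIpThrottle
{
    /** @var array<string, float> last-use timestamp per IP */
    private array $lastUsed = [];

    public function __construct(private float $minInterval = 3.0) {}

    public function waitTime(string $ip, float $now): float
    {
        if (!isset($this->lastUsed[$ip])) {
            return 0.0; // never used: no need to wait
        }
        return max(0.0, $this->lastUsed[$ip] + $this->minInterval - $now);
    }

    public function markUsed(string $ip, float $now): void
    {
        $this->lastUsed[$ip] = $now;
    }
}
```

Before each request, you would sleep for `waitTime($ip, microtime(true))` seconds, send the request, then call `markUsed()` for that IP.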
Q: Do free proxies work?
A: They're fine for short-term testing, but buy a commercial service for production projects. Free proxies are typically less than 20% available, and they can also leak your data.
Sixth, a configuration for advanced users
Here's a configuration template for those who need to crawl millions of records every day:
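```php
<?php
// API endpoint from ipipgo
$ip_api = 'https://api.ipipgo.com/get?format=json';

// Fetch the current proxy list and return one entry at random (null on failure)
function getProxy(string $ip_api): ?string
{
    $json = @file_get_contents($ip_api);
    $ips = $json === false ? null : json_decode($json, true);
    if (empty($ips['proxy_list'])) {
        return null; // API unreachable or returned no proxies
    }
    return $ips['proxy_list'][array_rand($ips['proxy_list'])];
}

// Switch to a new IP on every request
for ($i = 0; $i < 1000; $i++) {
    $proxy = getProxy($ip_api);
    if ($proxy !== null) {
        // Plug $proxy into the curl code from section three here
    }
    usleep(500000 + random_int(0, 500000)); // 0.5 s base plus up to 0.5 s random jitter
}
```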
This setup provides double protection, a dynamic IP pool plus random delays. Combined with ipipgo's concurrency plan, crawling millions of records a day is within reach.
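The template above calls the API once per request. A possible refinement, sketched here as an assumption rather than ipipgo's recommended usage, is to fetch a batch, hand out its proxies in random order, and call the API again only when the batch runs out. The fetcher is injected as a closure so the logic can run offline; in production it might wrap `file_get_contents('https://api.ipipgo.com/get?format=json')`. The class name `CachedProxyPool` is hypothetical, and the code assumes the same `{"proxy_list": [...]}` response shape used above.

```php
<?php
// Hypothetical batched variant of getProxy(): hit the API only when the
// local batch is empty instead of once per request.
final class CachedProxyPool
{
    /** @var string[] proxies not yet handed out from the current batch */
    private array $batch = [];

    /** @param \Closure(): string $fetchJson returns the raw API response body */
    public function __construct(private \Closure $fetchJson) {}

    public function get(): ?string
    {
        if ($this->batch === []) {
            $data = json_decode(($this->fetchJson)(), true);
            // Assumes the response shape {"proxy_list": [...]}; anything else yields an empty batch.
            $this->batch = is_array($data['proxy_list'] ?? null) ? $data['proxy_list'] : [];
            shuffle($this->batch); // random order within each batch
        }
        return array_pop($this->batch); // null when the API returned nothing usable
    }
}
```

Inside the loop, `$proxy = $pool->get();` replaces `getProxy($ip_api)`. For a 1000-request run with batches of, say, 100 proxies, that is about 10 API calls instead of 1000, which matters once many workers run concurrently.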

