PHP cURL Web Crawling: Using Proxy IPs to Get Around Request Frequency Limits

First, why does the website keep blocking you? Start with how anti-crawling works

The biggest headache for anyone writing crawlers is the request frequency limit. For example, when you scrape product data from a certain large e-commerce site, the connection gets cut after 30 consecutive requests. Don't smash your keyboard just yet: the site is using IP tracking to identify machine behavior.

Here is an example: your home router has a public IP, which works like the delivery address on a courier slip. When the web server sees that this address is sending 50 requests per minute, it concludes right away that no human is behind them. At that point, even adding a sleep delay to your code may not save you from a ban.
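To make the detection idea concrete, here is a minimal sketch of how a server might count requests per IP inside a sliding time window. It is only an illustration: the class name IpRateTracker, the "50 requests in 60 seconds" defaults, and the rule that reaching the threshold triggers a flag are all assumptions, and real anti-crawling systems are usually more elaborate.

```php
<?php
// Hypothetical per-IP sliding-window counter (illustration only).
final class IpRateTracker
{
    /** @var array<string, float[]> request timestamps, grouped by IP */
    private array $hits = [];
    private int $maxRequests;
    private float $windowSeconds;

    public function __construct(int $maxRequests = 50, float $windowSeconds = 60.0)
    {
        $this->maxRequests = $maxRequests;
        $this->windowSeconds = $windowSeconds;
    }

    /** Records one request and returns true once the IP reaches the limit. */
    public function recordAndCheck(string $ip, float $now): bool
    {
        $cutoff = $now - $this->windowSeconds;
        // Keep only the timestamps that are still inside the window.
        $recent = array_values(array_filter(
            $this->hits[$ip] ?? [],
            function (float $t) use ($cutoff): bool {
                return $t > $cutoff;
            }
        ));
        $recent[] = $now;
        $this->hits[$ip] = $recent;

        return count($recent) >= $this->maxRequests;
    }
}
```

Because the counter is keyed by IP, spreading the same number of requests over many addresses keeps every individual count low, which is what the next section relies on.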

Second, how does a proxy IP get you unblocked?

The principle is simple: many users share one IP pool. Suppose you use ipipgo's proxy service and switch to a random IP for each request. The website's access log then looks like this:

Request   Source IP         Interval
1         221.192.136.12    3 seconds
2         120.244.62.18     5 seconds
3         183.128.240.66    2 seconds

To the server, this looks like several real users visiting, so the per-IP frequency check never fires. The key is to pick a provider with a large enough IP pool, such as ipipgo, so that the same IP is not reused too often.
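As a rough sketch of "avoid reusing the same IP", the helper below picks a random proxy while skipping the ones used most recently. The function name pickFreshProxy and the idea of passing in a recently-used list are assumptions made for illustration; they are not part of any ipipgo API.

```php
<?php
// Hypothetical helper: pick a random proxy, skipping recently used ones.
function pickFreshProxy(array $pool, array $recentlyUsed): string
{
    if ($pool === []) {
        throw new InvalidArgumentException('Proxy pool is empty');
    }
    // Prefer proxies that are not in the recently-used list.
    $candidates = array_values(array_diff($pool, $recentlyUsed));
    if ($candidates === []) {
        // The pool is too small to avoid reuse, so fall back to the whole pool.
        $candidates = array_values($pool);
    }
    return $candidates[array_rand($candidates)];
}
```

The fallback branch shows why pool size matters: with a small pool, reuse becomes unavoidable no matter how the selection is done.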

Third, a hands-on walkthrough of using a proxy in PHP

Here is the core code first, followed by notes on the key parameters:


// Proxy address in IP:port form, obtained from ipipgo
$proxy = '221.192.136.12:8080';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://example.com/'); // placeholder for the target site
curl_setopt($ch, CURLOPT_PROXY, $proxy);               // route the request through the proxy
curl_setopt($ch, CURLOPT_TIMEOUT, 15);                 // give up after 15 seconds in total
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);        // return the body instead of printing it

$output = curl_exec($ch);
if (curl_errno($ch)) {
    echo 'Error code: ' . curl_errno($ch) . '. Try switching to another proxy IP.';
}
curl_close($ch);

Key parameter notes:

  • CURLOPT_PROXY must use the correct format: IP:port
  • Keep the timeout within 15 seconds; anything longer hurts throughput
  • Remember to handle error codes, especially 28 (operation timed out) and 7 (could not connect)
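One possible way to act on those two error codes is sketched below. The function name nextActionForCurlError and its three return values are hypothetical; only the numeric codes 7 and 28 come from cURL itself. The helper uses plain integers so it can be read and run without the cURL extension loaded.

```php
<?php
// cURL error numbers mentioned in the notes above.
const CURL_ERR_COULDNT_CONNECT = 7;  // could not connect to the host or proxy
const CURL_ERR_TIMED_OUT = 28;       // operation timed out

/**
 * Hypothetical helper: map a curl_errno() value to a next step.
 * Returns 'ok', 'switch_proxy', or 'fail'.
 */
function nextActionForCurlError(int $errno): string
{
    if ($errno === 0) {
        return 'ok';
    }
    // Timeouts and connection failures usually point at the proxy, so try another one.
    if ($errno === CURL_ERR_COULDNT_CONNECT || $errno === CURL_ERR_TIMED_OUT) {
        return 'switch_proxy';
    }
    // Anything else (bad URL, DNS failure, and so on) is left to the caller.
    return 'fail';
}
```

In the snippet above this would be called right after curl_exec(), for example as nextActionForCurlError(curl_errno($ch)).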

Fourth, what are the real-world advantages of ipipgo?

After trying seven or eight proxy services, I settled on ipipgo, mainly for these reasons:

1. Reliable availability: in my tests, more than 95% of the IPs connected normally
2. Fast enough response: about 800 ms average latency, far better than providers that hang for 3 seconds
3. Dedicated channels: enterprise users can get their own separate IP pool
4. Transparent pricing: no hidden charges, unlike some platforms

Their IP warm-up mechanism deserves a special mention: newly added IPs are first checked for availability with low-frequency requests, so they don't trip risk controls the moment they go live.

Fifth, a pitfall guide every beginner should read

Q: My proxy IP stops working after a while?
A: That's normal. Change the IP on every request: fetch a dynamic IP pool through ipipgo's API and rotate through it as an array in your code.
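The "rotate through it as an array" idea might look like the sketch below. ProxyRotator is a hypothetical class written for this article, and it assumes you already hold the proxy list as a PHP array of "IP:port" strings.

```php
<?php
// Hypothetical round-robin rotator over a list of "IP:port" proxies.
final class ProxyRotator
{
    /** @var string[] */
    private array $proxies;
    private int $index = 0;

    public function __construct(array $proxies)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('Proxy list is empty');
        }
        $this->proxies = array_values($proxies);
    }

    /** Returns the next proxy and wraps around at the end of the list. */
    public function next(): string
    {
        $proxy = $this->proxies[$this->index];
        $this->index = ($this->index + 1) % count($this->proxies);
        return $proxy;
    }
}
```

Each request would then call $rotator->next() and pass the result to CURLOPT_PROXY. Round-robin spreads load evenly; random selection is an equally valid choice.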

Q: Still getting blocked after setting up a proxy?
A: Check three things: 1. whether your request headers lack browser characteristics; 2. whether the interval between requests from a single IP is too short; 3. whether you have triggered human verification (a CAPTCHA).
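For the first two checks, a small sketch: one helper that returns browser-like request headers, and one that produces a randomized delay. Both function names are made up for this example, and the header values are only sample values. They make a request look less like a bare script but do not guarantee that a site will accept it.

```php
<?php
// Illustrative header set; the exact values are assumptions, not a guaranteed bypass.
function browserLikeHeaders(): array
{
    return [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
            . '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9',
    ];
}

// Returns a random delay in microseconds between $minMs and $maxMs milliseconds.
function randomDelayMicros(int $minMs, int $maxMs): int
{
    return random_int($minMs, $maxMs) * 1000;
}

// Usage with an existing cURL handle $ch:
// curl_setopt($ch, CURLOPT_HTTPHEADER, browserLikeHeaders());
// usleep(randomDelayMicros(1000, 3000)); // wait 1 to 3 seconds before the next request
```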

Q: Do free proxies work?
A: They are fine for short-term testing, but a real project should use a commercial service. Free proxies are usually available less than 20% of the time, and they can leak your data.

Sixth, a configuration for advanced users

Here is a configuration template for those who need to crawl millions of records a day:


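// API endpoint from ipipgo
$ip_api = 'https://api.ipipgo.com/get?format=json';

// Fetch the current proxy list and return one random entry (null on failure)
function getProxy() {
    global $ip_api;
    $response = file_get_contents($ip_api);
    $ips = $response === false ? null : json_decode($response, true);
    if (empty($ips['proxy_list'])) {
        return null; // the API call failed or returned no proxies
    }
    return $ips['proxy_list'][array_rand($ips['proxy_list'])];
}

// Switch to a new IP on every request
for ($i = 0; $i < 1000; $i++) {
    $proxy = getProxy();
    if ($proxy !== null) {
        // Run the cURL code from the third section here, using $proxy
    }
    usleep(random_int(300000, 700000)); // random delay of 0.3 to 0.7 seconds
}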

This setup combines a dynamic IP pool with a delay between requests for double protection. Paired with ipipgo's concurrency package, crawling millions of records a day is within reach.
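To show how the pieces could fit together, here is a sketch of a crawl loop that retries a failed request with a fresh proxy. Everything in it is hypothetical: crawlWithRotation is not an ipipgo function, and the $getProxy and $fetch callables stand in for your own proxy source and your own cURL wrapper.

```php
<?php
/**
 * Sketch: fetch each URL through a rotating proxy, retrying with a new proxy on failure.
 *
 * $getProxy: callable(): string                      returns a proxy such as "IP:port"
 * $fetch:    callable(string $url, string $proxy)    returns the body, or null on failure
 * Returns an array mapping each URL to its body, or to null if every attempt failed.
 */
function crawlWithRotation(
    array $urls,
    callable $getProxy,
    callable $fetch,
    int $maxAttempts = 3,
    int $delayMicros = 0
): array {
    $results = [];
    foreach ($urls as $url) {
        $results[$url] = null;
        for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
            // Ask for a new proxy on every attempt so a bad IP is not retried.
            $body = $fetch($url, $getProxy());
            if ($delayMicros > 0) {
                usleep($delayMicros); // pause between requests
            }
            if ($body !== null) {
                $results[$url] = $body;
                break;
            }
        }
    }
    return $results;
}
```

Passing the proxy source and the fetcher in as callables keeps the loop independent of any particular provider, and it also makes the retry logic easy to check with stub functions before pointing it at a real site.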

This article was originally published or compiled by ipipgo: https://www.ipipgo.com/en-us/ipdaili/36502.html
