
PHP Web Crawling Example: PHP Crawling Example

Why do PHP crawlers always get blocked? Try this trick

Anyone who has done web crawling knows that the biggest headache when writing a crawler in PHP is getting your IP blocked. Last month, someone who runs an e-commerce price comparison service came to me and said his script was shut out after running for less than half an hour, and that switching servers three times did not help. To put it bluntly, the problem was that he was not making good use of proxy IPs.


// Typical blocked crawler code
$html = file_get_contents('https://target-site.example'); // placeholder for the target site URL

A direct connection like the one above is like shouting "I am a crawler" through a loudspeaker: every request comes from the same IP, so of course it gets blocked. We need to use proxy IPs as cover.

Writing a crawler with a proxy IP, step by step

First, a true story: after I helped that e-commerce guy switch to a proxy IP setup, his crawler ran for three days without any problems. Here we use the ipipgo proxy service as an example; their interface is very simple:


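// Note: PHP's HTTP stream wrapper expects the proxy as tcp://host:port,
// and proxy credentials are sent in a Proxy-Authorization header.
$proxy = 'tcp://gateway.ipipgo.com:9020';
$auth = base64_encode('username:password');
$context = stream_context_create([
    'http' => [
        'proxy' => $proxy,
        'request_fulluri' => true,
        'header' => "Proxy-Authorization: Basic $auth",
    ]
]);

$html = file_get_contents('destination url', false, $context);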

Watch out for these pitfalls:

  • ① Remember to replace the username and password with the ones you got from ipipgo.
  • ② Different proxy types (HTTP/HTTPS/SOCKS5) use different ports, so choose the right one.
  • ③ The timeout should preferably not exceed 10 seconds.
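
The pitfalls above can be sketched as one small helper that builds the stream context options in a single place. This is only a sketch: the function name buildProxyContextOptions is made up for illustration (it is not part of any ipipgo SDK), and the host, port, username and password are the same placeholders used earlier. Also note that PHP's HTTP stream wrapper only speaks to HTTP proxies; for a SOCKS5 proxy you would most likely need the cURL extension instead.

```php
<?php
// Hypothetical helper: builds HTTP stream context options for a proxy
// with Basic authentication and a capped timeout.
function buildProxyContextOptions(
    string $host,
    int $port,
    string $user,
    string $pass,
    float $timeout = 10.0
): array {
    // Cap the timeout at 10 seconds, following the third pitfall.
    $timeout = min($timeout, 10.0);

    return [
        'http' => [
            // The stream wrapper expects tcp://host:port for the proxy.
            'proxy'           => "tcp://{$host}:{$port}",
            'request_fulluri' => true,
            'timeout'         => $timeout,
            // Credentials travel in a Proxy-Authorization header.
            'header'          => 'Proxy-Authorization: Basic '
                . base64_encode("{$user}:{$pass}"),
        ],
    ];
}

// Usage: replace the placeholders with the account details from ipipgo.
$options = buildProxyContextOptions('gateway.ipipgo.com', 9020, 'username', 'password', 30.0);
$context = stream_context_create($options);
// $html = file_get_contents('destination url', false, $context);
```

Here a requested timeout of 30 seconds is silently reduced to 10, so a dead proxy cannot stall the crawler for long.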

Practical tips: three moves to keep your crawler alive

Technique          What to do                               Recommended setting
IP rotation        Use a different proxy for each request   Dynamic packages from ipipgo
Request interval   Sleep for a random 1-5 seconds           sleep(rand(1,5))
Header disguise    Imitate browser information              Set the User-Agent

Here is a complete example with automatic IP switching:
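
To make the three techniques concrete, here is a rough sketch of each as a tiny helper. The function names, the proxy addresses and the User-Agent strings are all placeholders chosen for illustration, and round-robin rotation is just one simple way to get a different proxy per request; a dynamic package may handle rotation for you.

```php
<?php
// Hypothetical helpers illustrating the three techniques.

// IP rotation: walk through the proxy list in order, wrapping around.
function nextProxy(array $proxies, int $requestIndex): string
{
    return $proxies[$requestIndex % count($proxies)];
}

// Request interval: a random pause between 1 and 5 seconds.
function randomDelaySeconds(int $min = 1, int $max = 5): int
{
    return random_int($min, $max);
}

// Header disguise: pick one browser-like User-Agent per request.
function randomUserAgentHeader(array $userAgents): string
{
    return 'User-Agent: ' . $userAgents[array_rand($userAgents)];
}

$proxies = ['10.0.0.1:8000', '10.0.0.2:8000']; // placeholder addresses
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
];

for ($i = 0; $i < 3; $i++) {
    $options = ['http' => [
        'proxy'           => 'tcp://' . nextProxy($proxies, $i),
        'request_fulluri' => true,
        'header'          => randomUserAgentHeader($userAgents),
    ]];
    // Uncomment to send real requests with a pause between them:
    // $html = file_get_contents('destination url', false, stream_context_create($options));
    // sleep(randomDelaySeconds());
}
```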


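function getProxyList() {
    // Call the ipipgo API to get the latest proxy list.
    // Assumption: the endpoint returns a JSON array of "host:port" strings.
    $json = @file_get_contents('https://api.ipipgo.com/proxy_pool');
    return $json === false ? [] : (json_decode($json, true) ?: []);
}

// A minimal doRequest sketch: fetch the URL through one proxy, throw on failure.
function doRequest($url, $proxy) {
    $context = stream_context_create([
        'http' => ['proxy' => 'tcp://' . $proxy, 'request_fulluri' => true, 'timeout' => 10],
    ]);
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new Exception("Request via $proxy failed");
    }
    return $html;
}

$targetUrl = 'destination url';
$html = null;
$retry = 3;
while ($retry--) {
    $proxies = getProxyList();
    foreach ($proxies as $proxy) {
        try {
            // Set up the proxy and send the request
            $html = doRequest($targetUrl, $proxy);
            // Process the data...
            break 2; // Success: leave both the foreach and the while loop
        } catch (Exception $e) {
            // Log the failure and move on to the next proxy
            continue;
        }
    }
}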

Frequently Asked Questions

Q: What should I do if my proxy IP stops working?
A: Choose a provider like ipipgo that replaces the IPs in its pool automatically. They say they add 2000+ new IPs every minute, so you simply cannot use them up!

Q: What should I pay attention to when crawling HTTPS pages?
A: Remember to add this line to the code:
stream_context_set_default(['ssl' => ['verify_peer' => false]]);
Note that this turns off certificate verification, so it is only a stopgap. The proper approach is to configure CA certificates; you can ask ipipgo technical support for the details.
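
For the CA certificate route, a minimal sketch could look like the following. The helper name buildVerifiedSslOptions and the certificate path are assumptions for illustration; verify_peer, verify_peer_name and cafile are standard PHP SSL context options, but which CA bundle to use depends on your setup.

```php
<?php
// Hypothetical helper: SSL context options that keep verification switched on
// and point PHP at a CA bundle file.
function buildVerifiedSslOptions(string $caFile): array
{
    return [
        'ssl' => [
            'verify_peer'      => true,    // check the certificate chain
            'verify_peer_name' => true,    // check the host name in the certificate
            'cafile'           => $caFile, // CA bundle used for the check
        ],
    ];
}

// Placeholder path: point this at the CA bundle on your own server.
$options = buildVerifiedSslOptions('/path/to/cacert.pem');
$context = stream_context_create($options);
// $html = file_get_contents('destination url', false, $context);
```

These 'ssl' options can sit in the same options array as the 'http' proxy settings, so one context can carry both.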

Q: How can I tell whether a proxy is really working?
A: Write a heartbeat check script that periodically visits https://api.ipipgo.com/check_ip through the proxy. If this interface returns status code 200, the IP is available.
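
A heartbeat check along those lines might be sketched like this. The function names parseStatusCode and isProxyAlive are invented for the example, and the "host:port" proxy format is an assumption; only the check_ip URL comes from the answer above. Run isProxyAlive from cron or a loop to make it periodic.

```php
<?php
// Extract the final HTTP status code from a list of response header lines.
function parseStatusCode(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Redirects produce several status lines; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical check: request the check_ip endpoint through a proxy
// (given as "host:port") and report whether it answered with status 200.
function isProxyAlive(string $proxy, string $checkUrl = 'https://api.ipipgo.com/check_ip'): bool
{
    $context = stream_context_create(['http' => [
        'proxy'           => 'tcp://' . $proxy,
        'request_fulluri' => true,
        'timeout'         => 10,
        'ignore_errors'   => true, // still expose headers on 4xx/5xx responses
    ]]);

    $body = @file_get_contents($checkUrl, false, $context);
    if ($body === false) {
        return false; // connection failed or timed out
    }

    // PHP 8.4+ offers http_get_last_response_headers(); older versions
    // populate $http_response_header in the local scope instead.
    $headers = function_exists('http_get_last_response_headers')
        ? (http_get_last_response_headers() ?? [])
        : ($http_response_header ?? []);

    return parseStatusCode($headers) === 200;
}
```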

Lastly, a few words from the heart: crawling is a long, drawn-out battle with the website. Using the right proxy IPs is like wearing a bulletproof vest; it saves you a great deal of trouble. Especially for large-scale data collection, go straight to the ipipgo enterprise package, where dedicated staff help you debug your configuration, which beats struggling on your own.

This article was originally published or organized by ipipgo. https://www.ipipgo.com/en-us/ipdaili/34903.html
