IPIPGO IP proxy | PHP web crawler: a PHP website data crawling tutorial

Why does your PHP crawler keep getting blocked? Try this trick

Recently, a lot of readers have asked why the crawlers they write in PHP keep getting their IPs blocked by the target site, to the point where they want to smash the keyboard. Frankly, the problem is that your network fingerprint is too obvious. Today I'll teach you a trick: use proxy IPs as cover. It's like playing hide-and-seek while constantly changing disguises, so the site can never pin down who you really are.

Choose your tools carefully

Beginners love file_get_contents, but used bare it gives you almost no control over headers, timeouts, or proxies. It's like running around naked:


$html = file_get_contents("http://target-website");
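To be fair, file_get_contents is only "naked" when used bare. PHP's HTTP stream wrapper does accept a proxy, request headers, and a timeout through a stream context. The sketch below shows this. The function name `make_proxy_context`, the proxy address, and the User-Agent are placeholders. cURL is still the more flexible option.

```php
<?php
// Build a stream context that routes file_get_contents() through an HTTP proxy.
// $proxy uses the tcp:// scheme expected by PHP's http wrapper, e.g. "tcp://1.2.3.4:8080".
function make_proxy_context(string $proxy, string $userAgent, float $timeout = 10.0)
{
    return stream_context_create([
        'http' => [
            'proxy'           => $proxy,      // proxy server to send the request through
            'request_fulluri' => true,        // most HTTP proxies expect the absolute URI
            'header'          => "User-Agent: $userAgent\r\n",
            'timeout'         => $timeout,    // read timeout in seconds
        ],
    ]);
}

// Usage (placeholder proxy and URL):
// $ctx  = make_proxy_context('tcp://1.2.3.4:8080', 'Mozilla/5.0 ...');
// $html = file_get_contents('http://target-website', false, $ctx);
```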

Veterans use cURL, which is like putting on body armor:


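$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://target-website");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$result = curl_exec($ch);                       // string on success, false on failure
curl_close($ch);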

Proxy IPs are what keep you alive.

Add these lines to your cURL configuration and your crawler instantly takes on a new identity:


// Option 1: a single fixed proxy
curl_setopt($ch, CURLOPT_PROXY, 'Proxy IP:Port');
// Option 2: ipipgo's dynamic tunnel (use this instead of option 1, not in addition to it)
curl_setopt($ch, CURLOPT_PROXY, 'http://username:password@gateway.ipipgo.com:port');
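If your password contains characters such as `@` or `:`, pasting it directly into the proxy URL makes the URL ambiguous. The sketch below builds the URL and percent-encodes the credentials to prevent this. The helper name `build_proxy_url` is hypothetical. The port in the usage line is a placeholder, not a real ipipgo port.

```php
<?php
// Hypothetical helper: assemble an authenticated proxy URL for CURLOPT_PROXY.
// Credentials are percent-encoded so characters like '@' or ':' cannot break the URL.
function build_proxy_url(string $user, string $pass, string $host, int $port): string
{
    return sprintf(
        'http://%s:%s@%s:%d',
        rawurlencode($user),
        rawurlencode($pass),
        $host,
        $port
    );
}

// Usage (8000 is a placeholder port):
// curl_setopt($ch, CURLOPT_PROXY, build_proxy_url('username', 'password', 'gateway.ipipgo.com', 8000));
```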

Note: switch to a new IP for every request. ipipgo's API returns the latest IPs in real time, like this:


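// Fetch 5 proxies from the ipipgo API (assumes the response is a JSON array)
$ip_list = json_decode(file_get_contents('https://api.ipipgo.com/get?num=5'), true);
// Pick a random entry without assuming that exactly 5 came back
$random_ip = $ip_list[array_rand($ip_list)];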
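The snippet above assumes the API call always succeeds. The sketch below is slightly more defensive and returns null instead of failing when the API returns nothing usable. Like the snippet, it assumes the response body is a JSON array of "ip:port" strings. Check that against ipipgo's actual API documentation. The function name is hypothetical.

```php
<?php
// Pick one random proxy from an API response body.
// ASSUMPTION: the body is a JSON array of "ip:port" strings; adapt to the real response format.
// Returns null when the body is missing, malformed, or empty, so the caller can retry.
function pick_random_proxy(string|false $body): ?string
{
    if ($body === false) {
        return null; // the HTTP request itself failed
    }
    $list = json_decode($body, true);
    if (!is_array($list) || $list === []) {
        return null; // not JSON, or no proxies returned
    }
    $proxy = $list[array_rand($list)];
    return is_string($proxy) ? $proxy : null; // reject unexpected shapes such as nested objects
}

// Usage:
// $proxy = pick_random_proxy(file_get_contents('https://api.ipipgo.com/get?num=5'));
```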

Practical case: grabbing limited-release items

Last year I helped a friend write a script to snag limited-edition sneakers. Without a proxy, the script was blocked within 5 minutes. Later I switched to ipipgo's dedicated IP pool, and here's the secret to success:


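function stealth_request($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);              // the page to fetch
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of echoing it
    // Get the day's valid IP from ipipgo (get_ipipgo_proxy() is your own helper)
    $proxy = get_ipipgo_proxy();
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // keep the timeout short so a dead proxy fails fast
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'
    ]);
    $result = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $result;
}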
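When a proxy dies mid-run, stealth_request() simply returns false. A common pattern is to retry the same URL through a different proxy a few times before giving up. In the sketch below, the network call sits behind a callable so the rotation logic can be tested on its own. In real use, you would pass a function that performs the cURL request through the given proxy, such as a variant of stealth_request() that takes the proxy as a parameter. The function name `fetch_with_rotation` is hypothetical.

```php
<?php
// Try $url through the proxies in random order until one returns a non-empty body.
// $fetch is any callable (string $url, string $proxy): string|false, e.g. a cURL wrapper.
// Returns the body, or null if every attempted proxy (at most $maxAttempts) failed.
function fetch_with_rotation(string $url, array $proxies, callable $fetch, int $maxAttempts = 3): ?string
{
    shuffle($proxies); // don't always start with the same proxy
    foreach (array_slice($proxies, 0, $maxAttempts) as $proxy) {
        $body = $fetch($url, $proxy);
        if ($body !== false && $body !== '') {
            return $body; // success: stop rotating
        }
        // false or a blank body counts as a failure; move on to the next proxy
    }
    return null;
}
```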

Pitfall guide (bookmark this)

| Symptom | Remedy |
|---|---|
| Response is suddenly blank | Immediately switch to ipipgo's next IP node |
| CAPTCHA appears | Lower the request rate and change the User-Agent |
| Connection timeout | Check whether the proxy port was entered incorrectly |
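The pitfall table can also be expressed in code. The sketch below classifies a crawl result into one of the table's remedies. The function name and the returned labels are hypothetical. Detecting a CAPTCHA by searching the page for the word "captcha" is a crude assumption, and real sites need their own check. The timeout branch relies on libcurl error code 28 (operation timed out).

```php
<?php
// Map a crawl result onto the remedies in the pitfall table.
// $body is what curl_exec() returned (with CURLOPT_RETURNTRANSFER set);
// $curlErrno is curl_errno($ch) for that request.
function diagnose(string|false $body, int $curlErrno = 0): string
{
    if ($body === false) {
        // libcurl error 28 = operation timed out; a wrong proxy port is a common cause
        return $curlErrno === 28 ? 'check proxy port' : 'switch to next proxy';
    }
    if (trim($body) === '') {
        return 'switch to next proxy'; // blank response
    }
    // ASSUMPTION: a CAPTCHA page mentions "captcha" somewhere in its HTML
    if (stripos($body, 'captcha') !== false) {
        return 'slow down and change User-Agent';
    }
    return 'ok';
}

// Usage:
// $action = diagnose(curl_exec($ch), curl_errno($ch));
```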

Beginner Q&A (must-read)

Q: Can't I just use free proxies?
A: Nine out of ten free proxies on the market are duds: they are either slow or already dead. ipipgo's commercial-grade proxies are maintained by a dedicated team, with a measured success rate of 98% or higher.

Q: How do I know the proxy is actually working?
A: Add a check to your code:


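curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);      // give up if the proxy can't connect within 5 s
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // capture the body instead of printing it
if (curl_exec($ch) === false) {
    // curl_error() explains why: connection refused, timeout, proxy auth failure, etc.
    echo "Proxy $proxy is dead (" . curl_error($ch) . "), moving on to the next one!\n";
}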

Q: How do I deal with a site's anti-crawling measures?
A: Three tricks: ① use ipipgo's residential proxies; ② sleep for a random 0.5-3 seconds between requests; ③ mix mobile and desktop User-Agent headers.
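Tricks ② and ③ take only a few lines. The sketch below generates a random pause in the 0.5-3 second range, suitable for `usleep()`, and picks a User-Agent from a mixed mobile/desktop list. The UA strings are examples only, and the function names are hypothetical. Keep your own list of current browser strings.

```php
<?php
// Example User-Agent pool mixing desktop and mobile browsers (keep your own list up to date).
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1',
    'Mozilla/5.0 (Linux; Android 12; Pixel 6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Mobile Safari/537.36',
];

// Random delay between 0.5 s and 3 s, in microseconds (the unit usleep() expects).
function random_delay_us(): int
{
    return random_int(500_000, 3_000_000);
}

// Pick one User-Agent at random from the pool.
function random_user_agent(): string
{
    return USER_AGENTS[array_rand(USER_AGENTS)];
}

// Usage between requests:
// usleep(random_delay_us());
// curl_setopt($ch, CURLOPT_HTTPHEADER, ['User-Agent: ' . random_user_agent()]);
```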

Level up: distributed crawling

For large projects, combine multithreading with a proxy pool and configure it like this:


// Fetch 200 IPs from ipipgo and keep them in Redis
// (get_ipipgo_batch() is your own helper)
$ip_pool = get_ipipgo_batch(200);

// Give each worker thread a different IP ($worker is your own worker object)
$worker->setProxy(array_pop($ip_pool));
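The hand-out logic behind "a different IP for each thread" fits in a small class. PHP workers running in separate processes don't share memory, which is why the setup above keeps the pool in Redis. The in-memory version below does not use Redis and only illustrates the logic: no two workers hold the same IP at once, and healthy IPs go back into the pool. The class name `ProxyPool` is hypothetical.

```php
<?php
// In-memory illustration of a proxy pool (a shared store such as Redis is needed across processes).
// take() hands out an IP that no other worker currently holds; release() returns it to the pool.
final class ProxyPool
{
    /** @var string[] */
    private array $free;

    public function __construct(array $ips)
    {
        $this->free = array_values(array_unique($ips)); // drop duplicate IPs
    }

    // Take an unused IP, or null if the pool is empty (time to fetch a new batch).
    public function take(): ?string
    {
        return array_pop($this->free);
    }

    // Put an IP back once the worker is done with it and it is still healthy.
    public function release(string $ip): void
    {
        $this->free[] = $ip;
    }

    public function size(): int
    {
        return count($this->free);
    }
}
```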

Monitor IP availability, and automatically trigger IP replacement when it falls below 90%.
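The 90% rule can be tracked with a simple success counter. The sketch below records each request outcome and reports when a refresh is due. The class name is hypothetical. The `$minSamples` guard is an added assumption, not part of the original advice: it keeps a single early failure from triggering a refresh. On a refresh, you would fetch a new batch of IPs and call `reset()`.

```php
<?php
// Track request outcomes and signal when the success rate drops below a threshold (90% by default).
final class AvailabilityMonitor
{
    private int $ok = 0;
    private int $total = 0;

    // $minSamples (an assumption) avoids reacting to the first one or two failures.
    public function __construct(private float $threshold = 0.9, private int $minSamples = 10)
    {
    }

    public function record(bool $success): void
    {
        $this->total++;
        if ($success) {
            $this->ok++;
        }
    }

    // Fraction of successful requests; 1.0 before anything has been recorded.
    public function rate(): float
    {
        return $this->total === 0 ? 1.0 : $this->ok / $this->total;
    }

    // True when there are enough samples and availability is below the threshold.
    public function needsRefresh(): bool
    {
        return $this->total >= $this->minSamples && $this->rate() < $this->threshold;
    }

    public function reset(): void
    {
        $this->ok = 0;
        $this->total = 0;
    }
}
```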

Finally, to be honest: with proxy IPs, you get what you pay for. Since I started using ipipgo, I no longer have to get up in the middle of the night to swap IPs. The system maintains the pool automatically, and I finally get a full night's sleep. Some people say it's expensive, but compared with the losses from getting your accounts banned, this investment is really nothing.

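Our products may only be used in network environments outside China (except for the TikTok dedicated line). Any actions users carry out with IPIPGO do not represent IPIPGO's intentions or views, and IPIPGO assumes no legal liability.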
