PHP cURL Crawling: A Web Page Capture Example

A hands-on guide to collecting data with PHP cURL and proxy IPs

Anyone who collects data dreads running into a website's anti-scraping mechanism. Last week a friend who works in e-commerce came to me: the collection script he had written with PHP cURL suddenly stopped working, and the site had blocked his IP for three days. This problem is not hard to solve. Today I will use his case to show you how to deal with anti-scraping measures using the ipipgo proxy IP service.


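// Basic curl example (a bare request like this will be blocked sooner or later)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com"); // replace with the target site's URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
$output = curl_exec($ch);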

Why do I have to use a proxy IP?

Many sites run a traffic fingerprinting system, which works like the security gate at a supermarket. Hitting a site repeatedly from one IP is like the same person walking in and out of the supermarket 20 times in half an hour: who else would the security guards be watching? ipipgo's proxy pool has 8 million+ dynamic IPs, which is like having a huge supply of disguises, so the site cannot tell which requests come from the same visitor.

Scenario                    Without a proxy    With ipipgo proxies
Requests per day            ≤ 500              ≥ 50,000
Probability of IP blocking  80% or higher      < 3%

Real-world makeover: putting IP armor on curl

Take the script that just got blocked and rework it in three key steps:


// Step 1: get a proxy from ipipgo (replace YOUR_API_KEY with your own API key)
$proxy = trim(file_get_contents("https://api.ipipgo.com/getproxy?key=YOUR_API_KEY"));

// Step 2: configure the curl proxy parameters
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

// Step 3: important! Run the request and remember to retry on error
$output = curl_exec($ch);
if (curl_errno($ch)) {
    // Report the failed IP (keep $proxy intact; do not overwrite it with the response)
    file_get_contents("https://api.ipipgo.com/report?proxy=" . urlencode($proxy));
    // Fetch a new proxy and continue execution...
}

Pitfall to avoid: don't take the shortcut of hard-coding a proxy IP in your script; always fetch proxies dynamically. ipipgo's API supports filtering IPs by region and carrier, which is useful for cross-border collection.
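
The report-and-retry step can be fleshed out along the following lines. This is a minimal sketch under stated assumptions, not ipipgo's official client: `curlViaProxy`, `fetchWithRetry`, and the `$getProxy`, `$reportProxy`, and `$transport` callables are hypothetical names introduced here for illustration, and the sketch assumes the proxy source returns a plain `host:port` string that can be passed straight to CURLOPT_PROXY.

```php
<?php
// Perform one GET through the given proxy.
// Returns the response body, or null if curl reported any error.
function curlViaProxy(string $url, string $proxy): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Try up to $maxAttempts times, taking a fresh proxy for every attempt
// and reporting each proxy that fails. $transport defaults to curlViaProxy
// and exists only so the retry logic can be exercised without a network.
function fetchWithRetry(
    string $url,
    callable $getProxy,
    callable $reportProxy,
    int $maxAttempts = 3,
    ?callable $transport = null
): ?string {
    $transport = $transport ?? 'curlViaProxy';

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $proxy = $getProxy();            // never reuse the proxy that just failed
        $body = $transport($url, $proxy);
        if ($body !== null) {
            return $body;                // success: stop retrying
        }
        $reportProxy($proxy);            // let the pool know this proxy is bad
    }

    return null;                         // every attempt failed
}
```

In a real script, `$getProxy` and `$reportProxy` would be thin wrappers around the getproxy and report calls from the snippet above. Capping the number of attempts keeps one unreachable target from burning through the whole proxy pool.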

Tips for improving collection efficiency

1. When collecting with multiple threads, give each thread its own proxy; do not let multiple requests share the same IP address.
2. Randomize the intervals between requests; don't hit the site as regularly as an alarm clock!
3. Don't fight a CAPTCHA when you run into one; switch to a new IP through ipipgo and try again.
4. Clear cookies regularly so the site cannot track your browsing trail.
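
Tips 1 and 4 can be combined in one sketch. PHP has no built-in threads, so this example swaps in `curl_multi` for concurrency instead; the principle of one proxy per concurrent request is the same. `assignProxies` and `fetchAll` are hypothetical helper names, and the proxy list is assumed to have been fetched beforehand as `host:port` strings. Because every request gets a brand-new handle, no cookies carry over between requests.

```php
<?php
// Pair every URL with its own distinct proxy.
// Throws instead of silently letting two requests share an IP.
function assignProxies(array $urls, array $proxies): array
{
    $urls = array_values($urls);
    $proxies = array_values(array_unique($proxies));
    if (count($proxies) < count($urls)) {
        throw new InvalidArgumentException('Need at least one distinct proxy per URL');
    }

    $pairs = [];
    foreach ($urls as $i => $url) {
        $pairs[] = ['url' => $url, 'proxy' => $proxies[$i]];
    }
    return $pairs;
}

// Run all requests concurrently, each through its assigned proxy.
// Returns bodies indexed like $pairs; null marks a request without a 2xx response.
function fetchAll(array $pairs): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($pairs as $i => $pair) {
        $ch = curl_init($pair['url']);   // fresh handle, so no cookies from earlier requests
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PROXY, $pair['proxy']);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    // Drive the transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $i => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 means no HTTP response at all
        $results[$i] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
    }
    return $results;
}
```

A call such as `fetchAll(assignProxies($urls, $proxies))` would then fetch each URL from a different IP. Entries that come back as null are candidates for the CAPTCHA advice in tip 3: retry them with a new proxy rather than with the same one.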


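// Random delay: 1-5 s plus up to 3 s of jitter. sleep() takes whole seconds only, so the fraction goes to usleep().
$ms = mt_rand(1, 5) * 1000 + mt_rand(0, 3000);
sleep(intdiv($ms, 1000));
usleep(($ms % 1000) * 1000);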

Frequently Asked Questions

Q: What should I do if my proxy IP suddenly fails?
A: Add a reporting step to your curl error handling. When ipipgo's system receives the report, it automatically removes the faulty IP from rotation.

Q: How can I tell if a proxy is in effect?
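A: After curl_exec, print curl_getinfo($ch, CURLINFO_PRIMARY_IP). This is the address curl actually connected to, so with a working proxy it should show the proxy's IP rather than the target site's, and it should change whenever you switch proxies.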
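
The check in this answer can be wrapped in a small helper. This is a sketch, assuming the proxy is given as `host:port` (optionally with a scheme or credentials) and uses an IPv4 address or a resolvable hostname; `proxyHost` and `wentThroughProxy` are hypothetical names, not part of curl or of ipipgo's API.

```php
<?php
// Extract the host part from a proxy string such as "1.2.3.4:8080"
// or "http://user:pass@1.2.3.4:8080". Returns null if no host is found.
function proxyHost(string $proxy): ?string
{
    // parse_url() needs a scheme or a leading "//" to recognise host:port.
    $target = strpos($proxy, '://') === false ? '//' . $proxy : $proxy;
    $host = parse_url($target, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

// After curl_exec(), compare the address curl connected to with the proxy's address.
function wentThroughProxy($ch, string $proxy): bool
{
    $connectedIp = curl_getinfo($ch, CURLINFO_PRIMARY_IP);
    $host = proxyHost($proxy);
    if ($host === null || !is_string($connectedIp) || $connectedIp === '') {
        return false;
    }
    // Resolve hostnames so a proxy given by name can be compared with an IP.
    $proxyIp = filter_var($host, FILTER_VALIDATE_IP) ? $host : gethostbyname($host);
    return $connectedIp === $proxyIp;
}
```

Note that this only confirms curl connected to the proxy. It does not show which exit IP the target site sees, which may differ if the provider routes traffic through a gateway.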

Q: How many proxy IPs are needed per day?
A: It depends on your business volume. As a rule of thumb, keeping each IP to 200-300 requests per hour is on the safe side. ipipgo's plans range from daily rentals to monthly packages, and new users get 5000 test IPs!
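
The rule of thumb in this answer turns into simple arithmetic. The helper below is an illustrative sketch: `requiredProxyCount` is a hypothetical name, the default of 250 requests per IP per hour is just the midpoint of the 200-300 range above, and it assumes requests are spread evenly over the active hours.

```php
<?php
// Estimate how many IPs must be in rotation at the same time so that
// no single IP exceeds $perIpPerHour requests per hour.
function requiredProxyCount(int $requestsPerDay, int $perIpPerHour = 250, float $activeHours = 24.0): int
{
    if ($requestsPerDay < 0 || $perIpPerHour <= 0 || $activeHours <= 0) {
        throw new InvalidArgumentException('Inputs must be positive');
    }
    // Total capacity of one IP over the day is $perIpPerHour * $activeHours;
    // round up because a fraction of an IP is not available.
    return (int) ceil($requestsPerDay / ($perIpPerHour * $activeHours));
}
```

For example, 50,000 requests a day spread over 24 hours at 250 requests per IP per hour works out to 50,000 / 6,000 ≈ 8.3, so at least 9 IPs in rotation. Squeezing the same volume into an 8-hour window at 200 requests per IP per hour needs 32. If you rotate to a fresh IP on every failure or CAPTCHA, the number of IPs consumed per day will be higher than this floor.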

Finally, a reminder: respect the website's robots.txt rules when collecting data. The point of using ipipgo's proxy service is not to cause damage, but to make legitimate collection run more smoothly. I once helped a customer build a price comparison system; after switching to dynamic proxies, the data collection success rate jumped from 47% straight to 98%, an improvement that was visible immediately.

This article was originally published or organized by ipipgo. https://www.ipipgo.com/en-us/ipdaili/36386.html
