Web Crawling with PHP curl: Practical Code Examples

I. Why use a proxy IP for web crawling?

Anyone who writes crawlers has run into the embarrassment of a blocked IP, especially when the target site has added an anti-crawling mechanism. This is where a proxy IP comes in: it works like a cloak, giving each request a new identity so the site has a hard time telling whether you are a real person or a program. For example, the ipipgo service we commonly use handles this problem well; its IP pool is large and clean enough that it is not easily recognized.

II. PHP curl basics

First of all, you need to know how to use curl, the core tool for fetching data. Remember these key settings:


$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "destination URL");   // replace with the page you want to fetch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);         // return the result as a string instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 0);                 // don't include response headers in the result
$output = curl_exec($ch);
curl_close($ch);

Pay close attention to the curl_setopt function: put simply, it tells curl what to do. If you don't set CURLOPT_RETURNTRANSFER, the response is printed straight to the page instead of being returned, which makes a mess.
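The same settings can also be collected in one array and applied with curl_setopt_array. The following is a minimal sketch, not the article's own code; the helper names build_basic_options and fetch_page are made up for illustration.

```php
<?php
// Sketch: gather the basic options in one array and apply them together.
// build_basic_options() and fetch_page() are hypothetical helper names.

function build_basic_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string instead of printing it
        CURLOPT_HEADER         => false, // leave response headers out of the returned string
    ];
}

function fetch_page(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_basic_options($url));
    $body = curl_exec($ch);              // string on success, false on failure
    curl_close($ch);
    return $body === false ? null : $body;
}
```

Keeping the options in a plain array makes it easy to inspect or extend them (for example with proxy settings) before any request is sent.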

III. Adding a proxy IP, step by step

Now for the main part: route curl through a proxy, using ipipgo's proxy service as an example:


$proxy = "123.123.123.123:8888"; // proxy IP and port provided by ipipgo
$auth = "username:password";     // credentials obtained from the ipipgo backend

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://example.com"); // placeholder for the target website
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $auth);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// For debugging only (remember to turn this off in production)
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, fopen('php://stderr', 'w'));

$result = curl_exec($ch);
if (curl_errno($ch)) {
    echo 'Crawl error: ' . curl_error($ch);
}
curl_close($ch);

Note the format of the proxy IP: the proxy address here must be in IP:port form. ipipgo's backend can generate proxy addresses in this format directly, which is very convenient.
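A malformed proxy string is easier to catch before it reaches curl. Below is a rough sketch of such a check; parse_proxy is a hypothetical helper, and it assumes an IPv4 address followed by a port, as in the example above.

```php
<?php
// Sketch: validate an "IP:port" proxy string before passing it to CURLOPT_PROXY.
// parse_proxy() is a hypothetical helper; it handles IPv4 addresses only.

function parse_proxy(string $proxy): ?array
{
    $parts = explode(':', $proxy);
    if (count($parts) !== 2) {
        return null;                                   // not exactly "ip:port"
    }
    [$ip, $port] = $parts;
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;                                   // not a valid IPv4 address
    }
    if (preg_match('/^[0-9]{1,5}\z/', $port) !== 1) {
        return null;                                   // port is not a plain number
    }
    $port = (int) $port;
    if ($port < 1 || $port > 65535) {
        return null;                                   // port out of range
    }
    return ['ip' => $ip, 'port' => $port];
}
```

A null result means the string should be fixed or replaced before any request is attempted.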

IV. Handling crawl errors

Don't panic when you hit the problems below; here is how to deal with each of them:


// Set timeouts to avoid hanging
curl_setopt($ch, CURLOPT_TIMEOUT, 15);       // give up if the whole request takes longer than 15 seconds
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // wait at most 5 seconds for the connection

// Automatic retry mechanism
$retry = 3;
while ($retry--) {
    $result = curl_exec($ch);
    if (!curl_errno($ch)) break;
    sleep(1); // wait 1 second and try again
}

// Check that the proxy is in effect (only meaningful after curl_exec):
// CURLINFO_PRIMARY_IP is the IP curl actually connected to, which is the proxy's IP when a proxy is used
if (curl_getinfo($ch, CURLINFO_PRIMARY_IP)) {
    echo "Currently using proxy IP: " . curl_getinfo($ch, CURLINFO_PRIMARY_IP);
}
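The fixed one-second wait can be swapped for a wait that grows with each failed attempt. This is one possible variation rather than the article's method; retry_with_backoff is a hypothetical helper and the default delays are assumptions.

```php
<?php
// Sketch: retry a callable, doubling the wait after each failure.
// retry_with_backoff() is a hypothetical helper; $attempt must return
// false on failure and anything else on success.

function retry_with_backoff(callable $attempt, int $maxTries = 3, int $baseDelayMs = 1000)
{
    for ($try = 0; $try < $maxTries; $try++) {
        $result = $attempt($try);
        if ($result !== false) {
            return $result;                            // success, stop retrying
        }
        if ($try < $maxTries - 1) {
            // wait 1s, 2s, 4s, ... (with the default base delay) before the next try
            usleep($baseDelayMs * 1000 * (2 ** $try));
        }
    }
    return false;                                      // every attempt failed
}
```

With CURLOPT_RETURNTRANSFER set, curl_exec returns false on failure, so a call such as retry_with_backoff(function () use ($ch) { return curl_exec($ch); }) fits this contract.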

V. Frequently Asked Questions

Q: What should I do if I can't connect to the proxy IP at all?
A: First check the IP and port for typos, then test connectivity with telnet. If an ipipgo IP suddenly stops working, get a new one from the backend; the IP pool refreshes quickly, so you will rarely be left without a working IP.

Q: How can I improve crawling efficiency?
A: Use ipipgo's Dynamic Residential Proxy together with multi-threading. Remember to set a random interval between requests; firing them off like a machine gun makes you easy to detect.

Q: What should I do if I encounter a CAPTCHA?
A: It means the proxy IP you are using is not good enough; switch to ipipgo's high-anonymity IPs and try again. If that doesn't work, you will need an image-recognition solution for the CAPTCHA, but that's another story.
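The random interval mentioned in the efficiency answer can be sketched in a few lines. The helper names random_pause_us and polite_pause, and the 1 to 3 second default range, are assumptions for illustration only.

```php
<?php
// Sketch: pause for a random interval between requests.
// random_pause_us() and polite_pause() are hypothetical helper names.

function random_pause_us(float $minSeconds, float $maxSeconds): int
{
    $min = (int) round($minSeconds * 1000000);
    $max = (int) round($maxSeconds * 1000000);
    return random_int($min, $max);                     // uniform pick, in microseconds
}

function polite_pause(float $minSeconds = 1.0, float $maxSeconds = 3.0): void
{
    usleep(random_pause_us($minSeconds, $maxSeconds));
}
```

Calling polite_pause() after each request spreads the traffic out so it looks less like a fixed-rate script.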

Choosing a proxy IP service

When picking a proxy service, look at these hard metrics:

  • IP lifetime: ipipgo's short-lived proxies rotate automatically every 5-15 minutes, and the long-lived ones can last up to 24 hours.
  • Geographic location: for domestic sites, use local data-center IPs; for overseas business, use ipipgo's America/Asia nodes.
  • Protocol support: in addition to HTTP/HTTPS, some scenarios require SOCKS5, which ipipgo supports.
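On the PHP side, the proxy protocol is selected with CURLOPT_PROXYTYPE. The sketch below maps a protocol name to the matching curl constant; proxy_type_options is a hypothetical helper, and only the two protocols named above are covered.

```php
<?php
// Sketch: map a protocol name to curl's proxy-type option.
// proxy_type_options() is a hypothetical helper name.

function proxy_type_options(string $protocol, string $proxy): array
{
    $types = [
        'http'   => CURLPROXY_HTTP,    // also used when fetching HTTPS pages through an HTTP proxy
        'socks5' => CURLPROXY_SOCKS5,
    ];
    $key = strtolower($protocol);
    if (!isset($types[$key])) {
        throw new InvalidArgumentException("Unsupported proxy protocol: $protocol");
    }
    return [
        CURLOPT_PROXY     => $proxy,
        CURLOPT_PROXYTYPE => $types[$key],
    ];
}
```

The returned array can be passed to curl_setopt_array alongside the other request options.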

One last trick: dynamic IP pool + automatic switching. The ipipgo backend comes with an API for fetching the latest proxies in real time; pair it with a script that swaps proxies automatically and data collection becomes very stable. If you run into technical problems, contact ipipgo's customer service directly; they respond noticeably faster than their peers.
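The switching half of that trick might look like the sketch below. The article does not show the ipipgo API, so the proxy list is simply passed in as an array; ProxyPool is a hypothetical class, and round-robin is just one possible rotation strategy.

```php
<?php
// Sketch: round-robin rotation over a list of proxies.
// ProxyPool is a hypothetical class; fetching the list from the provider's
// API is not shown here, so the list is passed in as a plain array.

final class ProxyPool
{
    private array $proxies;
    private int $index = 0;

    public function __construct(array $proxies)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('Proxy list must not be empty');
        }
        $this->proxies = array_values($proxies);
    }

    // Return the next proxy, wrapping around at the end of the list.
    public function next(): string
    {
        $proxy = $this->proxies[$this->index];
        $this->index = ($this->index + 1) % count($this->proxies);
        return $proxy;
    }
}
```

Before each request, or after a failed one, the script would call $pool->next() and pass the result to CURLOPT_PROXY.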

This article was originally published or organized by ipipgo: https://www.ipipgo.com/en-us/ipdaili/35491.html
