PHP Web Crawling Tutorial: Getting Started with CURL Capture

PHP scraper keeps getting blocked? Try this trick

Recently a lot of readers have asked me why the target site keeps blocking their IP when they scrape data with PHP curl, and they are getting desperate. I ran into the same thing three years ago and later found that a proxy IP works like a disguise for your program. Today I'll walk you through how it's done.

Figuring out what's going on with proxy IPs

A proxy IP is a stand-in for your network request. Suppose the shopkeeper recognizes you every time you buy cigarettes at the supermarket; you simply send a friend to buy them for you. There are three types of proxy on the market:


Transparent Proxy - your friend goes but announces your name (your real IP is exposed)
Anonymous Proxy - your friend goes alone but wears your clothes (your IP is hidden, but the request is still recognizable as proxied)
High-Anonymity Proxy - your friend is fully disguised as a passerby (recommended)
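
To make the difference concrete, here is a minimal sketch of how a target server might tell the three types apart from the request headers it receives. The function name `classify_proxy` is hypothetical, and checking only `X-Forwarded-For` and `Via` is a simplifying assumption; real sites look at many more signals.

```php
<?php
// Hypothetical helper: guess the proxy type from what the target server sees.
// $server mimics PHP's $_SERVER array; $realIp is the client's true address.
function classify_proxy(array $server, string $realIp): string
{
    $forwarded = $server['HTTP_X_FORWARDED_FOR'] ?? '';
    $via       = $server['HTTP_VIA'] ?? '';

    // X-Forwarded-For may list several comma-separated addresses.
    $forwardedIps = $forwarded === ''
        ? []
        : array_map('trim', explode(',', $forwarded));

    // Transparent: the proxy passes the real IP along in a header.
    if (in_array($realIp, $forwardedIps, true)) {
        return 'transparent';
    }
    // Anonymous: the real IP is hidden, but proxy headers give the proxy away.
    if ($forwardedIps !== [] || $via !== '') {
        return 'anonymous';
    }
    // High anonymity: nothing distinguishes the request from a direct one.
    return 'high-anonymity';
}
```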

Here's the key point: when choosing a proxy provider, pick one like ipipgo that specializes in high-anonymity proxies. Their IP pool is large and each request can go out through a different IP, so the target site has a hard time spotting a pattern.

Hands-on: setting up a proxy with curl

Take collecting prices from an e-commerce platform as an example. Without a proxy, the code looks like this:


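$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://target-site.example"); // placeholder for the target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$output = curl_exec($ch);
curl_close($ch);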

With an ipipgo proxy added:


// Proxy details from the ipipgo backend
$proxy = '123.123.123.123:8888';
$auth = 'username:password';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://target-site.example"); // placeholder for the target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $auth);
curl_setopt($ch, CURLOPT_TIMEOUT, 10); // keep the timeout short
$output = curl_exec($ch);
curl_close($ch);

Note that you have to replace username and password with the credentials shown in the ipipgo backend. Their proxy authentication method is especially beginner-friendly.
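
If you make many requests, it can help to keep the proxy settings in one place. The sketch below is one way to do that; the function name `proxy_curl_options` and the 5-second connect timeout are assumptions for illustration, not something ipipgo prescribes.

```php
<?php
// Hypothetical helper: collect the proxy-related curl options in one array.
function proxy_curl_options(string $url, string $proxy, string $auth, int $timeout = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_PROXY          => $proxy,   // "host:port"
        CURLOPT_PROXYUSERPWD   => $auth,    // "username:password"
        CURLOPT_CONNECTTIMEOUT => 5,        // assumed value: give up quickly on dead proxies
        CURLOPT_TIMEOUT        => $timeout, // total time allowed for the request
    ];
}
```

The array can be applied in one call with `curl_setopt_array($ch, proxy_curl_options($url, $proxy, $auth));` before `curl_exec($ch)`.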

Pitfall Guide: 5 Common Beginner Mistakes

1. Reusing the same proxy IP: consecutive requests from one IP are easy to spot, so change the IP for each request where possible.
2. Timeout set too long: keep it within 10 seconds, and switch to the next IP if a request takes longer than that.
3. Forgetting error handling: after curl_exec, check whether $output is false or empty.
4. No User-Agent disguise: set a common browser UA with curl_setopt (CURLOPT_USERAGENT).
5. Tripping over HTTPS certificates: the line below skips certificate verification so requests don't stall on validation errors. It also removes protection against tampered responses, so use it only where that risk is acceptable.


curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
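
Points 2 and 3 can be combined into a small decision step after each request. This is a minimal sketch under stated assumptions: the function name `classify_response` and the action labels it returns are hypothetical, and which status codes count as "slow down" is a judgment call.

```php
<?php
// Hypothetical helper: decide what to do with a finished request.
// $body is the return value of curl_exec(), $errno comes from curl_errno($ch),
// and $httpCode from curl_getinfo($ch, CURLINFO_HTTP_CODE).
function classify_response($body, int $errno, int $httpCode): string
{
    if ($body === false || $errno !== 0) {
        return 'switch_ip';   // transport failure or timeout: try the next proxy
    }
    if ($httpCode === 403 || $httpCode === 429) {
        return 'slow_down';   // the site is pushing back
    }
    if ($httpCode < 200 || $httpCode >= 300 || $body === '') {
        return 'retry';       // unexpected status or empty body
    }
    return 'ok';
}
```

A caller would run `curl_exec`, read `curl_errno` and the HTTP code, and branch on the returned label instead of using `$output` blindly.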

Practical Q&A: You Ask, I Answer

Q: What can I do about slow proxy IPs?
A: Prefer ipipgo's domestic BGP lines; in my tests latency stays within 200 ms.

Q: How do I verify that the proxy is working?
A: Request http://httpbin.org/ip through the proxy and check that the returned IP is the proxy's IP rather than your own.
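
That check can be scripted. The sketch below parses the JSON body that httpbin.org/ip returns (an object with an `origin` field) and compares it with the proxy address. The function name `proxy_is_effective` is hypothetical, and the comparison assumes the proxy's exit IP is the same as the address you connect to; with a gateway-style proxy, compare against your own IP instead.

```php
<?php
// Hypothetical helper: compare httpbin's reported origin with the proxy host.
// $json is the response body from http://httpbin.org/ip, e.g. {"origin": "1.2.3.4"}.
function proxy_is_effective(string $json, string $proxy): bool
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['origin']) || !is_string($data['origin'])) {
        return false; // not the response shape we expected
    }
    // Strip the port from "host:port" to get the proxy's address.
    $proxyHost = explode(':', $proxy)[0];
    // "origin" may hold several comma-separated addresses.
    $origins = array_map('trim', explode(',', $data['origin']));
    return in_array($proxyHost, $origins, true);
}
```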

Q: What should I do if I get a 403 error?
A: Three steps: 1. check whether the IP has been blocked; 2. change the User-Agent; 3. lower the collection frequency.
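
For step 3, one common way to lower the frequency is to wait longer after each consecutive 403. The sketch below doubles the delay each time up to a cap; the function name `backoff_seconds` and the default values (2-second base, 60-second cap) are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: seconds to wait after $failures consecutive 403s.
// Doubles the base delay per failure and caps it at $maxSeconds.
function backoff_seconds(int $failures, int $baseSeconds = 2, int $maxSeconds = 60): int
{
    if ($failures <= 0) {
        return 0; // no failures yet, no extra wait
    }
    $delay = $baseSeconds * (2 ** ($failures - 1));
    return (int) min($delay, $maxSeconds);
}
```

In a scraping loop you would call `sleep(backoff_seconds($failures))` before retrying, and reset `$failures` to 0 after a successful request.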

Advanced: Switching IPs Automatically from a Pool

Use the ipipgo API to fetch IPs dynamically and build an IP pool management script:


// Get the IP pool (decode as associative arrays so ['ip'] and ['port'] work)
$ip_list = json_decode(file_get_contents('https://api.ipipgo.com/getips?num=20'), true);

// Pick a random IP
$rand_key = array_rand($ip_list);
$current_ip = $ip_list[$rand_key]['ip'] . ':' . $ip_list[$rand_key]['port'];

I recommend changing the IP at least every 5 requests; combined with concurrent requests, this can improve throughput by as much as 10 times in my experience. But pay attention to the target site's anti-scraping strategy, and don't bring down other people's servers.
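
The "change every 5 requests" rule can be sketched as a small rotator. The class name `ProxyRotator` and its interface are hypothetical; it takes a list of "ip:port" strings, such as ones assembled from the pool above, and moves to the next entry after a fixed number of uses.

```php
<?php
// Hypothetical class: hand out a proxy and move on to the next one
// after a fixed number of uses (5 by default, per the suggestion above).
final class ProxyRotator
{
    private array $proxies;
    private int $usesPerProxy;
    private int $index = 0;
    private int $uses = 0;

    public function __construct(array $proxies, int $usesPerProxy = 5)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('Proxy list is empty');
        }
        $this->proxies = array_values($proxies);
        $this->usesPerProxy = max(1, $usesPerProxy);
    }

    // Returns the "ip:port" string to use for the next request.
    public function next(): string
    {
        if ($this->uses >= $this->usesPerProxy) {
            // Quota used up: advance, wrapping around at the end of the list.
            $this->index = ($this->index + 1) % count($this->proxies);
            $this->uses = 0;
        }
        $this->uses++;
        return $this->proxies[$this->index];
    }
}
```

Each request would then set `CURLOPT_PROXY` to `$rotator->next()`. Rotating in order rather than at random keeps the load spread evenly across the pool, which is a design choice, not a requirement.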

One last piece of advice: don't choose a proxy service just because it is cheap. I once used a free proxy, and the pages I collected were full of injected ads for phishing sites. I now use ipipgo's dedicated IP plan; it has been very stable, which gives me peace of mind on projects.

This article was originally published or organized by ipipgo. https://www.ipipgo.com/en-us/ipdaili/35284.html
