
How to use proxy IPs in PHP to get around website access restrictions
Anyone who writes web crawlers has run into this: the script is running along fine and then suddenly stalls, because the site has either started serving CAPTCHAs or blocked your IP outright. This is when to bring out the lifesaver: the proxy IP. Today we'll use PHP to show how ipipgo's proxy service can deal with these website restrictions.
Why does your crawler always get caught?
Webmasters aren't pushovers. They watch the access logs, and when they see one IP hammering the site with requests, they simply ban it. An ordinary user loads a few pages per minute, while a crawler may fire off dozens of requests per second. At that frequency, even a blind man can see something is off.
// Example of a typical crawler that gets itself banned
for ($i = 0; $i < 1000; $i++) {
    $html = file_get_contents('target site'); // placeholder: put the target URL here
    // Parse the data...
}
Run that for less than half an hour and your IP is all but guaranteed to be blacklisted. This is where proxy IPs come in: by rotating through different identities, you make the site think it is being visited by different users.
Real-world PHP proxy configuration
Here are two common methods, demonstrated with ipipgo's proxy service (their API is particularly easy to integrate).
Method 1: Set the proxy with cURL
$proxy = 'Proxy address assigned by ipipgo:port';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "destination URL");
curl_setopt($ch, CURLOPT_PROXY, $proxy);
// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// It is recommended to add a timeout setting
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
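In practice it helps to check whether the request through the proxy actually succeeded. The following is a minimal sketch, not ipipgo's official client: the function name `fetchViaProxy` and the shape of the returned array are assumptions made for illustration, and the proxy is expected in `host:port` form.

```php
<?php
// Hypothetical helper: fetch a URL through a proxy and report what happened.
function fetchViaProxy(string $url, string $proxy, int $timeout = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,    // "host:port" of the proxy
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout,  // seconds allowed for the whole request
    ]);

    $body   = curl_exec($ch);
    $errno  = curl_errno($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

    // Treat transport errors and non-2xx statuses as failures.
    $ok = ($errno === 0) && $status >= 200 && $status < 300;

    return [
        'ok'     => $ok,
        'status' => $status,
        'body'   => $ok ? $body : null,
        'error'  => $errno !== 0 ? $error : null,
    ];
}
```

A caller can then test `$result['ok']` and switch to another proxy when it is false, rather than parsing an empty or error page.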
Method 2: Set the proxy with a stream context
$context = stream_context_create([
    'http' => [
        'proxy' => 'tcp://' . $proxy,
        'request_fulluri' => true
    ]
]);
$response = file_get_contents('destination url', false, $context);
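The stream-context approach can also carry a timeout and a browser-like User-Agent. This is a rough sketch: `buildProxyContext` is a hypothetical helper name, and the proxy address and User-Agent string are only example values.

```php
<?php
// Hypothetical helper: build a stream context that routes HTTP requests
// through a proxy given as "host:port".
function buildProxyContext(string $proxy, float $timeout = 10.0)
{
    return stream_context_create([
        'http' => [
            'proxy'           => 'tcp://' . $proxy,
            'request_fulluri' => true,      // send the full URI; some proxy servers require it
            'timeout'         => $timeout,  // read timeout in seconds
            'user_agent'      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)', // example value
            'ignore_errors'   => true,      // still return the body on 4xx/5xx responses
        ],
    ]);
}

$context = buildProxyContext('127.0.0.1:8080'); // placeholder proxy address
// $response = file_get_contents('http://example.com/', false, $context);
```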
How to choose a reliable proxy IP?
Proxy providers on the market vary widely in quality, and here I have to recommend ipipgo. Here is how they compare with a typical provider:
| Feature | Typical proxy provider | ipipgo |
|---|---|---|
| Connection speed | Frequent lag | 5G dedicated line |
| IP pool size | Thousands | Millions |
| IP rotation | Manual | Automatic switching via API |
| After-sales support | Hard to reach anyone | Online 24 hours a day |
Common pitfalls and how to avoid them
Q: What should I do if a proxy IP stops working partway through?
A: Set up a retry-on-failure mechanism. ipipgo's API supports fetching new IPs automatically, and it is a good idea to switch proxies every 20 requests.
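One way to read this advice is a small rotator that hands out the same proxy for a fixed number of requests and then moves to the next one, switching early when a request fails. The sketch below is only an illustration: the class `ProxyRotator`, its method names, and the hard-coded proxy list are assumptions. In real use the list would come from the provider's API, which is not shown here.

```php
<?php
// Hypothetical rotator: reuse a proxy for $limit requests, then move on.
final class ProxyRotator
{
    private array $proxies;
    private int $limit;
    private int $index = 0;
    private int $used = 0;

    public function __construct(array $proxies, int $limit = 20)
    {
        if ($proxies === [] || $limit < 1) {
            throw new InvalidArgumentException('Need at least one proxy and a positive limit.');
        }
        $this->proxies = array_values($proxies);
        $this->limit   = $limit;
    }

    // Return the proxy to use for the next request.
    public function next(): string
    {
        if ($this->used >= $this->limit) {
            $this->rotate();
        }
        $this->used++;
        return $this->proxies[$this->index];
    }

    // Call this when a request fails, to abandon the current proxy early.
    public function rotate(): void
    {
        $this->index = ($this->index + 1) % count($this->proxies);
        $this->used  = 0;
    }
}

$rotator = new ProxyRotator(['10.0.0.1:8000', '10.0.0.2:8000']); // placeholder addresses
$proxy   = $rotator->next(); // use for this request; call $rotator->rotate() on failure
```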
Q: Why am I still getting blocked even though I use a proxy?
A: Check whether your request headers look like a browser's, and don't use a User-Agent that obviously belongs to a crawler. Also keep the request rate sane; staying within 3 requests per second is recommended.
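Both suggestions can be sketched in a few lines: send browser-like headers and keep a minimum gap between requests. The header values and the helper name `waitMicros` are illustrative assumptions; three requests per second corresponds to a gap of roughly 333 ms.

```php
<?php
// Example browser-like headers (illustrative values, not required ones).
$headers = [
    'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.9',
];

// Hypothetical helper: microseconds to sleep so that no more than
// $maxPerSecond requests are sent per second.
function waitMicros(float $lastRequestAt, float $now, int $maxPerSecond = 3): int
{
    $minGap  = 1.0 / $maxPerSecond;   // required seconds between requests
    $elapsed = $now - $lastRequestAt; // seconds since the previous request
    return $elapsed >= $minGap ? 0 : (int) ceil(($minGap - $elapsed) * 1000000);
}

$last = 0.0;
foreach (['http://example.com/a', 'http://example.com/b'] as $url) { // placeholder URLs
    usleep(waitMicros($last, microtime(true)));
    $last = microtime(true);
    // Send the request here, passing the headers to cURL with
    // curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
}
```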
Q: What should I do if my proxy IP responds slowly?
A: Choose a "high-speed channel" node in the ipipgo dashboard, or try servers in a different region. Sometimes a node that is physically closer is faster.
Honest advice for newbies
If you are just getting started with crawlers, practice with ipipgo's free trial package first. New users get 1 GB of traffic, which is enough to test the basic functions. Keep a few key points in mind:
1. Pick a random proxy from the IP pool before each request
2. Record how many times each IP has been used
3. Switch IPs immediately when a response looks abnormal
4. Test proxy availability periodically
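The four points above could be sketched as a small in-memory pool. Everything here is an assumption for illustration: the class name `ProxyPool`, its methods, and the rule of dropping a proxy after a single failure. The availability check is passed in as a callable so that any probe can be plugged in.

```php
<?php
final class ProxyPool
{
    /** @var array<string,int> proxy => number of times used */
    private array $usage = [];

    public function __construct(array $proxies)
    {
        foreach ($proxies as $proxy) {
            $this->usage[$proxy] = 0;
        }
    }

    // Points 1 and 2: pick a random proxy and record the use.
    public function pick(): string
    {
        if ($this->usage === []) {
            throw new RuntimeException('Proxy pool is empty.');
        }
        $proxy = array_rand($this->usage);
        $this->usage[$proxy]++;
        return $proxy;
    }

    // Point 3: drop a proxy as soon as a response looks abnormal.
    public function markFailed(string $proxy): void
    {
        unset($this->usage[$proxy]);
    }

    // Point 4: run a caller-supplied check and keep only proxies that pass.
    public function prune(callable $isAlive): void
    {
        foreach (array_keys($this->usage) as $proxy) {
            if (!$isAlive($proxy)) {
                unset($this->usage[$proxy]);
            }
        }
    }

    public function usageCount(string $proxy): int
    {
        return $this->usage[$proxy] ?? 0;
    }

    public function size(): int
    {
        return count($this->usage);
    }
}
```

A crawler loop would call `pick()` before each request, `markFailed()` when a response looks wrong, and `prune()` on a timer with whatever health check suits the target site.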
One last word from the heart: don't trust free proxies, nine out of ten are traps. Leave professional work to professionals. A paid service like ipipgo costs money, but it saves you a lot of time fiddling around, and not having it fail on you at a critical moment makes it well worth the price.

