
Hands-On: Web Crawling with PHP Without Getting Blocked
Anyone who writes crawlers has run into this: after grabbing just a few pages of data, your IP gets blocked. It is especially common in e-commerce price monitoring and public opinion analysis, where target sites often blacklist you. At that point you have to rely on proxy IPs to keep going. Today we will use PHP to show how to collect data through proxy IPs.
Choosing the right proxy IP service provider is the first step to success
There are a lot of proxy IP service providers on the market, but not many of them are reliable. Here I recommend ipipgo's dynamic residential proxies, which I have tested myself and found effective. According to the provider, more than 2 million IPs in their pool are refreshed every day, automatic switching is supported, and, most importantly, there are optimized lines specifically for e-commerce platforms.
// Example of getting the ipipgo proxy
$api_url = "https://api.ipipgo.com/getproxy?format=json&key=YOUR_API_KEY";
$proxy_data = json_decode(file_get_contents($api_url), true);
// The returned proxy information looks like this
/*
{
    "ip": "<proxy IP>",
    "port": 8888,
    "expire_time": "2024-08-01 12:00:00"
}
*/
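The crawl function later in this article calls a `get_ipipgo_proxy()` helper that readers are expected to write themselves. A minimal sketch of such a helper might look like the following. It assumes the API returns a JSON object with `ip` and `port` fields (the `ip` key is inferred from how the later code reads `$proxy['ip']`); `parse_proxy_response` and the `IPIPGO_API_URL` constant are hypothetical names, not part of any ipipgo SDK.

```php
<?php
// Assumption: same endpoint as the example above; replace the key with your own.
const IPIPGO_API_URL = 'https://api.ipipgo.com/getproxy?format=json&key=YOUR_API_KEY';

/**
 * Validate a proxy API response and return ['ip' => string, 'port' => int].
 * The field names are assumptions based on the sample response.
 */
function parse_proxy_response(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException('Proxy API returned invalid JSON');
    }
    if (empty($data['ip']) || empty($data['port'])) {
        throw new RuntimeException('Proxy API response is missing ip or port');
    }
    if (filter_var($data['ip'], FILTER_VALIDATE_IP) === false) {
        throw new RuntimeException('Proxy API returned a malformed IP');
    }
    $port = (int) $data['port'];
    if ($port < 1 || $port > 65535) {
        throw new RuntimeException('Proxy API returned an invalid port');
    }
    return ['ip' => $data['ip'], 'port' => $port];
}

/** Fetch a fresh proxy from the API and return it in validated form. */
function get_ipipgo_proxy(string $apiUrl = IPIPGO_API_URL): array
{
    $body = @file_get_contents($apiUrl);
    if ($body === false) {
        throw new RuntimeException('Could not reach the proxy API');
    }
    return parse_proxy_response($body);
}
```

Keeping the parsing separate from the network call makes it easy to check the validation logic against a saved response before spending any API quota.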
PHP Crawling in Practice (with Exception Handling)
The following code has been battle-tested. Pay particular attention to the proxy settings and the exception handling:
function fetchWithProxy($url) {
    $ch = curl_init();
    // Get the latest proxy from ipipgo
    $proxy = get_ipipgo_proxy(); // Wrap this function yourself!
    curl_setopt($ch, CURLOPT_PROXY, $proxy['ip']);
    curl_setopt($ch, CURLOPT_PROXYPORT, $proxy['port']);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // keep the timeout short
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skip certificate verification (insecure)
    // Disguise the request as a browser
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36'
    ]);
    try {
        $output = curl_exec($ch);
        if (curl_errno($ch)) {
            throw new Exception('Crawl failed: ' . curl_error($ch));
        }
        return $output;
    } finally {
        // Runs whether the request succeeded or threw
        curl_close($ch);
    }
}
// Example call
$html = fetchWithProxy("https://target-site.com/product/123");
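Because `fetchWithProxy` throws when a request fails and asks for a new proxy on every call, a caller can simply retry and each attempt goes out through a fresh proxy. A minimal sketch of that idea follows; `fetch_with_retry`, its parameters, and the delay values are illustrative assumptions rather than part of the original code.

```php
<?php
/**
 * Call $fetch($url) up to $maxAttempts times, pausing between failures.
 * $fetch is any callable that returns the page body or throws on error,
 * for example 'fetchWithProxy'.
 */
function fetch_with_retry(callable $fetch, string $url, int $maxAttempts = 3, int $baseDelayMs = 500): string
{
    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            $result = $fetch($url);
            if (is_string($result)) {
                return $result;
            }
            throw new RuntimeException('Unexpected non-string response');
        } catch (Exception $e) {
            $lastError = $e;
            if ($attempt < $maxAttempts) {
                // Wait a little longer after each failed attempt
                usleep($baseDelayMs * 1000 * $attempt);
            }
        }
    }
    throw new RuntimeException("All $maxAttempts attempts failed for $url", 0, $lastError);
}

// Usage:
// $html = fetch_with_retry('fetchWithProxy', 'https://target-site.com/product/123');
```

The last underlying exception is attached as the previous exception, so the original cURL error message is still available when all attempts fail.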
Tips for Dodging Anti-Crawler Measures
A proxy alone is not enough. You will still get blocked if you ignore these details:
| Anti-crawling measure | Countermeasure |
|---|---|
| Request frequency detection | Random delay of 0.5-3 seconds; don't use fixed intervals |
| Browser fingerprinting | Change the User-Agent and cookies on every request |
| CAPTCHA interception | Use ipipgo's real residential proxies |
| IP behavior analysis | Use a single IP for no more than 30 minutes |
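Three of the rows above (random delays, User-Agent rotation, and the 30-minute limit per IP) translate directly into small helpers. The sketch below is one way to write them; the function names and the User-Agent strings are illustrative assumptions, so substitute values you have verified yourself.

```php
<?php
// Illustrative User-Agent pool (assumption); extend it with your own strings.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

/** Random pause length in microseconds, between $minSec and $maxSec seconds. */
function random_delay_us(float $minSec = 0.5, float $maxSec = 3.0): int
{
    return random_int((int) round($minSec * 1000000), (int) round($maxSec * 1000000));
}

/** Pick a random User-Agent header line from the pool. */
function random_user_agent_header(array $pool = USER_AGENTS): string
{
    return 'User-Agent: ' . $pool[array_rand($pool)];
}

/** True once a proxy has been in use longer than $maxSeconds (30 minutes by default). */
function proxy_is_stale(int $firstUsedAt, int $now, int $maxSeconds = 1800): bool
{
    return ($now - $firstUsedAt) > $maxSeconds;
}

// Usage between requests:
// usleep(random_delay_us());
// curl_setopt($ch, CURLOPT_HTTPHEADER, [random_user_agent_header()]);
// if (proxy_is_stale($proxyStartedAt, time())) { /* request a new proxy */ }
```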
Frequently Asked Questions
Q: Why was my proxy blocked just after I used it?
A: You may have been using a data center IP. Try switching to ipipgo's residential proxies, which simulate a real user environment.
Q: What about crawling pages that require login?
A: First complete the login over a fixed IP to obtain cookies, then use the proxy pool for the actual requests.
Q: How are ipipgo's proxies billed?
A: Billing is flexible, by traffic or by number of IPs. New users get 5 GB of trial traffic, which is enough for a month of testing.
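The answer to the login question above can be sketched with cURL's cookie jar options: log in once without a proxy so the cookies are written to a file, then reuse that file on proxied requests. The function names, the cookie file path, the login URL, and the form field names are all hypothetical; only the `CURLOPT_*` constants are real cURL options.

```php
<?php
/** cURL options for the login request: direct connection, cookies saved to $cookieFile. */
function login_options(string $loginUrl, array $formFields, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($formFields),
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written when the handle is released
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ];
}

/** cURL options for later requests: same cookies, sent through a proxy. */
function proxied_options(string $url, array $proxy, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy['ip'],
        CURLOPT_PROXYPORT      => $proxy['port'],
        CURLOPT_COOKIEFILE     => $cookieFile, // read the cookies saved at login
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ];
}

/** Run one request with the given options; throws on transport errors. */
function run_request(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    try {
        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('Request failed: ' . curl_error($ch));
        }
        return $body;
    } finally {
        curl_close($ch);
    }
}

// Usage (hypothetical URLs and field names):
// $cookieFile = sys_get_temp_dir() . '/crawler_cookies.txt';
// run_request(login_options('https://target-site.com/login', ['user' => '...', 'pass' => '...'], $cookieFile));
// $html = run_request(proxied_options('https://target-site.com/account', get_ipipgo_proxy(), $cookieFile));
```

Some sites tie a session to the IP that created it, in which case switching to a proxy after login may invalidate the cookies; test this against the target site before relying on it.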
Going Further: Distributed Crawling Architecture
For large projects, Redis + multi-process architecture is recommended:
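// Pseudo-code example (needs the phpredis and pcntl extensions; run from the CLI)
$redis = new Redis();
$redis->connect('127.0.0.1', 6379); // adjust to your Redis instance
// Pop one proxy per worker until the list is empty
while ($proxy = $redis->lPop('ipipgo_proxies')) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('Failed to create child process');
    } elseif ($pid) {
        // Parent process: keep forking workers for the remaining proxies
    } else {
        // Child process: perform the fetch with its own proxy, then exit
        fetch_data($proxy);
        exit();
    }
}
// Parent: wait for every child so none are left as zombie processes
while (pcntl_wait($status) > 0);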
Finally, a reminder: even when using proxy IPs, comply with the target website's robots.txt rules and avoid overloading its servers. If you run into problems, you can contact ipipgo's technical support directly; they have plenty of experience with anti-crawling issues.

