
A hands-on guide to scraping data with PHP
What do you fear most in data collection? IP blocking, of course! I've seen plenty of painstakingly written scripts get blacklisted by the target website after a couple of runs. Today I'll show you how to combine PHP's native cURL with ipipgo proxy IPs to build a rock-solid collection program.
Understanding the basic cURL configuration
First, get familiar with PHP's basic cURL settings. This code is the foundation of any collector:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "Target URL");   // replace with the page you want to fetch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 0);           // keep response headers out of the output
$output = curl_exec($ch);
Key point: remember to set timeouts! Setting CURLOPT_TIMEOUT to 20 seconds and CURLOPT_CONNECTTIMEOUT to 15 seconds is recommended, so the script cannot hang indefinitely.
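As a minimal sketch, the base settings and the two timeouts can be collected in one options array and applied with curl_setopt_array(). The helper names baseCurlOptions() and newHandle() are made up for this example and are not part of cURL:

```php
<?php
// Hypothetical helper: returns the base options discussed above as one array.
function baseCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_HEADER         => false,  // keep response headers out of the body
        CURLOPT_CONNECTTIMEOUT => 15,     // seconds allowed for establishing the connection
        CURLOPT_TIMEOUT        => 20,     // seconds allowed for the whole request
    ];
}

// Hypothetical helper: creates a cURL handle with those options applied.
function newHandle(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, baseCurlOptions($url));
    return $ch;
}
```

Calling curl_exec(newHandle('Target URL')) then behaves like the snippet above, except that a slow server gives up after at most 20 seconds.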
The right way to use a proxy IP
Here is the proxy configuration code for ipipgo. This is the part that saves you:
curl_setopt($ch, CURLOPT_PROXY, 'Proxy IP:port');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'account:password');
When using ipipgo's rotating proxy pool, it's best to fetch a new IP for each request. Their API for getting one is dead simple:
$ip = file_get_contents('https://api.ipipgo.com/getproxy');
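The snippets above can be tied together roughly as follows. This is only a sketch: it assumes the API answers with a single plain-text "host:port" line, which the article does not confirm, and parseProxy() and applyProxy() are hypothetical helper names.

```php
<?php
// Hypothetical helper: validates a "host:port" string.
// ASSUMPTION: the proxy API returns plain text in this form; check the real response format.
function parseProxy(string $raw): ?string
{
    $candidate = trim($raw);
    if (preg_match('/^[A-Za-z0-9.\-]+:(\d{1,5})$/', $candidate, $m) !== 1) {
        return null;                       // not in host:port form
    }
    $port = (int) $m[1];
    return ($port >= 1 && $port <= 65535) ? $candidate : null;
}

// Hypothetical helper: points an existing cURL handle at the given proxy.
function applyProxy($ch, string $proxy, string $userPwd): void
{
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, $userPwd);
}
```

Before each request, pass the result of file_get_contents() through parseProxy() (after checking it is not false) and call applyProxy() only when a non-null value comes back, so a malformed API response never ends up in CURLOPT_PROXY.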
Practical anti-blocking techniques revealed
| Metric | Normal mode | Proxy mode |
|---|---|---|
| Daily collection volume | 500 items | 500,000+ |
| Usable lifetime | 2 hours | Stable long term |
| Probability of being blocked | 90% | <5% |
Key tip: remember to add a random User-Agent to the request headers. ipipgo's proxy IP pool comes with this feature, which saves a lot of effort.
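If you want to rotate the User-Agent yourself, a minimal sketch could look like this. The strings in USER_AGENTS are illustrative samples rather than a recommended list, and randomUserAgent() is a hypothetical helper:

```php
<?php
// Illustrative sample strings only; maintain your own up-to-date list.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

// Hypothetical helper: picks one entry at random from a non-empty list.
function randomUserAgent(array $agents = USER_AGENTS): string
{
    return $agents[array_rand($agents)];
}
```

Apply it per request with curl_setopt($ch, CURLOPT_USERAGENT, randomUserAgent()); so that consecutive requests do not all announce the same client.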
Don't be sloppy with exception handling
A scraping script without exception handling is like driving without a seatbelt. Three safeguards are a must:
- Check for network errors with curl_errno()
- Check the response status via http_code
- Set up an automatic retry mechanism
if (curl_errno($ch)) {
    // Append a timestamped line describing the cURL error
    file_put_contents('error.log', date('Y-m-d H:i:s').' Error: '.curl_error($ch)."\n", FILE_APPEND);
}
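The three safeguards listed above can be combined along these lines. Treat it as a sketch under assumptions: fetchOnce() and withRetry() are hypothetical names, only 2xx responses count as success, and three attempts is an arbitrary default.

```php
<?php
// Hypothetical helper: performs one request and reports success or the reason for failure.
function fetchOnce(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT        => 20,
    ]);
    $body = curl_exec($ch);

    if (curl_errno($ch)) {                                // safeguard 1: network-level error
        return ['ok' => false, 'error' => curl_error($ch)];
    }
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);        // safeguard 2: HTTP status
    return ($code >= 200 && $code < 300)
        ? ['ok' => true, 'body' => $body]
        : ['ok' => false, 'error' => "HTTP $code"];
}

// Hypothetical helper (safeguard 3): repeats $attempt until it succeeds or tries run out.
function withRetry(callable $attempt, int $maxTries = 3, int $delayMs = 0): array
{
    $last = ['ok' => false, 'error' => 'not attempted'];
    for ($try = 1; $try <= $maxTries; $try++) {
        $last = $attempt($try);
        if ($last['ok']) {
            return $last;
        }
        if ($try < $maxTries && $delayMs > 0) {
            usleep($delayMs * 1000);                      // pause before the next attempt
        }
    }
    return $last;                                         // the final failure, for logging
}
```

A call such as withRetry(fn () => fetchOnce('Target URL'), 3, 500) returns either the successful result or the last failure, whose 'error' field can be written to error.log as shown earlier.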
Frequently Asked Questions
Q: What should I do if my proxy IP suddenly fails?
A: Use ipipgo's smart switching feature; their API returns verified, available IPs.
Q: What should I do if collection is slow?
A: Try their dedicated high-speed proxy line, and remember to tune cURL's concurrency settings!
Q: What should I do if I need to collect data from overseas websites?
A: ipipgo has static residential IPs in 200+ countries around the world; just choose the node for the corresponding region.
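On the concurrency point from the second answer: one way to run several transfers in parallel is PHP's curl_multi interface. The sketch below is an assumption about what "concurrency" means here, not ipipgo's own recommendation; fetchConcurrently() is a hypothetical helper and the batch size of 5 is arbitrary.

```php
<?php
// Hypothetical helper: fetches URLs in batches of at most $concurrency parallel transfers.
// Returns [url => body|null]; duplicate URLs share one key.
function fetchConcurrently(array $urls, int $concurrency = 5): array
{
    $results = [];
    foreach (array_chunk($urls, max(1, $concurrency)) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_CONNECTTIMEOUT => 15,
                CURLOPT_TIMEOUT        => 20,
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }

        // Drive all transfers in the batch until none are still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running > 0) {
                curl_multi_select($mh, 1.0);   // wait for activity instead of busy-looping
            }
        } while ($running > 0 && $status === CURLM_OK);

        foreach ($handles as $url => $ch) {
            $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 when no response arrived
            $results[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
            curl_multi_remove_handle($mh, $ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
```

Raising the batch size speeds things up only until the proxy line or the target site becomes the bottleneck, so it is worth increasing it gradually while watching the block rate.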
An upgraded collection setup
A tip for those doing large-scale collection: use ipipgo's API together with Redis to manage an IP pool. The code structure looks roughly like this:
$redis = new Redis();
$redis->connect('127.0.0.1', 6379); // assumed local Redis; adjust host and port
$ipList = $redis->lRange('proxy_pool', 0, -1);
foreach ($ipList as $proxy) {
    // Collection logic goes here
    // On failure, automatically remove the current IP from the pool
}
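The "exclude on failure" step in that loop can be sketched without Redis, using a plain array as the pool. collectWithPool() is a hypothetical helper, and $fetchVia stands for whatever function performs one request through a given proxy and returns the body, or null on failure.

```php
<?php
// Hypothetical helper: tries each proxy in turn and records the ones that fail.
// $fetchVia is any callable taking a proxy string and returning the body, or null on failure.
function collectWithPool(array $pool, callable $fetchVia): array
{
    $failed = [];
    $body = null;
    foreach ($pool as $proxy) {
        $body = $fetchVia($proxy);
        if ($body !== null) {
            break;                       // success: stop trying further proxies
        }
        $failed[] = $proxy;              // failure: mark this proxy for exclusion
    }
    return [
        'body'      => $body,
        'failed'    => $failed,
        'remaining' => array_values(array_diff($pool, $failed)),
    ];
}
```

With the Redis list from the article, each entry in 'failed' would then be removed from proxy_pool, for example with phpredis's lRem(), assuming that extension is the client in use.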
Remember to set up a scheduled task that automatically replenishes fresh IPs via ipipgo's API in the early hours of each day, so that the pool always holds 50+ available proxies.
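The top-up logic for such a scheduled task might look like the following sketch. replenishPool() is a hypothetical helper, $fetchProxy stands for whatever call returns one fresh "host:port" string (or something else on failure), and the cap of 200 attempts is an arbitrary guard against looping forever when the API is down.

```php
<?php
// Hypothetical helper: tops the pool up until it holds at least $minimum unique proxies.
function replenishPool(array $pool, callable $fetchProxy, int $minimum = 50, int $maxAttempts = 200): array
{
    $pool = array_values(array_unique($pool));
    for ($i = 0; $i < $maxAttempts && count($pool) < $minimum; $i++) {
        $proxy = $fetchProxy();
        // Skip failed fetches and proxies that are already in the pool.
        if (is_string($proxy) && $proxy !== '' && !in_array($proxy, $pool, true)) {
            $pool[] = $proxy;
        }
    }
    return $pool;
}
```

A cron entry that runs a script around this function once a night, followed by writing the result back to the proxy_pool list, would cover the schedule described above.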
Finally, a few words from the heart: don't go cheap when choosing a proxy service. I used a few cheap ones before, and 8 out of 10 IPs would fail. Later I switched to ipipgo's platinum package. It costs more, but it wins on stability, and my business volume more than tripled. Their intelligent routing feature is genuinely good: it automatically matches the fastest line, which saves a lot of debugging time.

