
Hands-on: using curl with custom headers to crawl data without getting blocked
Recently a few readers asked me what to do when a site keeps blocking their IP while they crawl data with curl. Today let's talk about that, focusing on one trick I have tested myself: custom headers combined with proxy IPs.
First, a real case: a price-monitoring script for an e-commerce platform got banned in under half an hour when it used plain curl requests. After I added browser-like request headers and paired them with ipipgo's dynamic proxy pool, it ran for three days without problems. Here's how to do it.
The right way to add headers to curl
Let's start with a typical way this goes wrong (目标网站.com stands in for the target site):
curl https://目标网站.com
With a bare request like this, the server can tell at a glance that a bot is behind it. We need to dress curl up as a browser:
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Referer: https://www.google.com/" https://目标网站.com
Note the three key headers:
| Header name | Purpose | Example value |
|---|---|---|
| User-Agent | Impersonates a browser | A current Chrome or Firefox UA string |
| Accept-Language | Language preference | zh-CN listed first |
| Referer | Referring page | Simulates arriving from a search engine |
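
To avoid retyping the three headers in the table for every request, they can be wrapped in a small shell function. This is a minimal sketch: `fetch_with_headers` is a hypothetical helper name, and `example.com` in the usage comment is a placeholder for the real target.

```shell
# fetch_with_headers URL: call curl with the three headers from the table.
# The function name is a hypothetical helper, not part of curl.
fetch_with_headers() (
  url=$1
  # Collect the options in the positional parameters so quoting survives.
  set -- \
    -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
    -H "Accept-Language: zh-CN,zh;q=0.9" \
    -H "Referer: https://www.google.com/"
  curl -s "$@" "$url"   # -s hides the progress meter
)

# Usage (example.com is a placeholder):
#   fetch_with_headers "https://example.com/" > page.html
```

The function body runs in a subshell, so `url` and the rewritten positional parameters do not leak into the calling script.
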
The right way to use a proxy IP
Changing the headers alone is not enough. You also need a proxy IP, so that your requests do not all arrive from your own address. We recommend ipipgo's service, which offers a dedicated anti-blocking package. Usage looks like this:
curl -x http://USERNAME:PASSWORD@proxy.ipipgo.com:PORT -H "User-Agent: Mozilla/5.0..." https://目标网站.com
Watch out for these two pitfalls:
- Don't use free proxies: 99% of them come from public IP pools that sites blacklisted long ago
- Residential proxies are harder to detect than datacenter proxies; ipipgo's Dynamic Residential IP package has a higher success rate
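
The proxy command above can also be written with the credentials kept out of the URL, using curl's `-U` (`--proxy-user`) option. The sketch below assumes four environment variables, `PROXY_HOST`, `PROXY_PORT`, `PROXY_USER` and `PROXY_PASS`; these names are made up for illustration and are not defined by curl or by ipipgo.

```shell
# proxy_fetch URL: send one request through an authenticated HTTP proxy.
# PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASS are hypothetical
# variable names; set them to the values from your proxy account.
proxy_fetch() (
  url=$1
  # Stop early with a message if any setting is missing.
  : "${PROXY_HOST:?set PROXY_HOST}" "${PROXY_PORT:?set PROXY_PORT}"
  : "${PROXY_USER:?set PROXY_USER}" "${PROXY_PASS:?set PROXY_PASS}"
  curl -s -m 30 \
    -x "http://${PROXY_HOST}:${PROXY_PORT}" \
    -U "${PROXY_USER}:${PROXY_PASS}" \
    -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
    "$url"
)

# Usage (all values are placeholders):
#   PROXY_HOST=proxy.example PROXY_PORT=8080 PROXY_USER=me PROXY_PASS=secret \
#     proxy_fetch "https://example.com/"
```

Keeping the password in the environment rather than on the command line means it does not end up in shell history along with the URL.
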
A practical guide to avoiding pitfalls
The strangest ban I have ever run into: a site that actually checked font rendering parameters stored in cookies! Here are a few unorthodox tricks:
- Regularly change the value of the Accept-Encoding header
- Randomly add harmless but valid fields to the request headers, such as X-Requested-With: XMLHttpRequest
- Use ipipgo's session persistence feature and keep a reasonable request frequency on the same IP
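
The first two tricks, plus a pause between requests, can be sketched in plain shell. The list of Accept-Encoding values and the 2-second pause are illustrative assumptions, and `pick_accept_encoding` and `polite_fetch` are hypothetical helper names. The candidates are limited to gzip and deflate so that `--compressed` can decode whatever the server sends back.

```shell
# pick_accept_encoding: print one Accept-Encoding value chosen at random.
# The candidate values are illustrative assumptions.
pick_accept_encoding() (
  # Read 2 random bytes as an unsigned integer (0-65535).
  n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
  case $(( n % 3 )) in
    0) echo "gzip, deflate" ;;
    1) echo "deflate, gzip" ;;
    2) echo "gzip" ;;
  esac
)

# polite_fetch URL: one request with a varied header set, then a pause.
polite_fetch() {
  curl -s --compressed \
    -H "Accept-Encoding: $(pick_accept_encoding)" \
    -H "X-Requested-With: XMLHttpRequest" \
    "$1"
  sleep 2   # pause between requests; the 2-second value is an assumption
}

# Usage (example.com is a placeholder):
#   for path in a b c; do polite_fetch "https://example.com/$path"; done
```
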
Frequently asked questions
Q: What should I do if I still get blocked after adding headers?
A: Check whether the Cache-Control field is missing. Adding Cache-Control: max-age=0 is recommended to mimic browser behavior.
Q: How do I deal with slow proxy IPs?
A: ipipgo's Intelligent Routing feature automatically selects the fastest node. You can also add -m 30 to cap each request at 30 seconds, so that a slow node fails fast instead of hanging.
Q: What if I need to handle cookies?
A: First use curl's -c cookie.txt option to save the cookies, then send them on subsequent requests with -b cookie.txt.
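
The three answers above can be combined into a two-step flow, sketched below. The helper names `save_cookies` and `fetch_with_cookies` are hypothetical; the options themselves (`-c`, `-b`, `-m`, `-H`) are the ones mentioned in the answers.

```shell
COOKIE_JAR="cookie.txt"   # same file name as in the answer above

# save_cookies URL: visit a page and write any cookies it sets to the jar.
# The response body is discarded with -o /dev/null.
save_cookies() {
  curl -s -m 30 -c "$COOKIE_JAR" -o /dev/null "$1"
}

# fetch_with_cookies URL: send the saved cookies and keep the jar up to date.
fetch_with_cookies() {
  curl -s -m 30 \
    -b "$COOKIE_JAR" -c "$COOKIE_JAR" \
    -H "Cache-Control: max-age=0" \
    "$1"
}

# Usage (example.com is a placeholder):
#   save_cookies "https://example.com/"
#   fetch_with_cookies "https://example.com/prices"
```

Passing both `-b` and `-c` on later requests means cookies the server refreshes along the way are written back to the same file.
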
The ultimate survival template
Finally, a general-purpose template. Remember to replace the credentials with your own ipipgo account:
curl -x http://vipuser:123456@proxy.ipipgo.com:8899 -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" -H "Accept: text/html,application/xhtml+xml" -H "Accept-Encoding: gzip, deflate, br" --compressed https://目标网站.com
This template has three key design points:
- It uses ipipgo's enterprise proxy channel
- It sends a fuller set of browser-like headers
- It enables compressed transfer to save bandwidth
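
When a script built on this template runs unattended, it helps to notice a block as soon as it happens. One possible approach, sketched below, is to look at the HTTP status code. Treating 403 and 429 as "blocked" is an assumption, since sites signal blocks in different ways, and `fetch_status` and `looks_blocked` are hypothetical helper names. The proxy option is left out for brevity and can be added with `-x` as in the template.

```shell
# fetch_status URL: request the page and print only the HTTP status code.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' --compressed -m 30 \
    -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
    -H "Accept: text/html,application/xhtml+xml" \
    "$1"
}

# looks_blocked CODE: succeed if the status code suggests a block.
# Treating 403 and 429 as blocks is an assumption, not a universal rule.
looks_blocked() {
  case $1 in
    403|429) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (example.com is a placeholder):
#   code=$(fetch_status "https://example.com/")
#   if looks_blocked "$code"; then echo "possible block: HTTP $code" >&2; fi
```
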
If you run into a particularly difficult site, you can contact ipipgo technical support for a dedicated anti-scraping solution. Their engineers have dealt with all kinds of aggressive anti-scraping measures, including TLS fingerprint checks and browser fingerprint detection.

