
Hands-on: Avoiding UA-Header Blocking with curl
Anyone who does data collection knows that anti-scraping mechanisms on target sites keep getting tougher, and rotating IPs alone is no longer enough. Today I'll show you a neat trick: combining UA header camouflage with proxy IPs. Paired with ipipgo's high-anonymity proxies in particular, this combination can let your crawler slip past the target site unnoticed.
Why Does the UA Header Matter So Much?
An analogy: a student sneaking out in a school uniform is far more likely to get caught than one in street clothes. Websites use the UA (User-Agent) header as that "school uniform" to spot crawlers. The most common giveaway looks like this:
# curl's default User-Agent (curl/<version>) gets you flagged within minutes
curl http://example.com
Recent data from one e-commerce site shows that 78.6% of requests carrying curl's default UA header are blocked outright. This isn't hype: last week a friend who builds price-comparison software switched to ipipgo proxies plus UA camouflage, and his request success rate jumped from 19% to 93%.
Setting the UA Header in curl: A Practical Guide
Now for the key part. Remember this all-purpose template:
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.59" \
  --proxy http://username:password@gateway.ipipgo.com:9021 \
  http://target-site.com
Note three key points:
- Pick a UA string from a browser version in the top 5 by market share (avoid outdated versions)
- Use high-anonymity (elite) proxies only (ipipgo's tunnel proxies have this property)
- Periodically change the browser minor version number in the UA string
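The template and the version tip above can be wrapped in two small bash helpers. This is a minimal sketch under my own assumptions: `set_chrome_version` and `fetch_via_proxy` are hypothetical names, `DRY_RUN=1` is a made-up switch that prints the curl command instead of sending it, and the proxy URL is whatever your provider gives you. curl's `--user-agent` (`-A`) option is equivalent to `-H "User-Agent: ..."`.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical names: set_chrome_version, fetch_via_proxy, DRY_RUN.

# Replace the version after "Chrome/" in a UA string; other tokens
# (AppleWebKit/, Safari/, Edg/) are left untouched.
set_chrome_version() {
  local ua=$1 version=$2
  printf '%s\n' "$ua" | sed -E "s#Chrome/[0-9]+(\.[0-9]+)*#Chrome/${version}#"
}

# Usage: fetch_via_proxy <user-agent> <proxy-url> <target-url>
fetch_via_proxy() {
  local ua=$1 proxy=$2 url=$3
  local -a args
  # One array element per argument, so spaces inside the UA survive intact.
  args=(--silent --show-error --user-agent "$ua" --proxy "$proxy" "$url")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    # Print the shell-quoted command instead of sending the request.
    printf '%q ' curl "${args[@]}"
    printf '\n'
  else
    curl "${args[@]}"
  fi
}
```

For example, `fetch_via_proxy "$(set_chrome_version "$UA" 91.0.4472.164)" "$PROXY" http://target-site.com`, where `UA` and `PROXY` hold your own values. The version passed in should not contain `#` or `&`, since it goes straight into the sed replacement.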
Proxy IP Selection: Avoiding the Pitfalls
| Proxy Type | Anonymity | Use Case |
|---|---|---|
| Transparent proxy | None (your real IP is exposed) | Practically useless |
| Anonymous proxy | Hides your IP but reveals that a proxy is in use | General browsing |
| High-anonymity (elite) proxy (ipipgo recommended) | Fully hidden | Crawling / data collection |
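To check which row of the table a proxy actually falls into, one approach is to send a request through it to a page that echoes your request headers back, then look for traces of your real IP or of the proxy itself. The sketch below assumes you have already saved those echoed headers as `Name: value` lines and know your real public IP. `classify_proxy` is a hypothetical helper, and the header names it checks (`Via`, `X-Forwarded-For`, `Forwarded`, `X-Real-IP`, `Proxy-Connection`) are common conventions rather than a complete list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_proxy <your-real-ip> < echoed_headers.txt
# Prints transparent, anonymous or elite (heuristic, not exhaustive).
classify_proxy() {
  local real_ip=$1 headers
  headers=$(cat)   # echoed request headers, one "Name: value" per line
  if grep -qwF -- "$real_ip" <<<"$headers"; then
    echo transparent   # your real IP leaked through to the server
  elif grep -qiE '^(via|x-forwarded-for|forwarded|x-real-ip|proxy-connection):' <<<"$headers"; then
    echo anonymous     # IP hidden, but proxy headers give the proxy away
  else
    echo elite         # no trace of your IP or of a proxy
  fi
}
```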
Worth highlighting is ipipgo's dynamic tunnel proxy, which automatically rotates the exit IP and pairs perfectly with UA camouflage. In our tests using their proxy together with the UA setup from this article, 500 consecutive requests went through without triggering risk control.
FAQ: Quick Fixes
Q: I set the UA header correctly, so why am I still being detected?
A: Check three things: 1. whether you are sending cookies; 2. whether your request rate is too high; 3. whether the proxy IP has been flagged (ipipgo's dedicated IP pool is recommended).
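The first two checks can be handled from curl itself: `--cookie-jar` (`-c`) saves the cookies a site sets, `--cookie` (`-b`) sends them back on later requests, and a minimum gap between requests keeps the rate down. A minimal sketch, assuming a local `cookies.txt` file and a 3-second gap (both arbitrary choices); `remaining_wait` and `polite_fetch` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch. Hypothetical names: COOKIE_JAR, MIN_GAP, LAST_REQUEST,
# remaining_wait, polite_fetch.
COOKIE_JAR=${COOKIE_JAR:-cookies.txt}
MIN_GAP=${MIN_GAP:-3}   # minimum seconds between two requests
LAST_REQUEST=0          # epoch seconds of the previous request

# Seconds still to wait so that at least <gap> seconds separate requests.
remaining_wait() {
  local last=$1 now=$2 gap=$3
  local left=$(( last + gap - now ))
  if (( left > 0 )); then echo "$left"; else echo 0; fi
}

# Usage: polite_fetch <url> <user-agent>
polite_fetch() {
  local url=$1 ua=$2 now wait
  now=$(date +%s)
  wait=$(remaining_wait "$LAST_REQUEST" "$now" "$MIN_GAP")
  if (( wait > 0 )); then sleep "$wait"; fi
  LAST_REQUEST=$(date +%s)
  # --cookie reads stored cookies; --cookie-jar writes updated ones back.
  curl --silent --show-error \
       --cookie "$COOKIE_JAR" --cookie-jar "$COOKIE_JAR" \
       --user-agent "$ua" "$url"
}
```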
Q: How do I scrape mobile pages?
A: Switch to a mobile-style UA header, for example:
curl -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148" ...
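If you switch between desktop and mobile pages often, a tiny helper keeps the UA strings in one place. This is a hypothetical `ua_for` function, not part of curl; the two strings are the desktop UA from the template earlier and the mobile UA from the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ua_for desktop|mobile prints a matching UA string.
ua_for() {
  case $1 in
    mobile)
      echo 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148' ;;
    desktop)
      echo 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.59' ;;
    *)
      echo 'usage: ua_for desktop|mobile' >&2
      return 1 ;;
  esac
}
```

Usage: `curl --user-agent "$(ua_for mobile)" http://target-site.com`.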
Q: How do I connect to the ipipgo proxy?
A: Create a proxy tunnel in their dashboard and you will get a dedicated connection address, usually in this format:
http://[username]:[password]@gateway.ipipgo.com:[port]
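If your username or password contains characters such as `@`, `:` or spaces, they need to be percent-encoded before going into that URL, otherwise the URL may be parsed incorrectly. A minimal sketch with hypothetical helpers `urlencode` and `build_proxy_url`, assuming ASCII-only credentials and the `gateway.ipipgo.com` host shown above (the port is passed in, since it depends on your tunnel). Alternatively, curl's `--proxy-user user:password` (`-U`) option keeps the credentials out of the URL entirely.

```shell
#!/usr/bin/env bash
# Percent-encode one ASCII string for the userinfo part of a URL.
urlencode() {
  local s=$1 out='' c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [[:alnum:].~_-]) out+=$c ;;                      # unreserved: keep as is
      *) printf -v hex '%%%02X' "'$c"; out+=$hex ;;    # anything else: %XX
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build_proxy_url <username> <password> <port>
build_proxy_url() {
  printf 'http://%s:%s@gateway.ipipgo.com:%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3"
}
```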
The Ultimate Anti-Blocking Setup
For complete invisibility, remember this formula:
Rotating UA headers + ipipgo high-anonymity proxies + random request intervals
In practice, you can keep a pool of UA strings and pick one at random for each request. Here is an example bash script:
#!/bin/bash
# Pool of UA strings to rotate through (truncated here; use full UA strings)
UA_LIST=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36..."
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15..."
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36..."
)
while true; do
  # Pick a random UA: index = $RANDOM modulo the array length
  RANDOM_UA=${UA_LIST[$RANDOM % ${#UA_LIST[@]}]}
  curl -H "User-Agent: $RANDOM_UA" \
    --proxy http://ipipgo_proxy_credentials@gateway.ipipgo.com:9021 \
    -L "http://target-site.com"
  # Wait a random 2 to 6 seconds before the next request
  sleep $((RANDOM % 5 + 2))
done
In our testing, this approach got past about 99% of common anti-scraping measures, and with ipipgo's pool of millions of IPs, large-scale data collection is no problem. One e-commerce price-monitoring team recently ran this setup at over a million requests per day, and it stayed stable for three months.
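Rotating UAs does not help if you keep hammering a site that has already started returning 403 or 429 (Too Many Requests). A common refinement, not part of the original script, is to back off exponentially when those codes appear. A sketch with hypothetical names `backoff_delay` and `fetch_with_backoff`; `--write-out '%{http_code}'` is a real curl option that prints the response status, while `page.html` is an arbitrary output file and the 60-second cap is an assumption.

```shell
#!/usr/bin/env bash
# Exponential backoff: 2, 4, 8, ... seconds, capped at <cap> (default 60).
backoff_delay() {
  local attempt=$1 cap=${2:-60}
  local delay=$(( 2 ** attempt ))
  if (( delay > cap )); then delay=$cap; fi
  echo "$delay"
}

# Usage: fetch_with_backoff <url> <user-agent> <proxy-url> [max-tries]
fetch_with_backoff() {
  local url=$1 ua=$2 proxy=$3 max_tries=${4:-5} attempt code
  for (( attempt = 1; attempt <= max_tries; attempt++ )); do
    # Body goes to page.html; only the HTTP status code is captured.
    code=$(curl --silent --location --output page.html \
                --write-out '%{http_code}' \
                --user-agent "$ua" --proxy "$proxy" "$url")
    case $code in
      2??) return 0 ;;                                # success
      403|429) sleep "$(backoff_delay "$attempt")" ;; # likely throttled
      *) return 1 ;;                                  # other errors: give up
    esac
  done
  return 1
}
```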

