
A Veteran Scraper's Crash Log
Last week a friend in e-commerce came to me in tears: the crawlers their team had spent three months building had all stopped working at once. After half a day of digging, I found the cause. The website had recognized their User-Agent (UA). It's like walking in and out of a gated community a dozen times a day with the same face; if the guards don't stop you, who would they stop?
Anti-scraping mechanisms keep getting more sophisticated, and rotating IPs alone is no longer enough. In one test with 200 proxy IPs I maintained myself, 62% of requests were blocked by UA detection. It turned out the sites were flagging anomalies through details in the UA such as the browser version and device model.
A Guide to Wearing the Invisibility Cloak
A convincing disguise has to hold up both inside and out:
| Disguise layer | Common pitfall | Fix |
|---|---|---|
| IP address | High-frequency repeat visits | Dynamic proxy IP pool |
| UA string | Overused browser versions | Keep the UA library updated in real time |
| Behavioral patterns | Fixed visit intervals | Randomized actions and timing |
Here I recommend ipipgo's Dynamic Residential Proxy: its IP pool automatically refreshes 15% of its address segments every day. I usually combine a UA pool with the proxy IPs, pairing them like this:
```python
import random
import time

# Vendor SDK as used in the original article; check ipipgo's documentation
# for the exact package name and method signatures.
from ipipgo import ProxyPool

ua_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64) AppleWebKit/537.36...",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11...",
    # It is recommended to keep 300+ real UAs
]

proxy = ProxyPool.get_proxy()  # Automatically fetch the latest proxy

headers = {
    'User-Agent': random.choice(ua_list),
    'Accept-Language': 'en-US,en;q=0.9',
}

# Remember to add a random delay between requests
time.sleep(random.uniform(1.2, 3.8))
```
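The snippet above picks a fresh random UA on every request. One way to read "pairing" is to tie each proxy to a stable UA, so the same exit IP keeps presenting the same browser instead of switching identity mid-session. The sketch below assumes that interpretation; the pool contents and the `ua_for_proxy` and `build_headers` helpers are hypothetical and use only the standard library, not the ipipgo SDK.

```python
import hashlib

# Hypothetical example pool; real entries should come from captured traffic.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def ua_for_proxy(proxy: str, ua_pool: list[str]) -> str:
    """Map a proxy address to one UA deterministically.

    The same proxy string always hashes to the same index, so an exit IP
    keeps one browser identity across requests and across workers.
    """
    digest = hashlib.sha256(proxy.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ua_pool)
    return ua_pool[index]


def build_headers(proxy: str) -> dict[str, str]:
    """Request headers whose UA is pinned to the given proxy."""
    return {
        "User-Agent": ua_for_proxy(proxy, UA_POOL),
        "Accept-Language": "en-US,en;q=0.9",
    }


headers = build_headers("203.0.113.7:8000")  # documentation-range address
```

Hashing instead of a lookup table means no shared state is needed: any worker that sees the same proxy address picks the same UA. The trade-off is that adding or removing entries in the pool changes the modulus and reshuffles most assignments.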
Three Tips for Avoiding Pitfalls
1. Sourcing UAs: Don't rely on low-quality UA libraries; collect real user data from traffic analysis tools yourself. I often capture packets with Wireshark and save the most common UAs from the last 3 days to a CSV file.
2. Fingerprint obfuscation: Some sites now check canvas fingerprints. This isn't proxy-related, but it's worth adding something like this to your crawler:
```javascript
// Randomly generate canvas features
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.fillStyle = 'rgb(' + Math.floor(Math.random() * 256) + ', ...';
```
3. Proxy quality checks: Run a full check every week with ipipgo's connectivity-test API. It responds quickly and shows in real time which IP segments have been flagged:
```shell
curl -X GET "https://api.ipipgo.com/proxy/check?key=your_key"
```
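Tip 1 can be automated once the capture is exported. This is a minimal sketch assuming a hypothetical CSV export with `timestamp` (ISO 8601) and `user_agent` columns; the column names and the `top_recent_uas` helper are illustrative, not a Wireshark export format. It keeps the most frequent UAs seen in the last 3 days, capped at 300 to match the pool size suggested earlier.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta


def top_recent_uas(csv_text: str, now: datetime,
                   days: int = 3, top_n: int = 300) -> list[str]:
    """Return the `top_n` most frequent UAs seen within the last `days` days.

    Expects a header row with (hypothetical) columns `timestamp` in ISO 8601
    and `user_agent`. Quoted fields are handled by the csv module, which
    matters because real UAs contain commas ("KHTML, like Gecko").
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        seen_at = datetime.fromisoformat(row["timestamp"])
        ua = row["user_agent"].strip()
        if ua and seen_at >= cutoff:
            counts[ua] += 1
    # most_common() orders by count, highest first
    return [ua for ua, _ in counts.most_common(top_n)]
```

In practice you would read the file with `open(path, newline="")` and pass `datetime.now()`. Keep the timestamps and `now` consistently naive or consistently timezone-aware; comparing the two kinds raises a `TypeError`.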
Q&A First Aid Kit
Q: Do free proxies work?
A: Last year I tried an open-source proxy pool, and getting 3 successes out of 10 requests counted as lucky. After switching to ipipgo's commercial proxies, the success rate shot up to 92%. You really do get what you pay for.
Q: How often should the UA pool be updated?
A: It depends on how aggressive the target site's anti-scraping is. For ordinary sites, monthly updates are enough; against big-tech-grade anti-scraping, it's best to follow the release cadence of official Chrome versions.
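To follow Chrome's release cadence without hand-editing the pool, stale entries can be filtered by major version. This sketch assumes "stale" means more than two major versions behind the newest Chrome UA already in the pool; both that threshold and using the pool's own maximum as a stand-in for the current stable release are assumptions, and checking against the official release version would be more reliable. The helper names are hypothetical. Non-Chrome UAs (Firefox, Safari) pass through untouched.

```python
import re
from typing import Optional

# Captures the major version in tokens like "Chrome/124.0.0.0"
CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")


def chrome_major(ua: str) -> Optional[int]:
    """Chrome major version in a UA string, or None if there is none."""
    match = CHROME_MAJOR.search(ua)
    return int(match.group(1)) if match else None


def drop_stale_chrome(ua_pool: list[str], max_lag: int = 2) -> list[str]:
    """Drop Chrome UAs more than `max_lag` majors behind the newest one.

    The newest version found in the pool stands in for the current stable
    release (an assumption); non-Chrome UAs are kept as-is.
    """
    majors = [m for m in map(chrome_major, ua_pool) if m is not None]
    if not majors:
        return list(ua_pool)
    newest = max(majors)
    kept = []
    for ua in ua_pool:
        major = chrome_major(ua)
        if major is None or newest - major <= max_lag:
            kept.append(ua)
    return kept
```

Chromium-based browsers such as Edge also carry a `Chrome/` token, so they are filtered by the same rule.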
Q: How do I choose a proxy plan?
A: Start with your business scenario. ipipgo's E-commerce Edition, for example, has access strategies optimized specifically for shopping sites, with a success rate 18 percentage points higher than the general-purpose version.
The Ultimate Defense Solution
I recently helped an MCN agency with data collection, and their situation was especially typical:
1. Scrape 7 e-commerce platforms simultaneously
2. 2 million requests per day
3. A mix of image and API scraping
The final setup was ipipgo Dynamic Residential Proxies plus a customized UA rotation system, combined with a request-rate control algorithm. Over three months of operation, the stability rate stayed above 89%, and costs were 37% lower than their previous self-built solution.
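The article doesn't say which request-rate control algorithm was used. As an illustration only, here is a token bucket (a swapped-in, standard technique) sized from the numbers above: 2,000,000 requests / 86,400 s ≈ 23.1 requests per second overall, or roughly 3.3 per second per platform if the load were split evenly across the 7 platforms, which is itself an assumption. The platform names and the burst capacity of 5 are hypothetical.

```python
import time
from typing import Callable

DAILY_REQUESTS = 2_000_000
PLATFORMS = 7
SECONDS_PER_DAY = 86_400

# Average rates implied by the figures above, assuming an even split
# across platforms (real traffic is rarely uniform).
overall_rate = DAILY_REQUESTS / SECONDS_PER_DAY   # about 23.1 req/s
per_platform_rate = overall_rate / PLATFORMS      # about 3.3 req/s


class TokenBucket:
    """Token bucket: refills at `rate` tokens/s, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable for testing
        self.tokens = capacity      # start full, allowing a short burst
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; never blocks."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per platform so a burst on one site cannot starve the others.
buckets = {
    f"platform_{i}": TokenBucket(per_platform_rate, capacity=5)
    for i in range(1, PLATFORMS + 1)
}
```

A worker would call `buckets[name].try_acquire()` before each request and back off (for example with the random delay from the first snippet) when it returns `False`.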
Finally, don't treat UA spoofing as a one-off project. Like proxy IP maintenance, it's a long-running battle. Just last week I found that one platform had added WebGL fingerprint detection; on the anti-scraping battlefield, there will always be new weapons.

