
Playing with request header camouflage: making crawler requests look more like real users
Veterans who work with web requests know that many websites identify bots by the characteristics of their request headers. Just as a supermarket security guard keeps an eye on someone who keeps taking the same item, a server keeps an eye on requests sent with curl's default configuration. That is why combining request header masquerading with proxy IPs matters so much, and our own ipipgo proxy service is built to help you solve exactly this problem.
Three moves for request header camouflage
The first move is stripping out telltale parameters. When you send a request with curl, it adds its own User-Agent (curl/ followed by the version number) by default, which is like going shopping in a work uniform: it tells everyone you are there on a job. Replace it with a browser User-Agent and add the headers a browser would normally send:
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
  -H "Accept-Language: zh-CN,zh;q=0.9" \
  -H "Referer: https://www.example.com/" \
  --proxy http://user:pass@proxy.ipipgo.cn:8080 \
  https://target-site.com
The second move is randomizing the header order. Don't always write headers in a fixed order such as Accept, Connection, Host; it's like playing cards, where you shouldn't always lead with the king and then the queen. Partners who have tested it know that a shuffled header order can improve the survival rate by 30% or more.
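One way to get a varied header order with curl is to shuffle the -H arguments before each request. The sketch below assumes bash and GNU coreutils `shuf`; the names HEADERS, CURL_HEADER_ARGS and build_shuffled_header_args are illustrative, and the URLs are the placeholders from the first example. curl may still add some headers (such as Host) on its own, so this only varies the order of the headers you pass with -H.

```shell
#!/usr/bin/env bash
# Sketch: send the same headers in a different order on every request.
# Assumptions: bash + GNU coreutils `shuf`; proxy and target URLs are placeholders.

# Header lines to send (values taken from the first example).
HEADERS=(
  "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
  "Accept-Language: zh-CN,zh;q=0.9"
  "Referer: https://www.example.com/"
)

# Fill the global array CURL_HEADER_ARGS with "-H <header>" pairs in random order.
build_shuffled_header_args() {
  CURL_HEADER_ARGS=()
  local line
  while IFS= read -r line; do
    CURL_HEADER_ARGS+=(-H "$line")
  done < <(printf '%s\n' "${HEADERS[@]}" | shuf)
}

# Dry run: print the command that would be sent. Remove `echo` to actually send it.
build_shuffled_header_args
echo curl "${CURL_HEADER_ARGS[@]}" \
  --proxy "http://user:pass@proxy.ipipgo.cn:8080" \
  "https://target-site.com"
```

Call build_shuffled_header_args again before each request so every request gets a fresh order.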
The golden partner: proxy IPs
Changing the request headers alone is not enough; you also need to pair them with ipipgo's dynamic proxies. Their residential IP pool has these advantages:
- Simulation of real user behavior trajectories
- Exit IP changes automatically every 5 minutes
- Switching between the SOCKS5 and HTTP protocols
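If you want to confirm the rotation yourself, you can sample the exit IP through the proxy at intervals. This is a minimal sketch assuming bash, that the public service api.ipify.org (which returns the caller's IP as plain text) is reachable, and that PROXY_URL points at your own endpoint; fetch_exit_ip and watch_exit_ip are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: sample the proxy's exit IP and flag each change.
# Assumptions: api.ipify.org is reachable; PROXY_URL is a placeholder for your endpoint.

PROXY_URL="${PROXY_URL:-http://dynamic.ipipgo.cn:3128}"

# Print the exit IP that the outside world sees when going through the proxy.
fetch_exit_ip() {
  curl -s --proxy "$PROXY_URL" --connect-timeout 15 --max-time 30 https://api.ipify.org
}

# Usage: watch_exit_ip <samples> <interval_seconds>
# Prints one line per sample and marks samples whose IP differs from the previous one.
watch_exit_ip() {
  local samples=$1 interval=$2 i ip prev=""
  for ((i = 1; i <= samples; i++)); do
    ip=$(fetch_exit_ip) || ip="(request failed)"
    if [[ -n $prev && $ip != "$prev" ]]; then
      echo "sample $i: $ip (changed)"
    else
      echo "sample $i: $ip"
    fi
    prev=$ip
    if ((i < samples)); then sleep "$interval"; fi
  done
}

# Example: 4 samples 2 minutes apart (about 6 minutes in total) to observe a rotation.
# watch_exit_ip 4 120
```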
Remember to add timeout parameters when using their proxies so requests don't hang:
curl --proxy http://dynamic.ipipgo.cn:3128 \
  --connect-timeout 15 \
  --max-time 30 \
  -H "Cache-Control: max-age=0" \
  https://target-site.com
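With these timeouts, a stuck request fails fast and curl exits with code 28, which you can use to decide whether to try again. curl's own --retry option already treats timeouts as transient errors; the wrapper below is a sketch that only makes the policy explicit, assuming bash. The function name retry_on_timeout, the attempt count and the RETRY_PAUSE variable are illustrative choices, not ipipgo recommendations.

```shell
#!/usr/bin/env bash
# Sketch: retry a command only when it fails with curl's timeout exit code.
# Assumptions: bash; attempt count and pause are illustrative values.

CURL_TIMEOUT_EXIT=28  # curl exits with 28 when --connect-timeout or --max-time is hit

# Usage: retry_on_timeout <max_attempts> <command> [args...]
# Returns the exit status of the last attempt.
retry_on_timeout() {
  local max_attempts=$1
  shift
  local attempt=1 status
  while true; do
    if "$@"; then status=0; else status=$?; fi
    # Stop on success, on any error other than a timeout, or when attempts run out.
    if ((status != CURL_TIMEOUT_EXIT || attempt >= max_attempts)); then
      return "$status"
    fi
    echo "attempt $attempt timed out, retrying..." >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_PAUSE:-2}"  # seconds between attempts; override via RETRY_PAUSE
  done
}

# Example (sends a real request):
# retry_on_timeout 3 curl --proxy http://dynamic.ipipgo.cn:3128 \
#   --connect-timeout 15 --max-time 30 https://target-site.com
```

Other failures (for example exit code 7, "couldn't connect") are returned immediately, since retrying them blindly rarely helps.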
A practical guide to avoiding pitfalls
Newbies often fall into these pits:
1. TLS fingerprint leaks: some websites inspect the characteristics of the TLS handshake, so it is recommended to add --tlsv1.2 to the curl command to specify the version (it tells curl to use TLS 1.2 or later).
2. Time zone exposure: remember to add a header such as X-Timezone: Asia/Shanghai.
3. Device resolution: mobile requests should carry a parameter such as Device-Resolution: 1080x1920.
Frequently Asked Questions
Q: Does the request header order really affect recognition?
A: For example, e-commerce platforms monitor where Accept-Encoding and Accept-Language appear among the headers. Our test group ran tens of thousands of requests through ipipgo proxies, and the interception rate with a shuffled header order was 47% lower than with the standard configuration.
Q: How does a dynamic proxy maintain a session?
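A: ipipgo's session-holding proxies can keep a fixed IP for 30 minutes. Note that curl has no --proxy-keepalive option, so the sticky session has to be requested from the proxy side; check ipipgo's documentation for how to enable it.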
Q: How do I detect the effect of camouflage?
A: We recommend the Camouflage Detection Tool on ipipgo's official website: enter your curl command and it shows a score for each parameter.
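Before reaching for an online tool, you can also check locally what curl actually sends: with -v it prints every outgoing request line to stderr prefixed with "> ", which is also where curl's default User-Agent shows up. The sketch below assumes bash and standard sed/tr; extract_sent_headers is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the request lines curl really sends, to double-check camouflage locally.
# curl -v writes outgoing request lines to stderr, each prefixed with "> ".

# Read curl -v output on stdin and print only the outgoing lines, without the "> "
# prefix, carriage returns, or the blank line that ends the header block.
extract_sent_headers() {
  sed -n 's/^> //p' | tr -d '\r' | sed '/^$/d'
}

# Usage (sends a real request; target-site.com is the article's placeholder):
# curl -sv -o /dev/null -H "Accept-Language: zh-CN,zh;q=0.9" https://target-site.com 2>&1 \
#   | extract_sent_headers
```

If a line such as "User-Agent: curl/..." still appears in the output, the first move has not been applied.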
Advanced Play
Old hands use ipipgo's Intelligent Routing Proxy, which automatically picks the best exit node for the target website. For example, if you are crawling an image site it switches to mobile-network IPs, and if you are working with a data API it can take a data-center line. It is especially easy to configure in curl:
curl --proxy http://smartroute.ipipgo.cn:8888 \
  -H "X-Proxy-Mode: image_crawler" \
  https://image-site.com
One last word: request header masquerading isn't black magic; the key is to test and tweak a lot. The great thing about ipipgo proxies is their real-time interception monitoring: if your requests start getting blocked, you get a warning right away, which beats fumbling around blind on your own.

