
I. Why bother tweaking curl's request headers?
Many people who use curl for data scraping run into sites that return a 403 error. It is as frustrating as being stopped at the supermarket door on your way in to buy something: the server doesn't think you're a real visitor. Many websites now run smart gatekeeping checks that inspect whether your request headers look like what a browser would normally send.
For example, if you access a website with a bare-bones curl command, the default User-Agent looks like this:
curl/7.68.0
That tells the server outright that you're a robot! We have to put some makeup on the request and disguise it as a proper browser such as Chrome or Firefox.
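To see which User-Agent your own curl build would announce by default, you can derive it from the local version banner. This is only a minimal sketch: it assumes the first line of `curl --version` starts with `curl <version>`, and the function name `default_curl_ua` is made up for illustration.

```shell
# Hypothetical helper: derive curl's default User-Agent from its version banner.
default_curl_ua() {
  # The first line of `curl --version` looks like
  # "curl 7.68.0 (x86_64-pc-linux-gnu) ...". The default User-Agent is
  # "curl/" followed by that version number.
  curl --version | awk 'NR == 1 { print "curl/" $2 }'
}
```

Calling `default_curl_ua` prints the string a server would see when no `-H "User-Agent: ..."` is supplied.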
II. Hands-on: putting makeup on curl's request headers
Let's start with a few commonly used request headers. Jot them down in your notebook:
| Request header | Example value from a real browser |
|---|---|
| User-Agent | `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...` |
| Accept | `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` |
| Accept-Language | `zh-CN,zh;q=0.9,en;q=0.8` |
The command looks like this (pay attention to the -H parameter; https://目标网站.com is a placeholder for the target site's URL):
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)..." \
  -H "Accept-Language: zh-CN,zh;q=0.9" \
  "https://目标网站.com"
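The headers from the table can also be bundled into a small reusable function so they need not be retyped for every request. The sketch below is one possible way to do it, not a required pattern: `fetch_as_browser` and the `BROWSER_UA` variable are hypothetical names, and `BROWSER_UA` is expected to hold a complete User-Agent string that you copy from a real browser yourself.

```shell
# Hypothetical wrapper: fetch a URL with a browser-like header set.
# BROWSER_UA must hold a full User-Agent string copied from a real browser;
# the function aborts with a message if it is unset.
fetch_as_browser() {
  curl --silent --show-error \
    -H "User-Agent: ${BROWSER_UA:?set BROWSER_UA to a full browser User-Agent string}" \
    -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
    -H "Accept-Language: zh-CN,zh;q=0.9,en;q=0.8" \
    "$1"
}
```

After setting `BROWSER_UA`, a call would look like `fetch_as_browser "https://目标网站.com"`.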
III. Better results with an ipipgo proxy
Changing the request headers alone is sometimes not enough. Some sites hold grudges: the same IP still gets blocked once it has made too many requests. This is the time to pull out our trump card, the ipipgo proxy service.
Add a --proxy parameter to the command and you're done:
curl --proxy "http://username:password@gateway.ipipgo.com:9020" \
  -H "User-Agent: proper browser UA" \
  "https://目标网站.com"
ipipgo's dynamic residential proxies are especially handy: every request automatically gets a different IP, playing hide-and-seek with the site. They also offer a channel optimized against anti-scraping mechanisms; in the author's tests, "a certain Dong" and "a certain Bao" did not trigger verification.
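One way to check whether the exit IP really changes between requests is to call an IP-echo endpoint several times through the proxy and count the distinct answers. The sketch below rests on assumptions: `PROXY_URL` stands for the full value you pass to --proxy, `IP_CHECK_URL` stands for any endpoint that returns the caller's IP as plain text, and `count_exit_ips` is a made-up name. None of these are taken from ipipgo's documentation.

```shell
# Hypothetical check: how many distinct exit IPs do N proxied requests show?
# PROXY_URL    = full proxy URL (placeholder, e.g. the --proxy value you use)
# IP_CHECK_URL = any endpoint that echoes the caller's IP as plain text (placeholder)
count_exit_ips() {
  # $1 = number of requests to send (defaults to 5)
  i=0
  while [ "$i" -lt "${1:-5}" ]; do
    curl --silent --max-time 10 \
      --proxy "${PROXY_URL:?set PROXY_URL}" \
      "${IP_CHECK_URL:?set IP_CHECK_URL}"
    echo                      # make sure every answer ends with a newline
    i=$((i + 1))
  done | sort -u | grep -c .  # count unique, non-empty lines
}
```

A result close to the number of requests suggests rotation is working; a result of 1 suggests every request left through the same IP.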
IV. Rescue guide for common failure scenarios
QA 1: Why is it still returning 403 even though the request headers are set?
→ Check whether Accept-Encoding is missing; some sites check this header. Try adding -H "Accept-Encoding: gzip, deflate, br". With that header set the body may come back compressed, so pass --compressed if you want curl to decode it.
QA 2: What if the proxy can't connect?
→ First, use curl --proxy to access ipipgo's IP detection interface and see whether the exit IP is correct. If the request times out, it may be blocked by a firewall; try another port.
QA 3: What do I have to do to stay logged in?
→ Remember to send the Cookie header as well, with -H "Cookie: your login credentials". It is recommended to log in with a browser first and then copy the cookie out using the developer tools.
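The first two checks above can be scripted. The following is a rough sketch under stated assumptions rather than an official procedure: `http_status` and `find_working_proxy_port` are hypothetical helper names, and `IP_CHECK_URL` is a placeholder for whatever IP detection endpoint you use.

```shell
# Print only the HTTP status code for a URL.
# $1 = URL; any further arguments are passed straight through to curl.
http_status() {
  url=$1
  shift
  curl --silent --output /dev/null --write-out '%{http_code}\n' "$@" "$url"
}

# Try candidate proxy ports in turn and print the first one that answers.
# $1 = proxy URL without the port; remaining arguments = ports to try.
# IP_CHECK_URL is a placeholder for an IP detection endpoint.
find_working_proxy_port() {
  base=$1
  shift
  for port in "$@"; do
    if curl --silent --fail --max-time 10 --output /dev/null \
         --proxy "${base}:${port}" "${IP_CHECK_URL:?set IP_CHECK_URL}"; then
      echo "$port"
      return 0
    fi
  done
  return 1   # no port in the list worked
}
```

For QA 1, compare `http_status "$URL"` with `http_status "$URL" -H "Accept-Encoding: gzip, deflate, br"` to see whether the extra header changes a 403 into a 200. For QA 2, pass the proxy address and a list of ports to `find_working_proxy_port`; which ports are valid depends on your proxy plan.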
V. Essential skills for advanced users
When you come across a particularly difficult site, you can bring out the big move: randomized request headers. Write a shell script that picks a random combination of browser UA and language parameters on every run. Paired with ipipgo's automatic IP switching, this makes your requests much harder to single out.
Here is a simple version of the script:
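#!/bin/bash
# Pool of browser User-Agent strings (elided here; use complete UA strings)
UA_LIST=("Mozilla/5.0 (Windows)..." "Mozilla/5.0 (Macintosh)...")
# Pick one entry at random; ${#UA_LIST[@]} is the number of entries
RANDOM_UA=${UA_LIST[RANDOM % ${#UA_LIST[@]}]}
# http://ipipgo代理地址 is a placeholder for your ipipgo proxy address
curl --proxy "http://ipipgo代理地址" \
  -H "User-Agent: $RANDOM_UA" \
  -H "Accept-Language: zh-CN,en;q=0.$((RANDOM % 3 + 5))" \
  "https://目标网站.com"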
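If the UA pool grows, it may be easier to keep it in a text file, one string per line. The variant below is a sketch and not the script above: it swaps bash's `$RANDOM` for bytes read from /dev/urandom so that it also runs in a plain POSIX shell, and the names `rand_below`, `pick_ua`, `fetch_randomized`, `PROXY_URL` and `UA_FILE` are all made up for illustration.

```shell
# Print a random integer in [0, $1), read from /dev/urandom so it also works
# in shells that lack bash's $RANDOM.
rand_below() {
  n=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
  echo $(( $n % $1 ))
}

# Print one random non-empty line from the file given as $1.
pick_ua() {
  total=$(grep -c . "$1")
  [ "$total" -gt 0 ] || return 1
  grep . "$1" | sed -n "$(( $(rand_below "$total") + 1 ))p"
}

# Send one request with a freshly chosen User-Agent and q-value (0.5 to 0.7).
# PROXY_URL and UA_FILE are placeholders for your proxy URL and UA list file.
fetch_randomized() {
  curl --silent --show-error \
    --proxy "${PROXY_URL:?set PROXY_URL}" \
    -H "User-Agent: $(pick_ua "${UA_FILE:?set UA_FILE}")" \
    -H "Accept-Language: zh-CN,en;q=0.$(( $(rand_below 3) + 5 ))" \
    "$1"
}
```

Each call to `fetch_randomized` draws a new UA and q-value, so calling it in a loop gives a different combination per request.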
As a final note, remember to follow the site's terms of service when using a proxy. All of ipipgo's nodes come from a compliant, clean IP pool, so the service runs rock-steady. New users also get a trial allowance, so we recommend trying it before you buy.

