Why does curl keep getting stuck on redirects when using a proxy IP?
Anyone who writes crawlers has probably run into this: you request a site with curl, the page is clearly supposed to redirect, yet no data ever comes back. It gets worse behind a proxy IP, where the redirect failure rate doubles. Here's a little-known fact: more than 60% of sites chain more than 3 redirects in their login/authentication flow.
# Typical mistake (redirect following not enabled)
curl -x http://PROXY_IP:PORT http://example.com/login
At this point the server may return a 302 status code, but curl stays put and never requests the new address. According to ipipgo's engineers, requests sent without the -L option have a 78% chance of missing critical data, especially when a dynamic proxy pool is in use.
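To see what the server actually answers before redirect following is turned on, a small helper like the one below can print the status code and the address the redirect points to. This is only a sketch: the function name `show_redirect` and the proxy address in the usage line are made up for illustration, while `%{http_code}` and `%{redirect_url}` are standard curl `-w` variables.

```shell
# Print "<status code> <redirect target>" for one request, without following it.
# $1 = proxy URL, $2 = target URL (both supplied by the caller).
show_redirect() {
    proxy=$1
    url=$2
    # -s: no progress meter, -o /dev/null: discard the body,
    # -w: print the status code and the URL the redirect points to
    curl -s -o /dev/null -x "$proxy" \
         -w '%{http_code} %{redirect_url}\n' "$url"
}

# Usage (placeholder proxy address):
# show_redirect "http://PROXY_IP:PORT" "http://example.com/login"
```

If the first field is 301 or 302 and the second is non-empty, the page is waiting for the client to follow the redirect.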
Three tricks to make curl follow redirects
Tip #1: Turn on redirect following (like walking a dog on a leash)
curl -L -x http://USERNAME:PASSWORD@PROXY_IP:PORT TARGET_URL
The -L option is curl's GPS navigation: when it sees a redirect status code such as 301 or 302, it automatically follows the Location header to the new address. Note that the ipipgo proxy format includes the account name and password; do not copy tutorials that write only the IP and leave out authentication.
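If you would rather not embed the password in the proxy URL, curl's `--proxy-user` option carries the credentials separately. The wrapper below is a minimal sketch; the function name and the argument order are assumptions chosen for illustration, not an ipipgo convention.

```shell
# Follow redirects through an authenticated proxy.
# $1 = proxy host:port, $2 = proxy user, $3 = proxy password, $4 = target URL
fetch_following_redirects() {
    proxy_hostport=$1
    proxy_user=$2
    proxy_pass=$3
    url=$4
    # -L follows 3xx responses; --proxy-user sends the proxy credentials
    # without putting them into the proxy URL itself
    curl -L -x "http://$proxy_hostport" \
         --proxy-user "$proxy_user:$proxy_pass" "$url"
}

# Usage (placeholder values):
# fetch_following_redirects "PROXY_IP:PORT" "USERNAME" "PASSWORD" "http://example.com/login"
```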
Tip #2: Disguise the request headers (act like a normal browser)
curl -L -x http://PROXY_IP:PORT \
  -H "User-Agent: Mozilla/5.0" \
  -H "Referer: https://PREVIOUS_PAGE" \
  TARGET_URL
Many websites check the request headers. Combining this disguise with ipipgo's residential proxy IPs can raise the success rate from 40% to 90%+.
Parameter | Effect | Recommended value |
---|---|---|
--max-redirs | Prevents infinite redirect loops | 5-8 |
--connect-timeout | Connection timeout | 15 seconds |
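The two limits from the table can be combined in one call. The sketch below also reports why a request failed, using curl's documented exit codes (47 for too many redirects, 28 for a timeout). The function name and the messages are illustrative choices.

```shell
# Fetch a URL through a proxy with a redirect cap and a connection timeout.
# $1 = proxy URL, $2 = target URL
fetch_with_limits() {
    proxy=$1
    url=$2
    status=0
    curl -L --max-redirs 5 --connect-timeout 15 -x "$proxy" "$url" || status=$?
    case $status in
        0)  ;;
        47) echo "too many redirects" >&2 ;;   # --max-redirs was exceeded
        28) echo "timed out" >&2 ;;            # the timeout was reached
        *)  echo "curl failed with exit code $status" >&2 ;;
    esac
    return "$status"
}

# Usage (placeholder proxy address):
# fetch_with_limits "http://PROXY_IP:PORT" "http://example.com/login"
```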
Pitfalls from real-world use (lessons learned the hard way)
While testing with a free proxy last week, I got stuck on the verification page for 10 requests in a row. After switching to ipipgo's long-lasting static IP, I found that the real cause was a missing cookie: some sites expect the redirected request to carry the cookie that was set by the original request.
# Correct approach (save and send cookies)
curl -L -x http://PROXY_IP:PORT \
  -c cookies.txt -b cookies.txt \
  TARGET_URL
Here's a handy trick: with ipipgo's IP geographic binding feature, matching the proxy IP's location to that of the server hosting the redirect target makes responses about 3 times faster.
Troubleshooting common problems
Q: The configuration is all correct, but the redirect still fails?
A: Nine times out of ten, the proxy IP has been blacklisted by the target site. Switch to ipipgo's high-quality dedicated IPs instead of a low-quality shared pool.
Q: The response is garbled after the redirect?
A: In about 80% of cases the response is gzip-compressed. Remember to add the --compressed option:
curl -L --compressed -x http://PROXY_IP:PORT TARGET_URL
Q: How can I confirm that the redirect succeeded?
A: Add the -v option to see the detailed exchange, and look for these two lines:
< HTTP/1.1 302 Found
< Location: https://REDIRECT_TARGET
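Verbose output gets long once several redirects are chained, so it can help to filter it down to the status and Location lines, or to ask curl for a one-line summary instead. Both helpers below are sketches with made-up names; `%{num_redirects}` and `%{url_effective}` are standard curl `-w` variables.

```shell
# Keep only the response status lines and Location headers from `curl -v` output.
# Reads the verbose log on standard input.
redirect_lines() {
    grep -iE '^< (HTTP/|location:)'
}

# Print the final status code, the number of redirects followed, and the final URL.
# $1 = proxy URL, $2 = target URL
redirect_summary() {
    proxy=$1
    url=$2
    curl -s -L -o /dev/null -x "$proxy" \
         -w 'status=%{http_code} redirects=%{num_redirects} final_url=%{url_effective}\n' \
         "$url"
}

# Usage (placeholder proxy address; -v writes its log to stderr, hence 2>&1):
# curl -sv -L -o /dev/null -x "http://PROXY_IP:PORT" "http://example.com/login" 2>&1 | redirect_lines
# redirect_summary "http://PROXY_IP:PORT" "http://example.com/login"
```

A `redirects=` value above zero together with a 200 final status is a good sign that the whole chain was followed.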
One last tip: combine ipipgo's API-based automatic IP rotation with curl's --retry option to get fully automatic redirect following. For the specific configuration, ask their technical support for ready-made scripts.
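The ipipgo API itself is not shown in this article, so the sketch below leaves it out and assumes you already have a list of proxy addresses from somewhere. It tries each proxy in turn, letting curl's own `--retry` handle transient errors on each one. The function name and the retry counts are illustrative.

```shell
# Try a list of proxies in order until one returns the page.
# $1 = target URL, remaining arguments = proxy URLs to try
fetch_with_rotation() {
    url=$1
    shift
    for proxy in "$@"; do
        # -f: treat HTTP errors (4xx/5xx) as failures,
        # --retry: retry transient errors on the same proxy before moving on
        if curl -sS -f -L --max-redirs 5 --retry 2 --retry-delay 1 \
                -x "$proxy" "$url"; then
            return 0
        fi
        echo "proxy $proxy failed, trying the next one" >&2
    done
    return 1
}

# Usage (placeholder proxy addresses):
# fetch_with_rotation "http://example.com/login" "http://PROXY_IP_1:PORT" "http://PROXY_IP_2:PORT"
```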