
When curl hits a 302 redirect, how can a proxy IP help?
Many people who scrape data have run into this: you request a URL with curl, the returned HTTP status code is 302, and the data you want never arrives. This is where a proxy IP comes in handy, especially one from a provider that specializes in high-quality proxy services, such as ipipgo.
Plain request without a proxy:
curl http://example.com/login
The right way, through the ipipgo proxy:
curl -x http://username:password@proxy.ipipgo.cc:2333 -L http://example.com/login
Pay close attention to the -L parameter: it is the switch that makes curl follow the 302 redirect automatically. That alone is not enough, though. Some sites detect IPs that send frequent requests, and this is where ipipgo's proxy pool helps by rotating the exit IP so the target site does not block you.
Four Steps to Real-World Configuration
Here is a practical configuration (using Python as the example):
import requests

proxies = {
    'http': 'http://user123:pass456@proxy.ipipgo.cc:2333',
    'https': 'http://user123:pass456@proxy.ipipgo.cc:2333',
}
resp = requests.get(
    'http://target.com',
    proxies=proxies,
    allow_redirects=True,  # equivalent to -L for curl
    timeout=15,
)
Key points:
1. The proxy address is built from the three pieces ipipgo gives you: account name, password, and server address.
2. Keep the timeout under 20 seconds, or slow responses will stall your whole job.
3. If you run into an SSL certificate error, add the verify=False parameter. This disables certificate verification, so use it only while debugging.
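The first point can be sketched as a small helper that assembles the proxies dict from the three pieces. This is only a sketch: `build_proxies` is a hypothetical name, and percent-encoding the credentials is an assumption added here so that a password containing characters such as `@` or `:` does not break the URL.

```python
from urllib.parse import quote


def build_proxies(username, password, host, port):
    """Build a Requests-style proxies dict (hypothetical helper, not an ipipgo API)."""
    # Percent-encode the credentials so reserved characters stay inside the userinfo part.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # The same HTTP proxy endpoint is used for both http and https targets.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("user123", "pass456", "proxy.ipipgo.cc", 2333)
```

The result can be passed directly as the `proxies=` argument of `requests.get`.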
Troubleshooting common pitfalls
| Symptom | Fix |
|---|---|
| Redirects loop endlessly | Add --max-redirs 5 to the curl command to limit the number of redirects |
| Cannot connect to the proxy server | Check the remaining traffic and expiration date in the ipipgo dashboard |
| Response body is garbled | Add --compressed so curl sends the Accept-Encoding header and decompresses the response automatically |
Must-read Q&A for beginners
Q: Do I still need to handle cookies myself when using an ipipgo proxy?
A: It depends on the situation. The recommended approach is to let the Requests library's Session object manage them automatically, which saves a lot of work compared with handling them by hand.
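A minimal sketch of that Session approach follows. `make_session` is a hypothetical helper name, and capping redirects at 5 is an assumption that mirrors the curl redirect limit from the troubleshooting table; `Session.proxies`, `Session.max_redirects`, and `Session.cookies` are standard Requests attributes.

```python
import requests


def make_session(proxy_url, max_redirects=5):
    """Return a Session that uses proxy_url and keeps cookies between requests."""
    session = requests.Session()
    # Route both http and https traffic through the same proxy.
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    # Exceeding this limit raises requests.TooManyRedirects instead of looping.
    session.max_redirects = max_redirects
    return session


session = make_session("http://user123:pass456@proxy.ipipgo.cc:2333")
# Cookies set by any response, including intermediate 302 responses,
# are stored in session.cookies and sent on later requests, for example:
# resp = session.get("http://example.com/login", timeout=15)
```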
Q: Why does the website still recognize me after I set up the proxy?
A: Most likely you are using a transparent proxy. Switch to ipipgo's high-anonymity proxy package, which does not expose headers such as X-Forwarded-For.
Q: Do I need to change the proxy IP often?
A: With ipipgo you don't have to change it manually. Their dynamic pool switches the exit IP automatically, every 5 minutes by default, which is more convenient than rotating IPs yourself.
A few honest words
Anyone who works in tech dreads spending half a day on a problem without solving it. When I first started scraping data with curl, 302 redirects alone kept me stuck for three days. I later found that using a good proxy IP is the way to go. In particular, a service with a built-in retry mechanism, like ipipgo, can switch lines automatically when the target site misbehaves, which is much more reliable than writing your own retry code.
A final reminder:
1. During testing, use ipipgo's pay-as-you-go package; don't jump straight to a yearly subscription.
2. For important tasks, set up a dual-line backup by configuring two proxy addresses in your code.
3. Check the usage statistics every week; don't wait until the service is suspended to discover that you have exceeded your traffic quota.
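The dual-line backup from point 2 can be sketched as a small failover helper. This is an illustration under assumptions: `fetch_with_failover` and the injected `fetch` callable are hypothetical names, not part of ipipgo or Requests, and the second proxy address is whatever backup line your plan provides.

```python
def fetch_with_failover(url, proxy_urls, fetch):
    """Try each proxy in order and return the first successful result.

    fetch is any callable taking (url, proxies); it is injected so the
    failover logic stays independent of the HTTP library (an assumption
    of this sketch).
    """
    last_error = None
    for proxy_url in proxy_urls:
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            return fetch(url, proxies)
        except OSError as exc:
            # Connection failures and timeouts are OSError subclasses;
            # remember the error and move on to the next proxy.
            last_error = exc
    raise RuntimeError("all proxies failed") from last_error
```

With Requests, `fetch` could be `lambda url, proxies: requests.get(url, proxies=proxies, allow_redirects=True, timeout=15)`. Requests exceptions derive from `OSError`, so a dead primary line falls through to the backup, while an HTTP error status is still returned as a normal response.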

