
Python meets cURL: an alternative way to play with proxy IPs
Anyone who writes crawlers has run into anti-scraping mechanisms: the data is right in front of you, but you cannot get at it, like a hot pot set before you with no chopsticks. A proxy IP is your pair of chopsticks, and paired with a veteran tool like cURL, it opens up quite a few new tricks.
Why use a cURL binding library?
Many people feel the requests library is enough, but when a request needs fine-grained control (for example, forcing a specific transport protocol), cURL's low-level control comes in handy. Here is an example:
import pycurl
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://example.com')
c.setopt(c.WRITEDATA, buffer)
# The key is this line:
c.setopt(c.PROXY, 'http://username:password@proxy.ipipgo.com:8080')
c.perform()
c.close()
Pay close attention to the username:password@proxy-address format. Many newcomers trip up here: ipipgo proxy authentication must follow this format exactly.
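If the username or password contains characters such as @ or :, they need to be percent-encoded before being embedded in the proxy URL, otherwise the URL is parsed incorrectly. A minimal sketch, assuming a helper named build_proxy_url (a hypothetical name, not part of pycurl or ipipgo) and placeholder credentials:

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port, scheme="http"):
    """Return scheme://user:pass@host:port with credentials percent-encoded."""
    # safe="" forces every reserved character (including @, : and /) to be encoded.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Placeholder credentials; substitute the values from your own account.
proxy = build_proxy_url("username", "p@ss:word", "proxy.ipipgo.com", 8080)
print(proxy)
```

The resulting string can be passed to c.setopt(c.PROXY, proxy). pycurl also exposes a PROXYUSERPWD option for supplying credentials separately from the address.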
Practical Solutions for Dynamic Proxy Pools
A single IP gets blocked easily, so we need an IP pool. Fetch IPs through ipipgo's API and feed them to cURL's CURLOPT_PROXY option, like this:
import requests

def get_ip():
    # Call the ipipgo API here to fetch one proxy address.
    return requests.get('https://api.ipipgo.com/getip?type=json').json()['proxy']

def curl_with_rotation(url):
    for _ in range(3):  # retry up to 3 times on failure
        proxy = get_ip()
        c = pycurl.Curl()
        try:
            c.setopt(c.URL, url)
            c.setopt(c.PROXY, proxy)
            # Other configurations...
            c.perform()
            return True
        except pycurl.error as e:
            print(f"IP {proxy} failed ({e}), moving to the next one")
        finally:
            c.close()  # always release the handle
    return False
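Calling the API on every attempt can be wasteful. One possible variation is to keep a small local pool, hand proxies out round-robin, and drop the ones that fail. A minimal sketch: ProxyPool and its methods are illustrative names, and fetch is assumed to be any callable that returns one proxy URL (for example the get_ip function above). A stand-in fetcher is used here so the sketch runs without network access.

```python
from collections import deque


class ProxyPool:
    """Round-robin pool that refills itself from a fetch callable when empty."""

    def __init__(self, fetch, size=5):
        self._fetch = fetch      # callable returning one proxy URL string
        self._size = size        # how many proxies to request per refill
        self._pool = deque()

    def get(self):
        """Return the next proxy, refilling the pool first if it is empty."""
        if not self._pool:
            self._pool.extend(self._fetch() for _ in range(self._size))
        proxy = self._pool[0]
        self._pool.rotate(-1)    # move the proxy just handed out to the back
        return proxy

    def discard(self, proxy):
        """Remove a proxy that failed; ignore it if it is already gone."""
        try:
            self._pool.remove(proxy)
        except ValueError:
            pass


# Stand-in fetcher that produces made-up addresses.
_counter = iter(range(1, 100))
pool = ProxyPool(lambda: f"http://10.0.0.{next(_counter)}:8080", size=3)

first = pool.get()
second = pool.get()
pool.discard(second)             # pretend the second proxy failed
third = pool.get()
```

In the retry loop, pool.get() would replace the direct API call, and pool.discard(proxy) would go in the except branch.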
Guide to avoiding pitfalls (tabular version)
| Pitfall | Symptom | Fix |
|---|---|---|
| Authentication format error | 407 error returned | Check whether the username or password set in the ipipgo backend contains special characters |
| Connection timeout | CURLE_OPERATION_TIMEDOUT | Set pycurl.CONNECTTIMEOUT as well as pycurl.TIMEOUT |
| SSL verification failure | SSL certificate error | Set c.setopt(pycurl.SSL_VERIFYPEER, 0), which disables certificate verification |
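The table can also be expressed as a small lookup that turns a failure into a hint. A minimal sketch: the numeric values are libcurl's documented error codes, while the diagnose function and its messages are illustrative. In a pycurl except block the error number is the first element of e.args, and the HTTP status can be read with c.getinfo(c.RESPONSE_CODE).

```python
# libcurl error numbers mapped to troubleshooting hints.
CURL_HINTS = {
    5: "could not resolve the proxy host; check the proxy address",
    7: "could not connect; the proxy may be offline",
    28: "operation timed out; review CONNECTTIMEOUT and TIMEOUT",
    60: "certificate verification failed; check the CA bundle",
}


def diagnose(curl_errno=None, http_status=None):
    """Map a libcurl error number or an HTTP status to a readable hint."""
    if http_status == 407:
        return "proxy authentication required; check the username:password part"
    if curl_errno in CURL_HINTS:
        return CURL_HINTS[curl_errno]
    return "unrecognized failure; enable VERBOSE for details"


print(diagnose(curl_errno=28))
print(diagnose(http_status=407))
```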
Q&A time
Q: What should I do if my proxy IPs fail frequently?
A: ipipgo's pay-as-you-go plan is recommended. Its IP availability can reach 98% or higher, which is much more stable than free IPs.
Q: How do I configure a high-anonymity proxy if I need one?
A: Choose the "Privacy Proxy" type in the ipipgo backend. No extra settings are needed in the code; the exit nodes automatically strip the X-Forwarded-For header.
Q: Why is the response time sometimes fast and sometimes slow?
A: Check whether you are mixing proxies from different regions. Creating proxy groups within a single region in the ipipgo console avoids cross-datacenter latency.
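To check the high-anonymity answer above in practice, one approach is to send a request through the proxy to an endpoint that echoes back the headers it received, then look for headers that reveal a proxy. A minimal sketch of the checking step only: the header list and function name are assumptions, and the echoed headers are assumed to be available as a dict.

```python
# Headers that commonly reveal a proxy or the original client address.
REVEALING_HEADERS = ("x-forwarded-for", "via", "forwarded", "x-real-ip")


def revealing_headers(echoed_headers):
    """Return the proxy-revealing header names present in echoed_headers."""
    present = {name.lower() for name in echoed_headers}
    return [name for name in REVEALING_HEADERS if name in present]


# Example input: headers as a header-echo service might report them.
seen_by_server = {"Host": "example.com", "Via": "1.1 proxy", "User-Agent": "curl"}
print(revealing_headers(seen_by_server))
```

An empty list suggests that none of these particular headers reached the server; it does not prove the proxy is undetectable.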
Lesser-known tricks of the trade
1. Debugging: setting c.setopt(c.VERBOSE, True) shows the full request header information.
2. Connection reuse: setting c.setopt(c.FORBID_REUSE, False) keeps connections available for reuse and can improve performance by about 20%.
3. Precise timeouts: set different timeouts for different phases of the request:
c.setopt(c.CONNECTTIMEOUT, 5)  # connection timeout, in seconds
c.setopt(c.TIMEOUT, 15)  # overall timeout, in seconds
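Tricks 2 and 3 combine naturally, because connection reuse only pays off when the same Curl handle serves several requests. A minimal sketch, assuming a hypothetical helper fetch_all that receives an already-created handle (for example pycurl.Curl()); the timeout defaults mirror the values above.

```python
from io import BytesIO


def fetch_all(curl, urls, connect_timeout=5, total_timeout=15):
    """Fetch each URL with one handle so libcurl can reuse connections."""
    curl.setopt(curl.CONNECTTIMEOUT, connect_timeout)  # connection phase only
    curl.setopt(curl.TIMEOUT, total_timeout)           # whole transfer
    bodies = []
    for url in urls:
        buffer = BytesIO()                 # fresh buffer for every response
        curl.setopt(curl.URL, url)
        curl.setopt(curl.WRITEDATA, buffer)
        curl.perform()
        bodies.append(buffer.getvalue())
    return bodies


# Usage with pycurl (requires pycurl and network access):
#   import pycurl
#   c = pycurl.Curl()
#   c.setopt(c.PROXY, 'http://username:password@proxy.ipipgo.com:8080')
#   pages = fetch_all(c, ['http://example.com/a', 'http://example.com/b'])
#   c.close()
```

Passing the handle in, rather than creating one per request, is what lets the connection stay open between URLs on the same host.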
One last word on proxy IPs: stability matters more than anything else. The time you spend wrangling free proxies yourself would pay for several years of a professional service. A service like ipipgo, which offers real-time API extraction plus automatic verification, is the right way to go for us programmers.

