
I. What the hell is PycURL?
Many people are confused when they first hear about PycURL, but it's simply the Python binding for libcurl, the library behind the curl command. Just as you'd use curl to test an API from the command line, you can now do the same thing in Python. For example, when you want to batch-test whether a batch of proxy IPs is still alive, the requests library is simpler to use, but when you hit a scenario that needs fine-grained control over network requests, PycURL is the real choice.
import pycurl
from io import BytesIO

# Collect the response body in an in-memory buffer
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://ipipgo.com/checkip')
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
print(buffer.getvalue().decode('utf-8'))
II. Putting a proxy vest on PycURL
Here's the kicker! The key to routing PycURL through a proxy IP is setting two options: CURLOPT_PROXY and CURLOPT_PROXYUSERPWD. Here we recommend using ipipgo's proxy service; they provide a ready-made authentication format, so you just fill in the username and password and it works.
c = pycurl.Curl()
c.setopt(c.PROXY, 'proxy.ipipgo.com:9021')   # ipipgo's access address
c.setopt(c.PROXYUSERPWD, 'user123:pass456')  # username:password format
c.setopt(c.TIMEOUT, 10)                      # always set a timeout so a bad proxy can't hang you
III. Three major pitfalls when using proxies
Newbies often run into these gotchas:
1. Writing the wrong port in the proxy address (ipipgo's ports start at 9021)
2. Forgetting to enable proxy authentication (you must set PROXYUSERPWD)
3. Not handling SSL certificate issues (add c.setopt(c.SSL_VERIFYPEER, 0); see the sketch after this list)
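To tie these together, here's a minimal sketch of an HTTPS request through the proxy that dodges all three pitfalls (the host, port, and credentials are the placeholder values from above; only disable certificate verification if you accept the risk):
c = pycurl.Curl()
c.setopt(c.URL, 'https://ipipgo.com/checkip')
c.setopt(c.PROXY, 'proxy.ipipgo.com:9021')   # the right port
c.setopt(c.PROXYUSERPWD, 'user123:pass456')  # proxy authentication enabled
c.setopt(c.SSL_VERIFYPEER, 0)                # skip certificate verification
c.setopt(c.SSL_VERIFYHOST, 0)                # usually disabled together with VERIFYPEER
c.perform()
c.close()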
IV. Hands-on: batch speed testing through an ipipgo proxy
Here's a code template from a real scenario for checking how responsive a proxy IP is:
import time

def test_proxy_speed():
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://speedtest.ipipgo.com')
    c.setopt(c.PROXY, 'proxy.ipipgo.com:9021')
    c.setopt(c.PROXYUSERPWD, 'user:pass')
    c.setopt(c.WRITEFUNCTION, lambda data: len(data))  # discard the body, we only care about timing
    # Focus on the timing metrics
    c.setopt(c.TIMEOUT, 15)
    c.setopt(c.NOSIGNAL, 1)  # avoid signal-related crashes when used from threads
    try:
        start = time.time()
        c.perform()
        return time.time() - start
    except pycurl.error as e:
        print(f'Hanging! Error code: {e.args[0]}')
    finally:
        c.close()
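Since the point is batch speed testing, here's a quick usage sketch that runs the check a few times and averages the results (the repeat count of 5 is arbitrary):
results = [test_proxy_speed() for _ in range(5)]
valid = [r for r in results if r is not None]
if valid:
    print(f'Average response time: {sum(valid) / len(valid):.2f}s')
else:
    print('All requests failed')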
V. Q&A session: what you might ask
Q: What should I do if the proxy often fails to connect?
A: First check whether your account is in good standing; the ipipgo dashboard shows real-time usage. Then try switching the access region, since a node is sometimes down for temporary maintenance.
Q: Why do large file downloads keep getting interrupted?
A: Remember to set CURLOPT_LOW_SPEED_LIMIT and CURLOPT_LOW_SPEED_TIME so that brief network fluctuations aren't mistaken for a dead connection.
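As a rough sketch, the thresholds below abort the transfer only if the speed stays under 1000 bytes/s for 30 straight seconds; tune both numbers to your own network:
c.setopt(c.LOW_SPEED_LIMIT, 1000)  # bytes per second
c.setopt(c.LOW_SPEED_TIME, 30)     # abort if below the limit for this many seconds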
Q: How do I get the proxy IP currently in use?
A: Send a request to http://echo.ipipgo.com; the X-Real-IP header in the response is your actual exit IP.
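Here's a minimal sketch that captures the response headers with a HEADERFUNCTION callback and prints that one (the exact header name depends on what the echo service actually returns):
headers = {}
def collect_header(line):
    line = line.decode('iso-8859-1')
    if ':' in line:
        name, value = line.split(':', 1)
        headers[name.strip().lower()] = value.strip()

c = pycurl.Curl()
c.setopt(c.URL, 'http://echo.ipipgo.com')
c.setopt(c.PROXY, 'proxy.ipipgo.com:9021')
c.setopt(c.PROXYUSERPWD, 'user123:pass456')
c.setopt(c.WRITEFUNCTION, lambda data: len(data))  # discard the body
c.setopt(c.HEADERFUNCTION, collect_header)
c.perform()
c.close()
print(headers.get('x-real-ip'))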
VI. A lesser-known trick: connection pool optimization
When you're pushing requests through proxies at high frequency, remember to reuse Curl objects. In our tests, connection pooling roughly tripled throughput:
from threading import Lock
import pycurl

class CurlPool:
    def __init__(self, size=5):
        self.pool = [pycurl.Curl() for _ in range(size)]
        self.lock = Lock()

    def get_curl(self):
        with self.lock:
            return self.pool.pop()

    def release(self, curl):
        curl.reset()  # Key step! Clear the state left over from the previous request
        with self.lock:
            self.pool.append(curl)
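A quick usage sketch, assuming the same placeholder proxy settings as above:
pool = CurlPool(size=5)
c = pool.get_curl()
try:
    c.setopt(c.URL, 'http://ipipgo.com/checkip')
    c.setopt(c.PROXY, 'proxy.ipipgo.com:9021')
    c.setopt(c.PROXYUSERPWD, 'user123:pass456')
    c.setopt(c.WRITEFUNCTION, lambda data: len(data))  # discard the body
    c.perform()
finally:
    pool.release(c)  # reset() inside release() wipes these options for the next user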
Lastly, I'd like to emphasize IP quality when choosing a proxy service provider. Providers with automatic IP rotation, like ipipgo, are much less likely to get you banned on crawler projects. They also offer a Python SDK, which is more convenient than using raw PycURL, so newbies can give it a try.

