
Hands-On with HTTPX: Asynchronous Requests with Proxy IPs
Lately a lot of people in crawler groups have been complaining that their IPs keep getting blocked when they collect data with requests. Today we switch to a new weapon: the HTTPX library. Its asynchronous request support is excellent, and paired with proxy IPs it really shines. We'll use our own proxy service, ipipgo, for the demos and walk through how to get around anti-scraping mechanisms step by step.
HTTPX Basics: A Quick Three-Step Combo
First install the library: `pip install httpx`. The basic usage is very similar to requests, with asynchronous support on top. Look at this code:
```python
import httpx

# Normal GET request
with httpx.Client() as client:
    response = client.get('https://example.com')
    print(response.status_code)
```
Adding a proxy (pay attention here!):

```python
# Route the request through the proxy gateway
proxies = "http://username:password@gateway.ipipgo.com:9021"
# Note: recent httpx releases deprecate proxies= in favor of proxy=
response = httpx.get("https://ip.ipipgo.com", proxies=proxies)
print(f"Current IP: {response.json()['ip']}")
```
Note that `gateway.ipipgo.com` in the proxy address is our service entry point; the port differs by package. The advantage of using your own service is that the IP pool is large and IPs rotate automatically, so you don't have to manage them yourself.
The Right Way to Use Asynchronous Requests
Synchronous requests will drive you crazy when you need to fetch in bulk. Go asynchronous! Check out this trick:
```python
import asyncio
import httpx

async def fetch(url):
    async with httpx.AsyncClient(
        proxies="http://user:pass@gateway.ipipgo.com:9021"
    ) as client:
        resp = await client.get(url)
        return resp.text

# 100 requests at the same time without breaking a sweat
async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    return await asyncio.gather(*(fetch(url) for url in urls))

results = asyncio.run(main())
```
Here we're using ipipgo's long-term proxy packages, which are particularly well suited to this kind of high-frequency request scenario. Remember to use the asynchronous client; the regular client will hold you back.
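When firing off hundreds of requests at once, it's also worth capping concurrency so you don't overwhelm the proxy gateway. Here is a minimal sketch using `asyncio.Semaphore`; `dummy_fetch` is a hypothetical stand-in for a real `httpx.AsyncClient` call, and the short sleep simulates network latency:

```python
import asyncio

MAX_CONCURRENCY = 10
active = 0   # fetches currently in flight
peak = 0     # highest concurrency observed

async def dummy_fetch(i, sem):
    # In real code, replace the sleep with: resp = await client.get(url)
    global active, peak
    async with sem:
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for network latency
        active -= 1
        return i

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(*(dummy_fetch(i, sem) for i in range(100)))

results = asyncio.run(main())
```

All 100 tasks are scheduled at once, but the semaphore ensures no more than 10 are ever in flight at the same time.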
A Practical Guide to Proxy IP Pitfalls
A few pitfalls that are often encountered in actual development:
| Symptom | Fix |
|---|---|
| Connection timeouts | Switch to a different ipipgo data-center node |
| 407 error returned | Check whether the username or password contains special characters (URL-encode them) |
| Slow response times | Enable HTTPX connection reuse (keep-alive) |
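On the 407 point: special characters in the username or password (such as `@` or `:`) will break the proxy URL unless they are percent-encoded first. A quick sketch with made-up credentials:

```python
from urllib.parse import quote

# Hypothetical credentials containing special characters
username = "user@example"
password = "p@ss:word"

# Percent-encode each part before building the proxy URL
proxy_url = (
    f"http://{quote(username, safe='')}:{quote(password, safe='')}"
    "@gateway.ipipgo.com:9021"
)
print(proxy_url)
```

Without the encoding, the extra `@` and `:` would confuse URL parsing and the gateway would reject the credentials with a 407.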
It's recommended to add a retry mechanism in your code; combined with ipipgo's automatic IP rotation it's even less hassle. Their API can rotate IPs automatically after a set number of failures, which is especially handy for large-scale collection.
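The retry idea can be sketched framework-agnostically: wrap the request in a loop and only re-raise after the last attempt fails. With a rotating gateway, each retry naturally goes out through a fresh IP. The `flaky()` function below is a hypothetical stand-in for an httpx call (in real code you would catch `httpx.TransportError` rather than `ConnectionError`):

```python
def retry(fn, max_retries=3):
    """Call fn() up to max_retries times, re-raising the last error."""
    last_exc = None
    for _ in range(max_retries):
        try:
            return fn()
        except ConnectionError as exc:  # with httpx, catch httpx.TransportError
            last_exc = exc
    raise last_exc

# Demo: a flaky "request" that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("proxy node timed out")
    return "200 OK"

result = retry(flaky)
```

The helper returns on the first success, so well-behaved requests pay no overhead.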
Q&A Time: Frequently Asked Questions
Q: What should I do if the proxy gets blocked as soon as I use it?
A: Use ipipgo's dynamic residential IP package, which changes IP on every request, so the target site never gets a chance to block you.
Q: My asynchronous requests suddenly hang and stop moving?
A: Check your timeout setting. Passing timeout=None makes HTTPX wait indefinitely; set an explicit value such as timeout=30 and things will stay stable.
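For finer control than a single number, HTTPX also accepts an `httpx.Timeout` object, which lets you set connect and read limits separately. A sketch with illustrative values:

```python
import httpx

# 5 s to establish the connection, 30 s for everything else
timeout = httpx.Timeout(30.0, connect=5.0)
client = httpx.AsyncClient(timeout=timeout)
```

A tight connect timeout fails over quickly from a dead proxy node, while the looser read timeout tolerates slow pages.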
Q: What if I need a high-anonymity proxy?
A: Go straight to ipipgo's enterprise-level proxy service; the request headers expose no proxy fingerprints at all, and it has survived the strict checks of a certain major e-commerce site.
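What "high anonymity" means concretely: the proxy injects none of the telltale headers. One way to verify is to hit a header-echo service and scan what it saw; here is a minimal sketch with the check factored out (the sample header dicts are made up):

```python
# Headers a transparent or low-anonymity proxy typically injects
PROXY_FINGERPRINTS = {"via", "x-forwarded-for", "x-real-ip", "proxy-connection"}

def looks_high_anonymity(seen_headers):
    """True if the headers an echo service saw contain no proxy fingerprints."""
    return not (PROXY_FINGERPRINTS & {k.lower() for k in seen_headers})

# Hypothetical header sets, as an echo service might report them
leaky = {"Host": "example.com", "Via": "1.1 proxy-42", "X-Forwarded-For": "203.0.113.5"}
clean = {"Host": "example.com", "User-Agent": "Mozilla/5.0"}
```

In practice you would feed this the headers returned by a service that echoes your request back to you.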
The Ultimate Configuration
One last big one: here's my go-to configuration template:
```python
import httpx

client = httpx.AsyncClient(
    proxies={
        "http://": "http://user:pass@gateway.ipipgo.com:9021",
        "https://": "http://user:pass@gateway.ipipgo.com:9021",
    },
    timeout=30.0,
    limits=httpx.Limits(max_connections=100),
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)
# Remember to close it when done: await client.aclose()
```
With this configuration, handling millions of requests through ipipgo's proxies is no sweat. Their IP pool is refreshed often enough that you basically won't run into CAPTCHA bombardment. One final reminder: when collecting data, comply with the target site's rules; proxies are not for sabotage.

