
Hands-On Guide to Proxy Setup with the Requests Library
When collecting data with Python, you will often run into anti-scraping measures, and a proxy IP is the lifesaver. Take a real scenario: you want to capture prices from an e-commerce platform, and after a few dozen consecutive visits your IP gets blocked. Add the proxies parameter to your requests call and you are immediately back in business:
import requests

proxies = {
    'http': 'http://username:password@proxy.ipipgo.com:port',
    'https': 'http://username:password@proxy.ipipgo.com:port'
}
resp = requests.get('https://target-site.com', proxies=proxies)
Key reminder: pay special attention to the username and password in the proxy URL; many newcomers forget the http:// prefix. If you use ipipgo's private proxies, remember to generate dedicated authentication credentials in the dashboard. Their dynamic IPs survive longer than other providers'; in my tests they lasted an extra 2-3 hours.
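To make the prefix mistake harder to commit, you can build the proxies dict with a small helper. This is a sketch; build_proxies and has_scheme are illustrative names, and the credentials and port below are placeholders:

```python
from urllib.parse import urlparse

def build_proxies(user, password, host, port):
    """Build a requests-style proxies dict, always including the http:// prefix."""
    proxy_url = f"http://{user}:{password}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}

def has_scheme(proxy_url):
    """True if the proxy URL carries an explicit scheme -- the part newcomers forget."""
    return urlparse(proxy_url).scheme in ("http", "https", "socks5")

# Placeholder credentials and port:
proxies = build_proxies("user1", "pass123", "proxy.ipipgo.com", 8000)
```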
Dynamic Proxy Pool Tips and Tricks
A single proxy IP is easy to recognize, so we need a proxy pool to rotate through. Here's a trick: use a Session object to maintain the session while switching proxies randomly. Look at this code:
from requests.sessions import Session
import random

class SmartSession(Session):
    def __init__(self, proxy_list):
        super().__init__()
        self.proxy_pool = proxy_list  # put the proxies provided by ipipgo here

    def request(self, method, url, **kwargs):
        # Cover both schemes so HTTPS traffic doesn't bypass the proxy
        proxy = random.choice(self.proxy_pool)
        kwargs['proxies'] = {'http': proxy, 'https': proxy}
        return super().request(method, url, **kwargs)
Example of use
proxy_list = [
'http://ipipgo_user1:pass123@111.222.33.44:8000',
'http://ipipgo_user1:pass123@112.113.114.115:8000'
]
smart = SmartSession(proxy_list)
response = smart.get('https://site-to-scrape.com')
This way every request randomly selects a proxy, making it hard for the site's risk-control system to spot a pattern. I recommend ipipgo's dynamic residential proxies: their IP pool is refreshed daily with 200,000+ addresses, and in my tests the block rate was 60% lower than with ordinary server-room IPs.
Three Go-To Moves for Proxy Exception Handling
The biggest headache with proxies is the variety of connection errors. Here are three reliable fixes:
1. Timeout retry mechanism
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=3,  # retry up to 3 times (assumed value)
    backoff_factor=1,
    status_forcelist=[500, 502, 503]
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount('https://', adapter)
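For intuition about what backoff_factor=1 buys you: urllib3 documents the wait before the n-th consecutive retry as backoff_factor * 2**(n-1) (whether the very first retry also sleeps differs across urllib3 versions). A tiny helper, backoff_schedule here being an illustrative name, makes the schedule visible:

```python
def backoff_schedule(backoff_factor, retries):
    """Sleep times from urllib3's documented formula: backoff_factor * 2**(n-1)."""
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, retries + 1)]

# With backoff_factor=1 and 3 retries: waits of 1s, 2s, 4s
schedule = backoff_schedule(1, 3)
```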
2. Proxy validation
Ping the proxy server before each use to avoid sending requests through a dead proxy. ipipgo's API can directly report remaining traffic and IP status, which is much faster than traditional checks.
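A minimal liveness probe might look like this; proxy_alive is an illustrative name, httpbin.org is an assumed test endpoint, and where available you would swap in ipipgo's status API instead:

```python
import requests

def proxy_alive(proxy_url, test_url="https://httpbin.org/ip", timeout=5):
    """Route one tiny request through the proxy; return False on any error
    instead of raising, so dead proxies can be skipped quietly."""
    try:
        resp = requests.get(test_url,
                            proxies={"http": proxy_url, "https": proxy_url},
                            timeout=timeout)
        return resp.ok
    except requests.RequestException:
        return False
```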
3. Exception logging
It is recommended to wrap your request code in a try-except block and record which proxy IPs misbehave. ipipgo's dashboard has a real-time monitoring panel, so you can see directly which proxy nodes are responding slowly and swap them out promptly.
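A sketch of that wrapper (fetch and dead_proxies are illustrative names): any request-level failure is logged together with the offending proxy instead of crashing the crawl:

```python
import logging
import requests

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("crawler")
dead_proxies = []  # proxies that errored, collected for later replacement

def fetch(url, proxy_url, timeout=5):
    """Return the response, or None after recording a failing proxy."""
    try:
        return requests.get(url,
                            proxies={"http": proxy_url, "https": proxy_url},
                            timeout=timeout)
    except requests.RequestException as exc:
        dead_proxies.append(proxy_url)
        log.warning("proxy %s failed: %s", proxy_url, exc)
        return None
```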
Practical Q&A
Q: The proxy is set up successfully, but requests are still blocked by the website?
A: 80% of the time you are using a low-quality transparent proxy. Switch to ipipgo's high-anonymity (elite) proxies, and remember to check whether the X-Forwarded-For header in your requests exposes your real IP.
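Note that a transparent proxy injects X-Forwarded-For on the server side, but you can at least verify your own client isn't sending leaky headers by preparing a request without dispatching it:

```python
import requests

# Prepare (but do not send) a request to see exactly which headers go out
req = requests.Request("GET", "https://example.com",
                       headers={"User-Agent": "my-crawler/1.0"}).prepare()
leaky = [h for h in ("X-Forwarded-For", "X-Real-IP", "Via") if h in req.headers]
```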
Q: What should I do if the proxy is particularly slow?
A: Run a speed test first and pick fast nodes; the ipipgo client has a built-in latency test. If you are going through an HTTP proxy, you can also enable streaming with the stream=True parameter to speed up large file downloads.
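Here is how the streaming download loop looks in practice: with stream=True you iterate response.iter_content(chunk_size=8192) and write chunks as they arrive. Sketched below with a plain list of byte chunks standing in for the response, so the logic is visible without a network call (save_chunks is an illustrative name):

```python
import io

def save_chunks(chunk_iter, fileobj):
    """Write an iterable of byte chunks to fileobj, as you would with
    response.iter_content(chunk_size=8192) under stream=True."""
    total = 0
    for chunk in chunk_iter:
        if chunk:  # skip keep-alive empty chunks
            fileobj.write(chunk)
            total += len(chunk)
    return total

# Stand-in for a streamed response body:
buf = io.BytesIO()
written = save_chunks([b"abc", b"", b"defg"], buf)
```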
Q: What if I need both domestic and overseas proxies at the same time?
A: Specify them per protocol in the proxies dictionary, for example an http proxy in China and an https proxy overseas. ipipgo supports filtering nodes by geography: add country=us to the API parameters and you will get US IPs.
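One way to wire this up on the client side is to pick the proxies dict per target domain before each request. The routing rule and proxy addresses below are illustrative placeholders:

```python
from urllib.parse import urlparse

# Placeholder proxy endpoints -- substitute your ipipgo credentials
DOMESTIC = "http://user:pass@cn-node.example:8000"
OVERSEAS = "http://user:pass@us-node.example:8000"

def pick_proxies(url):
    """Route .cn targets through the domestic proxy, everything else overseas."""
    host = urlparse(url).hostname or ""
    proxy = DOMESTIC if host.endswith(".cn") else OVERSEAS
    return {"http": proxy, "https": proxy}
```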
Advanced: Proxy Performance Optimization
Here's a power move for veterans: boost throughput with connection pooling. Combined with ipipgo's enterprise proxy package, I measured a 4x increase in concurrency performance:
import requests
from requests.packages.urllib3.util.ssl_ import create_urllib3_context

# Customize the SSL context
ctx = create_urllib3_context()
ctx.load_default_certs()

# Create a session with connection pooling
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=50,
    pool_maxsize=100,
    max_retries=3
)
session.mount('https://', adapter)
With this setup, requests will reuse TCP connections, which is especially useful in high-frequency request scenarios. Remember to turn on "long connection mode" in the ipipgo dashboard; their proxy servers support keep-alive, saving about 30% of handshake time compared with an ordinary proxy.
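To actually exploit the larger pool you need concurrent requests sharing the one session. A sketch with a thread pool follows; crawl_all is an illustrative helper, and passing session.get as fetch lets every worker reuse the pooled keep-alive connections:

```python
from concurrent.futures import ThreadPoolExecutor

def crawl_all(urls, fetch, max_workers=10):
    """Fan the URLs out over worker threads; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Usage with a pooled session: crawl_all(url_list, session.get, max_workers=50)
```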
Finally, don't look only at price when choosing a proxy service. ipipgo, for example, uses intelligent routing that automatically picks the optimal line. The last time I ran a competitive analysis, their Asian nodes held response times steadily under 80ms, more than twice as fast as second-tier brands.

