
Don't Let Network Speed Hold You Back! Hands-On SOCKS5 Proxy Tuning
Anyone who works on web crawlers knows that the biggest headache with a SOCKS5 proxy is slow speed. Sometimes the proxy is clearly connected, yet data loads like an old ox pulling a broken cart. Today we'll talk about how to make your proxy run faster than a rabbit, including some speed tips for ipipgo's services that you may not know about.
Choosing a node is like picking a watermelon: you have to tap it a couple of times first.
A lot of people think any random node will do, and then get hammered by latency. Here's a tip: don't chase the fancy-looking overseas nodes. In real-world tests, ipipgo's domestic transit nodes were more than 3 times faster than a direct connection, especially for domestic business. Why? Encrypted tunnels plus dedicated-line acceleration!
| Node type | Average latency | Peak bandwidth |
|---|---|---|
| Ordinary residential IP | 180 ms | 5 Mbps |
| Data center IP | 80 ms | 50 Mbps |
| ipipgo dedicated line | 35 ms | 100 Mbps |
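To see why both columns matter, here is a crude back-of-the-envelope model: fetch time is roughly (round trips × latency) + (size ÷ bandwidth). It assumes the latency column is round-trip time, uses 1 MB = 1,000,000 bytes, and picks `round_trips=4` as a rough, hypothetical count (TCP connect, SOCKS5 greeting, SOCKS5 CONNECT, the request itself). It ignores TCP slow start, packet loss and server processing, so treat the results as a relative comparison, not a prediction.

```python
def estimate_fetch_seconds(size_bytes, rtt_ms, bandwidth_mbps, round_trips=1):
    """Rough time to fetch `size_bytes`: setup round trips plus serialization time.

    Crude model: ignores TCP slow start, packet loss and server think time.
    `round_trips` approximates handshakes (TCP connect, SOCKS5 negotiation, request).
    """
    latency_s = round_trips * rtt_ms / 1000.0
    transfer_s = (size_bytes * 8) / (bandwidth_mbps * 1_000_000)
    return latency_s + transfer_s


# (latency in ms, bandwidth in Mbps), taken from the table rows
nodes = {
    "residential": (180, 5),
    "data center": (80, 50),
    "dedicated line": (35, 100),
}

for name, (rtt_ms, mbps) in nodes.items():
    t = estimate_fetch_seconds(1_000_000, rtt_ms, mbps, round_trips=4)
    print(f"{name}: ~{t:.2f} s per 1 MB")
```

Under these assumptions the residential row lands around 2.3 s per 1 MB fetch versus about 0.2 s for the dedicated line. For many small requests the latency term dominates; for large downloads bandwidth does.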
Don't settle for the default protocol parameters. Tune them like this.
Many tools ship with conservative defaults, so we have to tune them ourselves. Try adding these lines to your configuration file:
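```nginx
# Connection keep-alive and timeout settings (nginx proxy directives)
# Upstream keep-alive also needs `keepalive N;` in the upstream block
proxy_http_version 1.1;          # keep-alive to the upstream requires HTTP/1.1
proxy_set_header Connection "";  # clear nginx's default "Connection: close"
proxy_connect_timeout 15s;
proxy_send_timeout 30s;
proxy_read_timeout 60s;
```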
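At the socket level, a related but different mechanism is TCP keepalive: the kernel sends probes on an idle connection so that NAT boxes and firewalls don't silently drop a long-lived link to your SOCKS5 proxy. The sketch below is a minimal illustration, not ipipgo's method; the idle/interval/probe values and the proxy address in the usage comment are hypothetical.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalive probes for a socket (connected or not yet connected).

    SO_KEEPALIVE is portable; the timing knobs below are platform-specific
    (Linux and some others), so each one is set only if this platform defines it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):   # idle seconds before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):  # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):    # unanswered probes before the connection is dropped
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# Usage (hypothetical proxy address):
# s = socket.create_connection(("proxy.example.com", 1080))
# enable_tcp_keepalive(s)
```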
```
# Enable TCP Fast Open (Linux sysctl, e.g. in /etc/sysctl.conf)
net.ipv4.tcp_fastopen = 3
```
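The value `3` is a bit mask: `0x1` enables Fast Open for outgoing (client) connections and `0x2` for listening (server) sockets. Higher bits (such as `0x4`, `0x200`, `0x400`) exist but are ignored in this sketch. The sysctl only permits TFO; applications usually still have to request it on their sockets, and both ends (plus any middleboxes) must support it. A small helper to check what a host currently has set:

```python
# Bit flags of Linux net.ipv4.tcp_fastopen (see the kernel's ip-sysctl documentation)
TFO_CLIENT = 0x1  # allow sending data in the SYN as a client
TFO_SERVER = 0x2  # allow accepting data in the SYN as a server


def describe_tcp_fastopen(value):
    """Return which sides of TCP Fast Open a tcp_fastopen value enables."""
    return {
        "client": bool(value & TFO_CLIENT),
        "server": bool(value & TFO_SERVER),
    }


def read_tcp_fastopen(path="/proc/sys/net/ipv4/tcp_fastopen"):
    """Read the current setting on a Linux host; None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


current = read_tcp_fastopen()
if current is not None:
    print(current, describe_tcp_fastopen(current))
```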
Note: adjust the timeouts to your workload. For example, crawlers can use shorter timeouts, while video transfers need longer ones. ipipgo's control panel lets you adjust these parameters directly, so you don't have to change your code every time.
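On the client side, one way to keep per-workload timeouts out of scattered code is a small lookup table. The profile names and numbers below are hypothetical starting points, not recommendations from ipipgo; only the `default` row mirrors the config values above.

```python
# Hypothetical per-workload (connect, read) timeouts in seconds; tune for your own traffic
TIMEOUT_PROFILES = {
    "crawler": (5, 15),   # fail fast and retry through another node
    "default": (15, 60),  # mirrors proxy_connect_timeout / proxy_read_timeout above
    "video": (15, 300),   # tolerate long stalls between chunks
}


def timeouts_for(workload):
    """Return a (connect, read) tuple; unknown workloads fall back to 'default'."""
    return TIMEOUT_PROFILES.get(workload, TIMEOUT_PROFILES["default"])
```

With `requests` this plugs straight into `requests.get(url, proxies=..., timeout=timeouts_for("crawler"))`, since `timeout` accepts a `(connect, read)` tuple. Note that the read timeout bounds the wait between received bytes, not the total download time.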
Caching is a double-edged sword: used correctly, it makes things fly.
A lot of people don't realize that proxies can benefit from caching too! Especially when you make repeated requests, a local cache can save a lot of time. But be careful: don't blindly cache dynamic data, or you'll end up serving stale results. This combo is recommended:
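```python
# Request caching with Redis (needs: pip install redis "requests[socks]")
import redis
import requests

cache = redis.StrictRedis(host='localhost', port=6379)

# Placeholder proxy settings: replace with your own SOCKS5 endpoint and credentials
ipipgo_proxy = {
    'http': 'socks5h://user:pass@proxy.example.com:1080',
    'https': 'socks5h://user:pass@proxy.example.com:1080',
}

def get_data(url):
    cached = cache.get(url)
    if cached is not None:
        return cached  # cache hit: bytes stored earlier
    resp = requests.get(url, proxies=ipipgo_proxy, timeout=(15, 60))
    resp.raise_for_status()  # don't cache error pages
    data = resp.content
    cache.setex(url, 300, data)  # cache for 5 minutes
    return data
```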
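If you run a single process and don't want a Redis server, the same read-through idea can be sketched in memory with the standard library. This is an illustrative stand-in, not a drop-in replacement: it isn't shared between processes or thread-safe, and the `cacheable` predicate is a hypothetical hook for skipping dynamic URLs.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=300, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it so stale data is never served
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def cached_fetch(url, fetch, cache, cacheable=lambda url: True):
    """Read-through cache: serve a fresh hit, otherwise call fetch(url) and store it.

    `cacheable` lets you bypass the cache for URLs whose content changes per request.
    """
    if not cacheable(url):
        return fetch(url)
    hit = cache.get(url)
    if hit is not None:
        return hit
    data = fetch(url)
    cache.set(url, data)
    return data
```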
Q&A: pitfalls you may have run into
Q: Why does it get slower after connecting through the proxy?
A: 80% of the time, the wrong node was chosen! Use ipipgo's speed test tool to check latency first, and pick the node with the shortest response time. Also try turning off the system's built-in firewall.
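If you want to do the "measure first, then pick" step yourself, a simple approach is to time TCP connects to each candidate proxy endpoint. Keep in mind this only measures the hop from you to the proxy, not the proxy's onward path, and the node list you pass in is your own (any addresses here are hypothetical).

```python
import math
import socket
import time


def connect_latency_ms(host, port, attempts=3, timeout=3.0):
    """Middle TCP connect time to host:port in ms, or math.inf if it never connects."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            continue  # refused or timed out: no sample for this attempt
        samples.append((time.perf_counter() - start) * 1000)
    if not samples:
        return math.inf
    samples.sort()
    return samples[len(samples) // 2]


def pick_fastest(nodes, **kwargs):
    """Return the (host, port) with the lowest connect latency, plus all results.

    If every node is unreachable, the returned best node has latency math.inf.
    """
    results = {node: connect_latency_ms(*node, **kwargs) for node in nodes}
    best = min(results, key=results.get)
    return best, results
```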
Q: How should I assign proxies to a multi-threaded crawler?
A: Don't pile every thread onto one IP! Use ipipgo's dynamic rotation feature to switch the exit IP automatically on each request, so you don't get blocked and you can use the full bandwidth.
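ipipgo's rotation happens on the provider side. If you instead hold a list of proxy endpoints, a client-side alternative is a thread-safe round-robin, sketched below. The endpoint URLs are hypothetical placeholders, and this is not ipipgo's API.

```python
import itertools
import threading


class ProxyRotator:
    """Thread-safe round-robin over a list of proxy URLs (client-side rotation)."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("need at least one proxy URL")
        self._cycle = itertools.cycle(proxy_urls)
        self._lock = threading.Lock()  # guard next() on the shared cycle across threads

    def next_proxies(self):
        """Return a requests-style proxies dict for the next endpoint."""
        with self._lock:
            url = next(self._cycle)
        return {"http": url, "https": url}


# Hypothetical endpoints; substitute the addresses from your own provider account
rotator = ProxyRotator([
    "socks5h://user:pass@proxy1.example.com:1080",
    "socks5h://user:pass@proxy2.example.com:1080",
    "socks5h://user:pass@proxy3.example.com:1080",
])
```

Each worker thread then calls `requests.get(url, proxies=rotator.next_proxies())`. The `socks5h://` scheme (which needs `requests[socks]`) also resolves DNS on the proxy side.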
Q: What's the best way to test node speed?
A: Don't rely on ping alone! Real download speed is what counts. We recommend ipipgo's dashboard speed test tool, which pulls a 1 MB test file directly so you can see the actual transfer rate.
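The same "download a file and time it" idea can be sketched for any already-opened stream. This measures body transfer only (connection setup happens before the stream is passed in), and the demo reads from memory so it runs offline.

```python
import io
import time


def measure_download(stream, chunk_size=64 * 1024):
    """Read a file-like `stream` to the end; return (bytes_read, seconds, Mbit/s)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    mbps = total * 8 / elapsed / 1_000_000 if elapsed > 0 else float("inf")
    return total, elapsed, mbps


# Offline demo: 1 MiB from memory; real runs should read a network response instead
print(measure_download(io.BytesIO(b"\0" * 1024 * 1024)))
```

For a real test, `urllib.request.urlopen(test_file_url)` returns a readable response, but urllib only honors HTTP(S) proxies, not SOCKS5. To go through a SOCKS5 node, use `requests.get(test_file_url, proxies=..., stream=True)` and pass `resp.raw`. `test_file_url` is a placeholder for whatever test file you use.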
Finally, network speed optimization takes technical skill, but also patience. Sometimes switching the protocol variant (for example, trying SOCKS5 over TLS) or adjusting the MTU brings surprising gains. If you really can't figure it out, ipipgo's technical support is online 24/7, so feel free to pester them with questions!
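When tuning the MTU, it helps to know how much TCP payload each packet can actually carry. The arithmetic below assumes plain IPv4/IPv6 and TCP headers with no options; extra layers such as TLS or a VPN tunnel add their own overhead, which is why an MTU that is too large for the path can cause fragmentation or dropped packets. The three MTU values are examples only.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # fixed header, no extension headers
TCP_HEADER = 20   # bytes, without options (timestamps add 12 more per segment)


def tcp_mss(mtu, ipv6=False):
    """Largest TCP payload per packet for a given link MTU, ignoring IP/TCP options."""
    overhead = (IPV6_HEADER if ipv6 else IPV4_HEADER) + TCP_HEADER
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} too small for {overhead} bytes of headers")
    return mtu - overhead


for mtu in (1500, 1492, 1400):  # Ethernet, PPPoE, a reduced example value
    print(mtu, tcp_mss(mtu), tcp_mss(mtu, ipv6=True))
```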

