
How to Use Proxy IPs in Node.js to Avoid Getting Blocked
Anyone who writes crawlers knows the biggest headache: the target site suddenly blacklists your IP. That's when you bring out the secret weapon, proxy IPs. Doing this in Node.js is actually quite simple, and here I'll share the hands-on experience I've been keeping up my sleeve.
Why does your crawler always get caught?
Many newcomers think a random User-Agent is enough to slip past detection. In reality, sites' risk-control systems have moved to a three-pronged detection model:
1. Behavioral profiling (mouse movement, request frequency)
2. Real-time lookups against IP reputation databases
3. Device fingerprint tracking
The third point deserves special attention. Some sites exploit WebRTC leaks to read your real IP directly, even when you are behind a proxy. If you crawl with a real or headless browser, you need a two-layer defense: route traffic through proxy IPs and disable WebRTC.
Node.js Proxy Configuration in Practice
Taking axios as an example, here is a plug-and-play configuration template:
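```javascript
const axios = require('axios');
const tunnel = require('tunnel');

// Agent that tunnels HTTPS requests through an HTTP proxy (CONNECT method)
const agent = tunnel.httpsOverHttp({
  proxy: {
    host: 'proxy.ipipgo.com', // dynamic residential IPs recommended
    port: 3128,
    proxyAuth: 'username:password', // remember to change to your own credentials
  },
});

async function fetchPage() {
  const res = await axios({
    method: 'get',
    url: 'https://target-site.com',
    httpsAgent: agent, // send HTTPS traffic through the tunnel
    proxy: false, // keep axios from applying its own proxy settings on top
    timeout: 5000, // abort the request after 5 seconds
  });
  return res.data;
}

fetchPage()
  .then((data) => console.log(data))
  .catch((err) => console.error('Request failed:', err.message));
```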
Set a reasonable timeout, and rotate to a new IP every 3-5 seconds. ipipgo's API supports per-second billing, which keeps costs to a minimum.
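To make the 3-5 second rotation concrete, here is a minimal in-memory rotator. It is a sketch that rests on a few assumptions:

- You already hold a list of proxy endpoints. Any entries work, for example `{ host, port }` objects.
- Rotation happens on your side rather than through a provider-managed rotating gateway.
- `ProxyRotator` is a hypothetical helper. It is not part of axios, tunnel, or any ipipgo SDK.

```javascript
// Hypothetical rotator: keeps handing out the same proxy until a randomly
// sized 3-5 s window expires, then moves on to the next entry in the list.
class ProxyRotator {
  constructor(proxies, { minMs = 3000, maxMs = 5000, now = Date.now, random = Math.random } = {}) {
    if (proxies.length === 0) throw new Error('proxy list is empty');
    this.proxies = proxies;
    this.minMs = minMs;
    this.maxMs = maxMs;
    this.now = now; // injectable clock, handy for testing
    this.random = random; // injectable random source
    this.index = 0;
    this.expiresAt = this.now() + this.nextInterval();
  }

  // Random window length in [minMs, maxMs) so switches never follow a fixed beat
  nextInterval() {
    return this.minMs + Math.floor(this.random() * (this.maxMs - this.minMs));
  }

  // Proxy to use right now; advances to the next one once the window has expired
  current() {
    if (this.now() >= this.expiresAt) {
      this.index = (this.index + 1) % this.proxies.length;
      this.expiresAt = this.now() + this.nextInterval();
    }
    return this.proxies[this.index];
  }
}
```

Call `rotator.current()` before each request and build the tunnel agent from the entry it returns. Because the window length is random, switches avoid the fixed-schedule pattern that is easy for risk-control systems to spot.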
Iron Rules of IP Pool Management
| Practice | Do this | Self-sabotage |
|---|---|---|
| IP switching | Random intervals + different geographic regions | Switching on a fixed schedule |
| Failure handling | Three-tier retry mechanism | Blindly retrying in an endless loop |
| Traffic mix | Residential : datacenter = 7 : 3 | Datacenter IPs only |
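The table does not spell out what a three-tier retry looks like. The sketch below is one possible reading:

- Try up to three times.
- Use a different proxy on each attempt.
- Back off exponentially between attempts.
- Drop any proxy that fails from the pool.

`requestWithRetry` and the `sendRequest(proxy)` callback are hypothetical names. The callback should perform the real request and reject on failure.

```javascript
// Promise-based delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical three-tier retry: up to `attempts` tries, each on a randomly
// chosen proxy, with exponential backoff. Proxies that fail are removed
// from `pool` (the array is mutated) so a dead IP is never retried.
async function requestWithRetry(pool, sendRequest, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts && pool.length > 0; attempt++) {
    const proxy = pool[Math.floor(Math.random() * pool.length)];
    try {
      return await sendRequest(proxy);
    } catch (err) {
      lastError = err;
      pool.splice(pool.indexOf(proxy), 1); // retire the proxy that just failed
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt); // 500 ms, then 1000 ms by default
      }
    }
  }
  throw lastError ?? new Error('proxy pool exhausted');
}
```

Removing failed proxies is what prevents the endless-loop failure mode. Once all three tiers fail, the caller gets the last error instead of spinning forever.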
Pay special attention to residential IPs. Residential proxies such as ipipgo's run on real home broadband, which makes them more than an order of magnitude harder to flag than datacenter IPs. In my experience, switching to their residential IPs cut the block rate from 70% to under 5%.
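One rough way to enforce the 7:3 residential-to-datacenter mix, while also switching regions between picks, is sketched below. Note the assumptions:

- The `type` and `region` fields describe how you might tag your own pool entries. They are not a provider's data format.
- `makePicker` is a hypothetical helper.

```javascript
// Hypothetical picker: roughly 70% residential / 30% datacenter picks, and it
// avoids reusing the previous pick's region whenever another region exists.
function makePicker(pool, { residentialShare = 0.7, random = Math.random } = {}) {
  if (pool.length === 0) throw new Error('proxy pool is empty');
  let lastRegion = null;
  return function pick() {
    const wanted = random() < residentialShare ? 'residential' : 'datacenter';
    let candidates = pool.filter((p) => p.type === wanted);
    if (candidates.length === 0) candidates = pool; // fall back if that type is missing
    const freshRegion = candidates.filter((p) => p.region !== lastRegion);
    if (freshRegion.length > 0) candidates = freshRegion;
    const choice = candidates[Math.floor(random() * candidates.length)];
    lastRegion = choice.region;
    return choice;
  };
}
```

The random source is injectable so the mix can be checked deterministically. In production, leave it as `Math.random`.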
Must-Read Q&A for Beginners
Q: What can I do about slow proxy IPs?
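A: Prefer nodes close to the target server. For example, when crawling a US site, use ipipgo's Los Angeles data center, which can bring latency under 200 ms.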
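If you have several candidate nodes, a quick way to compare them is to time a TCP handshake with each one and keep the fastest. This sketch uses only Node's built-in `net` module; `tcpLatency` and `pickFastest` are hypothetical helpers.

The default measurement covers only the leg from your machine to the proxy. To include the proxy-to-target leg the answer above is about, pass in your own `measure(host, port)` function that times a real request through each proxy.

```javascript
const net = require('net');

// Time a TCP handshake with host:port, in milliseconds.
function tcpLatency(host, port, timeoutMs = 3000) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    const socket = net.connect({ host, port });
    socket.setTimeout(timeoutMs); // emits 'timeout' if the handshake stalls
    socket.once('connect', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      socket.destroy(); // only the handshake time was needed
      resolve(ms);
    });
    socket.once('timeout', () => {
      socket.destroy();
      reject(new Error(`timed out connecting to ${host}:${port}`));
    });
    socket.on('error', reject);
  });
}

// Probe every candidate in parallel; return the reachable one with the lowest latency.
async function pickFastest(proxies, measure = tcpLatency) {
  const results = await Promise.allSettled(proxies.map((p) => measure(p.host, p.port)));
  let best = null;
  results.forEach((r, i) => {
    if (r.status === 'fulfilled' && (best === null || r.value < best.latencyMs)) {
      best = { ...proxies[i], latencyMs: r.value };
    }
  });
  if (best === null) throw new Error('no proxy was reachable');
  return best;
}
```

Unreachable nodes are simply skipped, so one dead entry doesn't break the whole comparison.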
Q: What should I do if I encounter human verification?
A: Combine a fingerprint browser that mimics a real device with a proxy IP. ipipgo offers a companion browser-automation solution that you can call directly through its API.
Q: How can I tell if an IP is exposed?
A: Use ipipgo's own checking tool at ipcheck.ipipgo.com.
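You can also run a basic self-check: request an IP-echo endpoint through the proxy and look for your real IP in the response. Transparent proxies, for instance, pass your address along in headers such as `X-Forwarded-For`. The sketch below assumes:

- The endpoint returns JSON shaped like httpbin.org/get (`{ origin, headers }`).
- You already know your real public IP.
- `findIpLeaks` is a hypothetical helper.

```javascript
// Split a value into bare tokens so "1.2.3.4" does not match inside
// "11.2.3.45"; handles comma-separated lists and "for=1.2.3.4" forms.
const ipTokens = (value) => String(value).split(/[\s,;="]+/).filter(Boolean);

// List every place the real IP still appears in an echo response fetched via the proxy.
function findIpLeaks(realIp, echo) {
  const leaks = [];
  if (ipTokens(echo.origin || '').includes(realIp)) leaks.push('origin');
  for (const [name, value] of Object.entries(echo.headers || {})) {
    if (ipTokens(value).includes(realIp)) leaks.push(`header:${name}`);
  }
  return leaks;
}
```

An empty array means your real IP did not show up in the origin or any echoed header. It does not rule out other leaks, such as WebRTC in a browser.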
Pitfalls to Avoid: Lessons Learned the Hard Way
Last year a friend tried to save money with a free proxy and ended up with:
1. Scraped data tampered with by the middleman
2. Servers infected with cryptomining malware
3. A letter from an infringement lawyer sent to the company
So, once again: leave it to the professionals. Providers like ipipgo, with IP pools in the tens of millions, offer far better security and stability.
One last trick: wrap the proxy configuration in middleware so it can be reused across the whole project. If you want a ready-made module, ipipgo's developer documentation offers an out-of-the-box SDK that saves a lot of work compared with writing your own.
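Here is one way the middleware idea could look, as a sketch. `withProxy` is a hypothetical helper shaped like an axios request interceptor: it takes a request config and returns it. You supply two hooks:

- `pickProxy()` returns a proxy entry such as `{ host, port }`.
- `makeAgent(proxy)` builds an agent for that entry, for example with `tunnel.httpsOverHttp`.

```javascript
// Hypothetical request middleware: attaches a proxy agent to each request
// config, shaped like an axios request interceptor (config in, config out).
function withProxy(pickProxy, makeAgent) {
  const agents = new Map(); // one cached agent per proxy host:port
  return function proxyInterceptor(config) {
    const proxy = pickProxy(); // e.g. { host, port }
    const key = `${proxy.host}:${proxy.port}`;
    if (!agents.has(key)) agents.set(key, makeAgent(proxy));
    config.httpsAgent = agents.get(key); // route HTTPS through this proxy
    config.proxy = false; // don't let axios apply its own proxy settings as well
    return config;
  };
}
```

To use it project-wide, register it once on a shared client: `const client = axios.create(); client.interceptors.request.use(withProxy(pickProxy, makeAgent));`. Caching one agent per proxy avoids building a fresh agent object on every request.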

