
Hands-On: High-Concurrency Data Collection with Node.js
What do you fear most in data collection? IP blocking! Especially when you need to send a large number of requests, a single IP gets blacklisted by the website within minutes. This is when you need proxy IPs to spread the risk, for the same reason a chain store opens branches in different locations.
Let's take Node.js as the example, since it is asynchronous and non-blocking by nature. Working through 10 proxy IPs at the same time can be up to roughly 10 times faster than hammering away with a single IP, because each IP carries its own share of the requests. But beware: proxy IP quality directly determines whether collection succeeds or fails. Don't go cheap and use fly-by-night proxies that stop working within three days.
How to write the core code
First, the proxy pool management module (don't be intimidated by the term; it's really just an IP repository):
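const proxyPool = {
  currentIndex: 0,
  // Fill in the proxy hosts provided by ipipgo here.
  ips: ['ipipgo-1.proxy', 'ipipgo-2.proxy' /* ... */],
  // Return the current proxy URL, then advance round-robin,
  // so the first entry is not skipped on the first call.
  getNext() {
    const ip = this.ips[this.currentIndex]
    this.currentIndex = (this.currentIndex + 1) % this.ips.length
    return `http://${ip}:3000`
  }
}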
Here's the key point: asynchronous control should be done with Promise.allSettled instead of Promise.all. Why? Promise.all rejects as soon as any one request fails, so the results of the others are lost. With Promise.allSettled, the requests that succeed still deliver their results even if some fail.
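const axios = require('axios')

async function batchRequest(urls) {
  const promises = urls.map(url => {
    // Parse the proxy URL; splitting on ':' would yield '//host' as the host.
    const { hostname, port } = new URL(proxyPool.getNext())
    return axios.get(url, {
      proxy: { protocol: 'http', host: hostname, port: Number(port) },
      timeout: 5000
    })
  })
  // No .catch here: allSettled reports each failure as { status: 'rejected' },
  // so failed URLs can be identified and retried by the caller.
  return Promise.allSettled(promises)
}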
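To make the benefit of Promise.allSettled concrete, here is a minimal sketch of how its result array can be split into successes and failures. The helper name summarizeResults and the stand-in promises are illustrative assumptions, not part of the original code; no real HTTP requests are made.

```javascript
// Hypothetical helper: split Promise.allSettled results into successes and failures.
function summarizeResults(urls, results) {
  const succeeded = []
  const failed = []
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      succeeded.push({ url: urls[i], data: result.value })
    } else {
      const reason = result.reason instanceof Error
        ? result.reason.message
        : String(result.reason)
      failed.push({ url: urls[i], reason })
    }
  })
  return { succeeded, failed }
}

// Demo with stand-in promises instead of real requests.
async function demo() {
  const urls = ['https://example.com/a', 'https://example.com/b']
  const results = await Promise.allSettled([
    Promise.resolve('page A'),
    Promise.reject(new Error('timeout of 5000ms exceeded'))
  ])
  return summarizeResults(urls, results)
}

demo().then(summary => console.log(summary))
```

The failed list is what you would feed back into a retry round, while the succeeded list can be processed right away.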
How to seamlessly access the ipipgo proxy
Having used quite a few proxy services, I ended up settling on ipipgo for just three reasons:
| Comparison item | Typical proxy | ipipgo |
|---|---|---|
| Response time | ≤800ms | ≤200ms |
| IP lifetime | 2-15 minutes | 30+ minutes |
| Authentication method | Username and password | Whitelisting + dynamic keys |
Integrating ipipgo in code is particularly easy. Their API returns proxy addresses like this:
// The latest proxy list from ipipgo
const ipipgoProxyList = [
  'user-12345@proxy.ipipgo.com:3000',
  'user-67890@proxy.ipipgo.com:3000'
]
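Entries in this user@host:port form cannot be handed to axios directly, so they need to be split into parts first. The sketch below shows one way to do that with the built-in URL parser. The helper name toAxiosProxy is hypothetical, and treating the part before the @ as a proxy username (with an empty password by default) is an assumption about how the service authenticates, not something confirmed here.

```javascript
// Hypothetical helper: turn 'user@host:port' into an axios-style proxy object.
function toAxiosProxy(entry, password = '') {
  // Prefix a scheme so the WHATWG URL parser can split the parts.
  const { username, hostname, port } = new URL(`http://${entry}`)
  // Note: this expects an explicit, non-default port such as 3000.
  const proxy = { protocol: 'http', host: hostname, port: Number(port) }
  if (username) {
    // Assumption: the part before '@' is the proxy username.
    proxy.auth = { username: decodeURIComponent(username), password }
  }
  return proxy
}

console.log(toAxiosProxy('user-12345@proxy.ipipgo.com:3000'))
```

The returned object has the shape axios accepts in its proxy request option, so it could be passed as the proxy field of a request config.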
Beginner FAQ
Q: What should I do if my proxy IPs keep failing?
A: Use ipipgo's dynamic IP pool. It automatically rotates in a fresh batch of IPs every 15 minutes, which is much less hassle than maintaining a pool yourself!
Q: What should I do if collection speed won't go up?
A: Check two things: 1. whether the concurrency limit is set too low; 2. the response latency of the proxy IPs (use ipipgo's speed test tool to check).
Q: How do I choose a proxy service without falling into pitfalls?
A: Look for three things: ① pay-per-use billing, ② real-time monitoring, ③ automatic failover when a proxy fails (ipipgo meets all three).
Performance Tuning Tips
Remember this golden formula: Maximum Concurrency = Number of Proxy IPs × Per-IP Carrying Capacity. For example, with 50 ipipgo proxies that are each recommended to carry 20 concurrent requests, total concurrency should not exceed 50 × 20 = 1000.
Suggested tuning parameters:
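As a rough illustration of the formula, the sketch below computes the cap and then enforces it with a small worker pool. The helper names maxConcurrency and runWithLimit are hypothetical, and each task is assumed to be a function that returns a promise (for example, a function wrapping one request).

```javascript
// Maximum concurrency = number of proxy IPs × per-IP carrying capacity.
function maxConcurrency(proxyCount, perIpCapacity) {
  return proxyCount * perIpCapacity
}

// Hypothetical helper: run async task functions with at most `limit` in flight.
// Results use the same shape as Promise.allSettled.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length)
  let next = 0
  async function worker() {
    while (next < tasks.length) {
      const i = next++ // claim the next task index
      try {
        results[i] = { status: 'fulfilled', value: await tasks[i]() }
      } catch (reason) {
        results[i] = { status: 'rejected', reason }
      }
    }
  }
  // Start `limit` workers; each pulls tasks until none are left.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker)
  await Promise.all(workers)
  return results
}

console.log(maxConcurrency(50, 20)) // 1000
```

Unlike mapping every URL to a request at once, this keeps the number of in-flight requests at or below the computed cap no matter how long the URL list is.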
- Timeout: 5-8 seconds recommended (too long affects efficiency)
- Number of retries: 2-3 is preferred
- Request interval: random 100-500ms (avoids a regular, detectable access pattern)
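These parameters can be combined in a small retry wrapper. The following is a minimal sketch under stated assumptions: the helper names randomDelay, sleep, and withRetry are hypothetical, and the demo uses a simulated flaky function rather than a real request.

```javascript
// Random pause between min and max milliseconds (inclusive),
// so requests do not arrive at a regular rhythm.
function randomDelay(min = 100, max = 500) {
  return Math.floor(Math.random() * (max - min + 1)) + min
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical helper: call `fn` once, then retry up to `retries` more times.
async function withRetry(fn, retries = 2, delayFn = randomDelay) {
  let lastError
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt)
    } catch (err) {
      lastError = err
      // Wait a random interval before the next attempt, if one remains.
      if (attempt < retries) await sleep(delayFn())
    }
  }
  throw lastError
}

// Demo: a simulated request that fails twice, then succeeds.
let calls = 0
withRetry(async () => {
  calls++
  if (calls < 3) throw new Error('simulated failure')
  return 'ok'
}, 2, () => 10).then(result => console.log(result, 'after', calls, 'calls'))
```

In real use, fn would wrap a single request with the timeout set in the 5-8 second range, and could pick a different proxy on each attempt.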
Lastly, I'd like to recommend ipipgo's Intelligent Routing feature, which automatically distributes requests across proxy nodes in different regions. It is especially handy when collecting e-commerce data, since you can get price information for different regions.

