
When Your Crawler Meets Anti-Scraping: Going Around Beats Going Head-On
Anyone who does data collection knows that a target site's anti-scraping defenses treat crawlers like thieves. Recently a friend who builds e-commerce price-comparison tools vented to me: "I wrote my crawler with axios, and it worked fine at first, but the next day my IP was blocked!" This problem is extremely common: once a site sees a flood of requests from a single IP in a short window, it blacklists that IP with no room for negotiation.
This is where proxy IPs come in. The principle is simple: give each request a fresh "disguise". It's like sending different people to the supermarket in turn to check prices. With a service such as ipipgo that rotates the IP on every request, the site can't tell whether it's a real visitor or a machine collecting data.
Configuring a Proxy in axios, in Three Steps
axios does have a built-in proxy option, but it is unreliable with HTTPS targets, so the usual approach for dynamic per-request proxies is to attach an agent from the https-proxy-agent package. Install the dependencies first:
npm install axios https-proxy-agent --save
Configuration example (focus on the proxy section):
const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');

const service = axios.create({
  baseURL: 'https://target-site.com',
  timeout: 5000,
  proxy: false // must disable axios's default proxy handling; the agent takes over
});

// Attach a fresh proxy agent to every outgoing request
service.interceptors.request.use(config => {
  // Get a dynamic proxy IP (host:port) from ipipgo
  const proxyUrl = `http://${ipipgo.getProxyIP()}`;
  config.httpsAgent = new HttpsProxyAgent(proxyUrl);
  return config;
});
A Survival Guide to High-Frequency Collection
Having a proxy isn't enough; you also need a strategy:
| Pitfall | Remedy |
|---|---|
| Switching IPs too often | Keep each IP for at least 30 seconds before rotating |
| Request intervals too regular | Add a random delay of 1-5 seconds |
| Headers too recognizable | Pair ipipgo with a browser-fingerprinting library |
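The two timing rules in the table can be sketched as small helpers. This is a sketch, not a full scheduler: the 30-second minimum and the 1-5 second delay come from the table, and how you actually fetch the next IP is left to your provider's API.

```javascript
// Pacing helpers for the rules in the table above
const MIN_IP_LIFETIME_MS = 30_000; // use each IP for at least 30 seconds

// Random delay between 1 and 5 seconds, so request spacing is irregular
function nextDelayMs() {
  return 1000 + Math.floor(Math.random() * 4000);
}

// Rotate only once the current IP has been in use long enough
function shouldRotateIp(ipAcquiredAt, now = Date.now()) {
  return now - ipAcquiredAt >= MIN_IP_LIFETIME_MS;
}

// Await this between requests in the crawl loop
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
```

In the crawl loop you would `await sleep(nextDelayMs())` between requests and call `shouldRotateIp(...)` before each one to decide whether to ask the pool for a new address.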
A special reminder: never hard-code proxy IPs in your script! Use ipipgo's API to fetch them dynamically instead; their pool adds 8 million+ fresh addresses every day, which can cut the chance of getting blocked by about 70%.
Real-World Troubleshooting Q&A
Q: Why does my proxy IP keep timing out?
A: Eighty percent of the time it's a free proxy. Switch to ipipgo's dedicated lines; in our tests their response time stays under 200 ms, far more stable than public proxies.
Q: How can I tell if a proxy is in effect?
A: Add a log to the axios interceptor:
service.interceptors.request.use(config => {
  console.log('Proxy currently in use:', config.httpsAgent?.proxy ?? config.proxy);
  return config;
});
Q: What should I do if I hit a CAPTCHA?
A: Two approaches: 1) lower the collection frequency; 2) use ipipgo's high-anonymity (elite) proxies; some of their IP ranges handle CAPTCHAs automatically, which has held up in our tests.
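Option 1, lowering the frequency, can be automated: back off whenever a response looks like a CAPTCHA page, and relax again once requests succeed. The detection heuristic below is an assumption for illustration; adjust it to whatever your target site actually returns.

```javascript
// Adaptive delay: grows on suspected CAPTCHA pages, shrinks on success (a sketch)
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 60_000;

// Assumed heuristic: many sites serve a challenge page mentioning "captcha"
function looksLikeCaptcha(body) {
  return typeof body === 'string' && /captcha/i.test(body);
}

// Double the delay after a CAPTCHA, halve it after a clean response,
// clamped between the base and the maximum
function nextCrawlDelay(currentMs, sawCaptcha) {
  const next = sawCaptcha ? currentMs * 2 : currentMs / 2;
  return Math.min(MAX_DELAY_MS, Math.max(BASE_DELAY_MS, next));
}
```

Feed each response body through `looksLikeCaptcha` and use the returned delay before the next request; combined with high-anonymity proxies this keeps the crawler under the site's alarm threshold.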
How to Pick a Proxy Service
The proxy market is a mixed bag, so here are a few tricks for avoiding the traps:
- Check IP lifetime: ipipgo's IPs survive for 48 hours on average; short-lived proxies simply can't sustain high-frequency collection.
- Measure connectivity yourself: don't trust an advertised 99%; write your own test script. In our measurements, ipipgo's connectivity genuinely exceeds 97%.
- Compare after-sales support: a response within 10 minutes is a passing grade, and ipipgo's 7x24 live support is genuinely reliable on this front.
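"Write your own script to measure" can look like the sketch below: probe the same URL a number of times through the proxy and report the success ratio. The aggregation is split out so it's easy to verify on its own; `proxyFetch` is a placeholder for however you route a request through the proxy (for example, the agent setup shown earlier).

```javascript
// Success ratio from a list of probe outcomes (true = request got through)
function connectivityRate(outcomes) {
  if (outcomes.length === 0) return 0;
  const ok = outcomes.filter(Boolean).length;
  return ok / outcomes.length;
}

// Run `attempts` probes through the proxy and report the rate.
// `proxyFetch(url)` is assumed to resolve on success and throw on failure.
async function measureConnectivity(proxyFetch, url, attempts = 20) {
  const outcomes = [];
  for (let i = 0; i < attempts; i++) {
    try {
      await proxyFetch(url);
      outcomes.push(true);
    } catch {
      outcomes.push(false);
    }
  }
  return connectivityRate(outcomes);
}
```

A rate persistently below the vendor's advertised figure is your cue to switch providers.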
Finally, a hard truth: proxy IPs are not a cure-all; they only pay off when paired with a sound request strategy. It's like cooking: fresh ingredients (proxy quality) and mastery of the heat (collection strategy) are both indispensable. With ipipgo's service plus the techniques in this article, collecting millions of records a day is within reach.

