
Hands-on with Node.js: breaking through anti-scraping limits
Veterans of web scraping know that more and more sites now use server-side rendering (SSR), and a traditional crawler simply can't pull useful data from them. This time we bring out Node.js, paired with our ipipgo proxy IP service, to crack this tough nut.
Take a real scenario: price monitoring on an e-commerce platform. A plain request returns only an empty shell page, and the key data appears only after the page has been rendered. At that point you have to use a headless browser to simulate a real user, but frequent access is guaranteed to trigger a ban. In our tests last year, a single IP making more than 20 requests per minute triggered a CAPTCHA 100% of the time.
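To stay under the roughly 20 requests/minute threshold mentioned above, one option is to throttle each proxy IP on your own side. Below is a minimal sketch of a sliding-window limiter; the class name `PerIpRateLimiter` and the default budget of 15 requests per minute are our own illustrative assumptions, not part of any ipipgo SDK.

```javascript
// Sliding-window rate limiter: at most `limit` requests per `windowMs` for each key
// (here, a proxy IP). Timestamps older than the window are discarded on each check.
class PerIpRateLimiter {
  // Assumed default: 15/min as a safety margin below the ~20/min threshold.
  constructor(limit = 15, windowMs = 60000, now = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, useful for testing
    this.history = new Map(); // ip -> ascending list of request timestamps (ms)
  }

  // Keep only timestamps still inside the window and store the pruned list.
  recent(ip, t) {
    const kept = (this.history.get(ip) || []).filter((ts) => t - ts < this.windowMs);
    this.history.set(ip, kept);
    return kept;
  }

  // Records the request and returns true if `ip` is under budget; otherwise returns false.
  tryAcquire(ip) {
    const t = this.now();
    const kept = this.recent(ip, t);
    if (kept.length >= this.limit) return false;
    kept.push(t);
    return true;
  }

  // Milliseconds until `ip` may send again (0 if it may send right now).
  waitTime(ip) {
    const t = this.now();
    const kept = this.recent(ip, t);
    if (kept.length < this.limit) return 0;
    return this.windowMs - (t - kept[0]);
  }
}
```

In a crawl loop you would call `tryAcquire(proxy.ip)` before each page load and, when it returns false, either wait `waitTime(proxy.ip)` milliseconds or move on to another proxy.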
```javascript
const puppeteer = require('puppeteer');
const {getProxy} = require('ipipgo-sdk'); // Remember to install the official SDK.

async function ssrCrawler(url) {
  const proxy = await getProxy({type: 'https'}); // Automatically fetch a fresh IP
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.ip}:${proxy.port}`]
  });
  try {
    const page = await browser.newPage();
    // Fake a real browser fingerprint
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36...');
    // Authenticate against the proxy before the first navigation
    await page.authenticate({
      username: proxy.username,
      password: proxy.password
    });
    await page.goto(url, {waitUntil: 'networkidle2'});
    // Here's where the page begins to operate normally...
    return await page.content();
  } finally {
    await browser.close(); // Always release the browser, even on errors
  }
}
```
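Because any single IP can still get flagged, it helps to retry a failed page with a new proxy rather than hammering the same one. A rough sketch, assuming you refactor `ssrCrawler` to accept the proxy as an argument instead of fetching it internally; `withFreshProxy` and `fetchProxy` are hypothetical names, and `fetchProxy` could wrap the SDK's `getProxy({type: 'https'})` call shown above.

```javascript
// Hypothetical helper: runs `task(proxy)` with a freshly fetched proxy. If the task
// throws (ban page, CAPTCHA, timeout...), that proxy is dropped and the next attempt
// fetches a new one, up to `maxAttempts` times. Rethrows the last error if all fail.
async function withFreshProxy(task, fetchProxy, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = await fetchProxy(); // e.g. () => getProxy({type: 'https'})
    try {
      return await task(proxy);
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${attempt} via ${proxy.ip}:${proxy.port} failed: ${err.message}`);
    }
  }
  throw lastError;
}
```

For example, `withFreshProxy((proxy) => crawlPage(url, proxy), () => getProxy({type: 'https'}))`, where `crawlPage` stands for the hypothetical refactored crawler.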
Choose your proxy IPs carefully
Proxy services on the market vary widely in quality. For scraping server-rendered sites in particular, don't get these three types mixed up:
| Type | Typical use | ipipgo plan |
|---|---|---|
| Data center IP | General data scraping | Static IP pool |
| Residential IP | Sites with strict anti-scraping | Dynamic rotation |
| Mobile IP | App data collection | 4G network pool |
Residential proxies deserve the most attention, and ipipgo's Intelligent Routing technology really shines here. Last week, while helping a client scrape a ticketing site, we let the same task switch automatically between IPs in different regions, and the success rate jumped from 37% to 89%. The configuration looks like this:
```javascript
const ipipgo = require('ipipgo');
const client = new ipipgo.Client('your API key');

// Get region-specific IPs on demand
// (wrapped in an async function because top-level await isn't available in CommonJS)
async function getLosAngelesProxy() {
  return client.getProxy({
    country: 'us',
    city: 'los_angeles',
    protocol: 'socks5'
  });
}
```
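The "same task, different regional IPs" behavior can also be approximated on the client side by rotating through a region list. The sketch below reshuffles the list every round so consecutive requests never share a region; `regionRotation` is a hypothetical helper, and any region codes other than the `us`/`los_angeles` pair above are placeholders to check against ipipgo's documentation.

```javascript
// Fisher-Yates shuffle on a copy of `items`.
function shuffle(items, random = Math.random) {
  const a = [...items];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Endless generator: yields every region once per round, in a new random order
// each round, and never yields the same region twice in a row (assumes distinct entries).
function* regionRotation(regions, random = Math.random) {
  if (regions.length === 0) throw new Error('regionRotation needs at least one region');
  let last = null;
  while (true) {
    const round = shuffle(regions, random);
    // Avoid repeating the previous region across a round boundary
    if (round.length > 1 && round[0] === last) round.push(round.shift());
    for (const region of round) {
      last = region;
      yield region;
    }
  }
}
```

With regions stored as `{country, city}` objects, each task could take `const {country, city} = rotation.next().value;` and pass those into `client.getProxy({country, city, protocol: 'socks5'})`.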
A practical guide to avoiding pitfalls
Five common beginner mistakes:
- Not setting timeouts (randomize between 3 and 10 seconds)
- Not isolating cookies (use a separate environment for each IP)
- Headers that are too bare (remember to include Referer and Accept-Language)
- Switching IPs on a predictable pattern (use random intervals and random regions)
- Not handling CAPTCHAs (consider integrating a third-party recognition service)
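The first and fourth points both come down to randomness and hard limits. A minimal sketch of each, assuming the 3 to 10 second range from the list: `randomDelayMs` picks a random pause between requests, and `withTimeout` stops waiting on a request that hangs. Both are plain Node.js helpers made up for illustration.

```javascript
// Random integer delay in [minMs, maxMs] milliseconds (default 3 to 10 s).
function randomDelayMs(minMs = 3000, maxMs = 10000, random = Math.random) {
  return minMs + Math.floor(random() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Rejects if `promise` hasn't settled within `ms`. Note: this stops waiting,
// but it does not cancel the underlying work (e.g. an in-flight request).
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Between page loads you might `await sleep(randomDelayMs())`, and wrap each navigation as `await withTimeout(page.goto(url), randomDelayMs())` so no two waits look alike.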
On the third point, a header set can be configured like this:
```javascript
const headers = {
  'Accept-Encoding': 'gzip, deflate, br',
  'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8', // Mix languages for greater authenticity
  'Pragma': 'no-cache', // A filler header that adds variety
  // Randomly include X-Requested-With about half the time
  // (omit the key entirely rather than sending a null value)
  ...(Math.random() > 0.5 ? {'X-Requested-With': 'XMLHttpRequest'} : {})
};
```
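The static snippet leaves out `Referer`, which the third point says to include. Here is a sketch of a per-request header builder that adds it and varies `Accept-Language`; the language list, the 50% probabilities, and the `buildHeaders` name are illustrative assumptions. Every value is a string, so the result can be passed directly to Puppeteer's `page.setExtraHTTPHeaders()`.

```javascript
// Assumed pool of plausible Accept-Language values to rotate through.
const ACCEPT_LANGUAGES = [
  'zh-CN,zh;q=0.9,en;q=0.8',
  'zh-CN,zh;q=0.9',
  'en-US,en;q=0.9,zh-CN;q=0.8',
];

// Picks one element of `list` uniformly at random.
function pick(list, random = Math.random) {
  return list[Math.floor(random() * list.length)];
}

// Builds a fresh header set per request: Referer and Accept-Language are always present,
// while Pragma and X-Requested-With are each added about half the time.
function buildHeaders(referer, random = Math.random) {
  const headers = {
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': pick(ACCEPT_LANGUAGES, random),
    'Referer': referer,
  };
  if (random() > 0.5) headers['Pragma'] = 'no-cache';
  if (random() > 0.5) headers['X-Requested-With'] = 'XMLHttpRequest';
  return headers;
}
```

A natural `referer` is the listing page the crawler came from, e.g. `await page.setExtraHTTPHeaders(buildHeaders(listUrl))` before opening a product page.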
Q&A
Q: What should I do if my proxy IP is slow?
A: Prefer ipipgo's dedicated high-speed lines; in our measurements latency stayed within 200 ms. Also raise the `maxSockets` option on your Node.js HTTP agent to above 50. (Node's own `http.Agent` default is `Infinity`, so this matters mainly when your HTTP client or a custom agent sets a lower cap.)
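A minimal sketch of that agent tuning with Node's built-in `https` module. The value 64 simply satisfies the "above 50" recommendation; the keep-alive, idle-socket, and timeout settings are our own assumptions. With axios, you would pass the agent as the `httpsAgent` request option.

```javascript
const https = require('https');

// Shared keep-alive agent: reuses TCP/TLS connections across requests instead of
// opening a new one each time, and allows up to 64 concurrent sockets per host.
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 64,     // per-host concurrency cap (the recommendation above is > 50)
  maxFreeSockets: 16, // idle sockets kept open for reuse
  timeout: 10000,     // socket timeout in ms
});
```

Sharing one agent across the whole crawler is the point: a new agent per request would throw away the pooled connections.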
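Q: How can I tell whether the proxy is actually in effect?
A: Add a check to your code that prints the exit IP:
```javascript
const axios = require('axios');

// Prints the exit IP seen by the remote server. Unless axios is configured with a proxy
// (its `proxy` option or the HTTP(S)_PROXY environment variables), this shows your own IP.
const checkIP = async () => {
  const res = await axios.get('https://api.ipipgo.com/checkip');
  console.log('Current exit IP:', res.data.ip);
};
```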
Q: What should I do if I run into Cloudflare protection?
A: Three steps: 1. switch to the latest version of Chromium; 2. enable ipipgo's JS-rendering proxy; 3. add simulated mouse-movement trajectories.
One last killer tip: combine ipipgo's pay-per-volume billing with its package plans. Use an unlimited package during daytime peak hours and pay-per-volume billing for large late-night runs, and you can cut costs by about 40%.

