
Hands-on with Puppeteer: Crawling Without Getting Blocked
Anyone who does data crawling has probably noticed that many sites' anti-crawler defenses have become particularly strict lately. Last week my colleague Wang wrote a script in Node.js, and its IP was blocked outright after less than half a day of running. This time we are bringing out our rescue combo: Puppeteer + proxy IPs. Paired with ipipgo's dynamic IP pool in particular, it has held up under high-intensity collection in my own testing.
Why is crawling from a bare IP so risky?
Sites have gotten smart. Collecting data with your real IP directly exposed is like walking onto a battlefield without a bulletproof vest. Here is a real case:
const puppeteer = require('puppeteer');

async function nakedCrawler() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Go straight to the target website, with no proxy in between
  await page.goto('https://target-site.com/products');
  // Try 10 consecutive visits
  for (let i = 0; i < 10; i++) {
    await page.reload();
    console.log(`Visit ${i + 1} successful`);
  }
  await browser.close();
}
// Result: the IP is blocked on the 5th visit
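To see where a block actually kicks in, it helps to check the HTTP status of each navigation instead of assuming every visit succeeded. The following is a minimal sketch under stated assumptions: `looksBlocked` and `visitUntilBlocked` are hypothetical helper names, and treating 403 and 429 as block signals is a guess, since some sites return 200 with a CAPTCHA page instead. It works with any Puppeteer `page` object.

```javascript
// Hypothetical helper: which statuses count as "blocked" is an assumption.
function looksBlocked(status) {
  return status === 403 || status === 429;
}

// Visits `url` up to `maxVisits` times using a Puppeteer-style `page`.
// Returns the 1-based visit number at which a block was seen, or null.
async function visitUntilBlocked(page, url, maxVisits) {
  for (let i = 1; i <= maxVisits; i++) {
    const response = await page.goto(url); // resolves to an HTTPResponse or null
    const status = response ? response.status() : 0;
    if (looksBlocked(status)) {
      return i;
    }
  }
  return null;
}
```

Logging the returned visit number gives a rough idea of how many requests a site tolerates from one IP before it reacts.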
Putting an invisibility cloak on Puppeteer
This is where ipipgo's proxy service comes in. Their dynamic IP pool has three main tricks up its sleeve:
| Feature | Effect |
|---|---|
| Automatic IP rotation | Automatically switches to a new IP every 5 minutes |
| High-anonymity mode | Completely hides the real IP |
| Retry on failure | Automatically switches away from invalid IPs |
The modified code looks like this:
const puppeteer = require('puppeteer');
const ipipgo = require('ipipgo-sdk'); // pretend this SDK exists

async function stealthCrawler() {
  const proxy = await ipipgo.getProxy(); // get the latest proxy
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.ip}:${proxy.port}`]
  });
  const page = await browser.newPage();
  // Answer the proxy's authentication challenge
  await page.authenticate({
    username: proxy.username,
    password: proxy.password
  });
  // Start collecting with confidence
  await page.goto('https://target-site.com/products', {
    timeout: 60000,
    waitUntil: 'networkidle2'
  });
  // Change IPs every 3 captures
  for (let i = 0; i < 10; i++) {
    if (i % 3 === 0) {
      // Switch to a new IP. This assumes the proxy endpoint is a gateway
      // whose exit IP changes; --proxy-server itself is fixed at launch.
      await ipipgo.rotateProxy();
    }
    await page.reload();
    console.log(`Capture ${i + 1} successful`);
  }
  await browser.close();
}
// Result: 10 captures completed successfully
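One caveat: Chromium reads `--proxy-server` once at launch, so a running browser keeps the same proxy endpoint. If the provider does not rotate the exit IP behind a fixed gateway, one option is to relaunch the browser per proxy and fall through to the next proxy on failure. The sketch below illustrates that idea only. `crawlWithFailover` is a hypothetical helper, `launch` is injected (for example `opts => puppeteer.launch(opts)`), and the proxy object shape `{ ip, port, username, password }` is assumed to match the code above.

```javascript
// Hypothetical helper: tries each proxy in turn, relaunching the browser
// for every attempt because --proxy-server is fixed at launch time.
async function crawlWithFailover(launch, proxies, task) {
  let lastError = new Error('no proxies supplied');
  for (const proxy of proxies) {
    const browser = await launch({
      args: [`--proxy-server=${proxy.ip}:${proxy.port}`],
    });
    try {
      const page = await browser.newPage();
      if (proxy.username) {
        // Answer the proxy's authentication challenge
        await page.authenticate({
          username: proxy.username,
          password: proxy.password,
        });
      }
      return await task(page); // success: stop trying further proxies
    } catch (err) {
      lastError = err; // remember the failure and move on to the next proxy
    } finally {
      await browser.close(); // always release the browser
    }
  }
  throw lastError;
}
```

A call might look like `crawlWithFailover(opts => puppeteer.launch(opts), proxyList, page => page.goto(url))`, where `proxyList` is whatever list of proxies your provider hands out. Relaunching costs a second or two per attempt, so it suits failover better than per-request rotation.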
A practical guide to avoiding pitfalls
Pitfalls I recently ran into while helping an e-commerce company with price monitoring:
- Fingerprinting: remember to randomize the userAgent
- Surprise CAPTCHAs: ipipgo's residential IPs can effectively reduce how often they are triggered
- Connection timeouts: set a reasonable timeout value (30-60 seconds recommended)
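The userAgent and timeout tips can be sketched in a few lines. This is a minimal illustration, not a complete anti-fingerprinting setup: `pickUserAgent` and `hardenPage` are hypothetical helper names, the user-agent strings are illustrative values rather than a vetted list, and sites can fingerprint many things besides the user agent. `page.setUserAgent` and `page.setDefaultNavigationTimeout` are standard Puppeteer page methods.

```javascript
// Example user-agent strings; illustrative values, not a vetted list.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Picks one entry; `random` is injectable so the choice can be made deterministic.
function pickUserAgent(random = Math.random) {
  return USER_AGENTS[Math.floor(random() * USER_AGENTS.length)];
}

// Hypothetical helper: applies a random user agent and a navigation timeout
// to a Puppeteer page. Call it before the first page.goto().
async function hardenPage(page, timeoutMs = 60000) {
  await page.setUserAgent(pickUserAgent());
  page.setDefaultNavigationTimeout(timeoutMs);
  return page;
}
```

Calling `await hardenPage(page, 30000)` right after `browser.newPage()` applies both settings to every later navigation on that page.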
Frequently Asked Questions
Q: What should I do if I still get blocked while using a proxy?
A: Check whether the IP is clean. I recommend ipipgo's exclusive IP package, where each IP is used by only one customer.
Q: What can I do if collection slows down?
A: ipipgo has a dedicated high-speed line; remember to switch to "Extreme Mode" in the console.
Q: How can I tell whether the proxy is in effect?
A: Add a check to your code:
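// Run this inside an async function, after `page` has been created
const checkIP = await page.evaluate(() => {
  // Ask the IP-check endpoint which address the request came from
  return fetch('https://api.ipipgo.com/checkip').then(res => res.json());
});
console.log('Currently using IP:', checkIP.ip);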
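An alternative sketch is to navigate to the check URL directly and read the response body on the Node side, which sidesteps any cross-origin restriction that might apply to an in-page `fetch`. `getExitIp` and `proxyInEffect` are hypothetical helper names, and the sketch assumes the endpoint answers with JSON shaped like `{ ip: "..." }`, as the snippet above implies.

```javascript
// Hypothetical helper: navigates to an IP-echo endpoint and returns the reported IP.
// Assumes the endpoint answers with JSON shaped like { ip: "1.2.3.4" }.
async function getExitIp(page, checkUrl) {
  const response = await page.goto(checkUrl);
  if (!response || !response.ok()) {
    throw new Error(`IP check failed for ${checkUrl}`);
  }
  const body = await response.json();
  return body.ip;
}

// The proxy counts as "in effect" when the exit IP differs from the machine's own IP.
function proxyInEffect(realIp, exitIp) {
  return Boolean(exitIp) && exitIp !== realIp;
}
```

Comparing the result from a proxied page against the IP seen without a proxy gives a simple yes/no answer before a long crawl starts.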
A few words from the heart
Last year, while our team was doing competitive analysis, we had more than 20 IPs blocked in a row. We later switched to ipipgo's Dynamic Rotation Package, and together with their intelligent routing function, collection efficiency doubled outright. A special reminder for newcomers: free proxies look tempting but turn out to be full of pitfalls in actual use. Professional work is best left to veteran service providers like ipipgo.

