Node.js Data Crawling: Puppeteer Headless Browser

Hands-On: Using Puppeteer Without Getting Blocked

Anyone doing data crawling has probably noticed lately that many sites have become very strict about anti-scraping defenses. Last week my colleague Wang wrote a script in Node.js, and in less than half a day of running its IP was blocked outright. This time we are bringing out our rescue combo: Puppeteer + proxy IPs. Paired with ipipgo's dynamic IP pool in particular, it has held up under high-intensity collection in our own testing.

Why Not Gamble on a Bare IP?

Websites have gotten smart. Collecting data with your real IP directly exposed is like walking onto a battlefield without a bulletproof vest. Here is a real case:


const puppeteer = require('puppeteer');

async function nakedCrawler() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Go straight to the target website, with no proxy in between
  await page.goto('https://target-site.com/products');

  // Try 10 consecutive visits
  for (let i = 0; i < 10; i++) {
    await page.reload();
    console.log(`Visit ${i + 1} successful`);
  }

  await browser.close();
}
// Result: the IP is blocked on the 5th visit
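
To notice a block when it happens instead of inferring it from the logs, the script can look at the HTTP status of each reload. The following is a minimal sketch, assuming the target site signals blocking with status 403 or 429 (an assumption; some sites serve a CAPTCHA page with status 200 instead). The names `looksBlocked` and `reloadAndCheck` are illustrative and not part of Puppeteer.

```javascript
// Status codes that commonly indicate blocking or rate limiting.
// Assumption: the target site uses 403/429; real sites vary.
const BLOCK_STATUSES = new Set([403, 429]);

function looksBlocked(status) {
  return BLOCK_STATUSES.has(status);
}

// `page` is a Puppeteer Page. page.reload() resolves to the main
// resource's response (or null), and the response exposes status().
async function reloadAndCheck(page) {
  const response = await page.reload();
  const status = response ? response.status() : null;
  return { status, blocked: status !== null && looksBlocked(status) };
}
```

Inside the loop above, `const { blocked } = await reloadAndCheck(page);` could replace the bare `page.reload()` call, so the script can stop as soon as `blocked` is true.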

Giving Puppeteer an Invisibility Cloak

This is where ipipgo's proxy service comes into play. Their dynamic IP pool has three main tricks up its sleeve:

| Feature | Effect |
| --- | --- |
| Automatic IP rotation | Switches to a new IP every 5 minutes |
| High-anonymity mode | Completely hides the real IP |
| Retry on failure | Automatically swaps out invalid IPs |
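
The retry behaviour in the last row can also be approximated on the client side. The sketch below is a generic retry helper, not ipipgo's implementation: `withRetry` and its `onFailure` hook are hypothetical names, and the hook is where a proxy switch would go.

```javascript
// Retry `task` up to `maxAttempts` times. Between failed attempts,
// call `onFailure` (for example, to switch to another proxy).
async function withRetry(task, { maxAttempts = 3, onFailure = async () => {} } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // No point rotating after the final attempt has failed.
      if (attempt < maxAttempts) {
        await onFailure(err, attempt);
      }
    }
  }
  throw lastError;
}
```

A navigation could then be wrapped as `withRetry(() => page.goto(url), { onFailure: rotate })`, where `rotate` is whatever function your proxy provider offers for switching IPs.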

The modified code looks like this:


const puppeteer = require('puppeteer');
const ipipgo = require('ipipgo-sdk'); // hypothetical SDK, for illustration only

async function stealthCrawler() {
  const proxy = await ipipgo.getProxy(); // get the latest proxy

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.ip}:${proxy.port}`]
  });

  const page = await browser.newPage();
  // Supply the proxy credentials
  await page.authenticate({
    username: proxy.username,
    password: proxy.password
  });

  // Now we can start collecting with confidence
  await page.goto('https://target-site.com/products', {
    timeout: 60000,
    waitUntil: 'networkidle2'
  });

  // Switch IPs every 3 captures.
  // Assumes the proxy endpoint is a rotating gateway, so the exit IP
  // can change without relaunching the browser.
  for (let i = 0; i < 10; i++) {
    if (i % 3 === 0) {
      await ipipgo.rotateProxy(); // switch to a new IP
    }
    await page.reload();
    console.log(`Capture ${i + 1} successful`);
  }

  await browser.close();
}
// Result: 10 captures completed successfully
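
One detail worth spelling out: the `--proxy-server` flag is read when the browser starts, so if your provider hands out a different host and port for each IP rather than a single rotating gateway, one option is to launch a fresh browser per batch of pages. The sketch below illustrates that under stated assumptions: `launch`, `getProxy` and `visit` are injected by the caller, `getProxy` is hypothetical and returns the same `{ ip, port, username, password }` shape used above, and all function names are illustrative.

```javascript
// Build Puppeteer launch options for one proxy ({ ip, port }).
function launchOptionsFor(proxy) {
  return { args: [`--proxy-server=${proxy.ip}:${proxy.port}`] };
}

// Split the work into batches; each batch gets its own browser and proxy.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// `launch`   : e.g. (opts) => puppeteer.launch(opts)
// `getProxy` : hypothetical, resolves to { ip, port, username, password }
// `visit`    : your own per-page logic, e.g. (page, url) => page.goto(url)
async function crawlInBatches(urls, batchSize, { launch, getProxy, visit }) {
  for (const batch of toBatches(urls, batchSize)) {
    const proxy = await getProxy();
    const browser = await launch(launchOptionsFor(proxy));
    try {
      const page = await browser.newPage();
      await page.authenticate({ username: proxy.username, password: proxy.password });
      for (const url of batch) {
        await visit(page, url);
      }
    } finally {
      // Always release the browser, even if a visit throws.
      await browser.close();
    }
  }
}
```

With a batch size of 3 this mirrors the "switch every 3 captures" loop above, at the cost of a browser restart per batch.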

A Practical Guide to Avoiding Pitfalls

Pitfalls I recently ran into while helping an e-commerce company with price monitoring:

  1. Fingerprinting: remember to randomize the userAgent
  2. CAPTCHA challenges: ipipgo's residential IPs can effectively reduce how often they are triggered
  3. Connection timeouts: set a reasonable timeout value (30-60 seconds recommended)
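
Items 1 and 3 can be sketched in a few lines. This is an illustration, not a complete anti-fingerprinting setup: the user-agent strings are a small example pool, `pickUserAgent` and `preparePage` are hypothetical names, and the 45-second default is simply a value inside the recommended 30-60 second range.

```javascript
// Small example pool; in practice keep these current and realistic.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Pick one entry at random. `random` is injectable for testing.
function pickUserAgent(random = Math.random) {
  return USER_AGENTS[Math.floor(random() * USER_AGENTS.length)];
}

// Apply a random user agent and a navigation timeout to a Puppeteer Page.
async function preparePage(page, timeoutMs = 45000) {
  const userAgent = pickUserAgent();
  await page.setUserAgent(userAgent);
  page.setDefaultNavigationTimeout(timeoutMs);
  return userAgent;
}
```

Calling `await preparePage(page)` right after `browser.newPage()` applies both settings before the first navigation.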

Frequently Asked Questions

Q: What should I do if I use a proxy and still get blocked?
A: Check whether the IP is clean, that is, not already flagged by the target site. We recommend ipipgo's dedicated IP package, where each IP is assigned to a single customer.

Q: What can I do if collection slows down?
A: ipipgo has a dedicated high-speed line; remember to switch to "Extreme Mode" in the console.

Q: How can I tell if a proxy is in effect?
A: Add an IP check to the code:


// Run inside the async crawler function, after `page` has been created
const checkIP = await page.evaluate(() => {
  return fetch('https://api.ipipgo.com/checkip').then(res => res.json());
});
console.log('Currently using IP:', checkIP.ip);

A Few Words From the Heart

Last year, while our team was doing competitive analysis, we had more than 20 IPs blocked in a row. We later switched to ipipgo's Dynamic Rotation Package, and together with their intelligent routing feature, our collection efficiency doubled. A special reminder for newcomers: free proxies look tempting but are full of pitfalls in real use. Professional work is better left to an established provider such as ipipgo.

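Our products are supported only in network environments outside mainland China (except for the TikTok dedicated line). Any actions users take with IPIPGO do not represent the will or views of IPIPGO, and IPIPGO assumes no legal liability.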
