Node.js Data Crawling: Puppeteer Headless Browser

Hands-on: Using Puppeteer Without Getting Blocked

Anyone doing data crawling has probably noticed that many sites now run very strict anti-crawler defenses. Last week my colleague Wang wrote a script in Node.js, and after running for less than half a day its IP was completely blocked. This is where our rescue combo comes in: Puppeteer + proxy IPs. Paired with ipipgo's dynamic IP pool in particular, it has held up under high-intensity collection in our own testing.

Why not gamble on a bare IP?

Sites have gotten smart. Exposing your real IP while collecting is like walking onto a battlefield without a bulletproof vest. Here is a real case:


const puppeteer = require('puppeteer');

async function nakedCrawler() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Here we go directly to the target website
  await page.goto('https://target-site.com/products');

  // Try 10 consecutive visits
  for(let i=0; i<10; i++){
    await page.reload();
    console.log(`Visit ${i+1} successful`);
  }

  await browser.close();
}
// Result: the IP is blocked on the 5th visit
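The loop above logs "successful" after every reload without looking at what the server returned. A minimal sketch of how a block could be detected instead is shown here. It assumes the site signals a ban with HTTP 403 or 429, which is common but not universal, and the helper names `looksBlocked` and `visitUntilBlocked` are made up for illustration.

```javascript
// Status codes that often indicate rate limiting or an IP ban.
// This set is an assumption; each site behaves differently.
const BLOCK_STATUSES = new Set([403, 429]);

// page.goto() and page.reload() resolve to an HTTPResponse or null.
function looksBlocked(response) {
  return response !== null && BLOCK_STATUSES.has(response.status());
}

// Reload the page up to maxVisits times and report which visit was blocked.
async function visitUntilBlocked(page, maxVisits = 10) {
  for (let i = 1; i <= maxVisits; i++) {
    const response = await page.reload();
    if (looksBlocked(response)) {
      return i; // number of the visit that got blocked
    }
  }
  return null; // no block detected
}
```

Called with the `page` from the script above, `visitUntilBlocked(page)` would return the visit number at which the block appeared, or `null` if all visits went through.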

Put a cloak of invisibility on Puppeteer

This is where ipipgo's proxy service comes into play. Their dynamic IP pool has three main tricks up its sleeve:

Feature                  Effect
Automatic IP rotation    Automatically switches to a new IP every 5 minutes
High-anonymity mode      Completely hides the real IP
Retry on failure         Automatically switches away from invalid IPs

The modified code looks like this:


const puppeteer = require('puppeteer');
const ipipgo = require('ipipgo-sdk'); // pretend this SDK exists (hypothetical, for illustration)

async function stealthCrawler() {
  const proxy = await ipipgo.getProxy(); // get latest proxy

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.ip}:${proxy.port}`]
  });

  const page = await browser.newPage();
  await page.authenticate({
    username: proxy.username,
    password: proxy.password
  });

  // Here's where to start harvesting with confidence
  await page.goto('https://target-site.com/products', {
    timeout: 60000,
    waitUntil: 'networkidle2'
  });

  // Automatically change IPs every 3 acquisitions
  for(let i=0; i<10; i++){
    if(i % 3 === 0) {
      await ipipgo.rotateProxy(); // switch new IPs
    }
    await page.reload();
    console.log(`Capture ${i+1} successful`);
  }

  await browser.close();
}
// Result: 10 captures completed successfully
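One caveat about the script above: Chromium reads `--proxy-server` when the browser launches, so a running browser keeps using the address it started with. Rotation inside the loop therefore only takes effect if the provider swaps the exit IP behind a fixed gateway address. If each rotation yields a different `ip:port`, one option is to launch a fresh browser per proxy. The sketch below shows that pattern. The name `withFreshProxy` and the injected `getProxy` and `launch` functions are assumptions for illustration, not part of Puppeteer or any ipipgo SDK.

```javascript
// Run `task` in a freshly launched browser bound to a newly fetched proxy.
// If the task throws (for example on a block or timeout), fetch another
// proxy and try again, up to maxAttempts times.
async function withFreshProxy(task, { getProxy, launch, maxAttempts = 3 }) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = await getProxy(); // hypothetical provider call
    const browser = await launch({
      args: [`--proxy-server=${proxy.ip}:${proxy.port}`],
    });
    try {
      const page = await browser.newPage();
      if (proxy.username) {
        await page.authenticate({
          username: proxy.username,
          password: proxy.password,
        });
      }
      return await task(page, proxy);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next proxy
    } finally {
      await browser.close(); // always release the browser
    }
  }
  throw lastError;
}

// Possible usage with Puppeteer (not executed here):
// const puppeteer = require('puppeteer');
// await withFreshProxy(
//   (page) => page.goto('https://target-site.com/products'),
//   { getProxy: () => ipipgo.getProxy(), launch: (opts) => puppeteer.launch(opts) }
// );
```

Passing `launch` and `getProxy` in as parameters keeps the helper independent of any particular proxy provider and makes it easy to exercise with stubs. Relaunching costs a second or two per attempt, so it suits rotation every few pages rather than on every request.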

A practical guide to avoiding pitfalls

Pitfalls I recently ran into while helping an e-commerce company with price monitoring:

  1. Fingerprinting: remember to randomize the userAgent
  2. CAPTCHA challenges: ipipgo's residential IPs can effectively reduce how often they are triggered
  3. Connection timeouts: set a reasonable timeout value (30-60 seconds recommended)
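Items 1 and 3 can be sketched as a small page-preparation step. The user agent strings below are just sample values, and `pickUserAgent` and `preparePage` are hypothetical helper names. The Puppeteer calls used are `page.setUserAgent()` and `page.setDefaultNavigationTimeout()`. Keep in mind that the user agent is only one fingerprinting signal, so this alone is unlikely to defeat stricter checks.

```javascript
// A small pool of example desktop user agents (sample values only).
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Pick one entry at random; `rand` is injectable so the choice can be tested.
function pickUserAgent(pool = USER_AGENTS, rand = Math.random) {
  return pool[Math.floor(rand() * pool.length)];
}

// Apply a random user agent and a navigation timeout to a Puppeteer page.
// 45000 ms sits inside the 30-60 second range recommended above.
async function preparePage(page, timeoutMs = 45000) {
  await page.setUserAgent(pickUserAgent());
  page.setDefaultNavigationTimeout(timeoutMs);
  return page;
}
```

Calling `await preparePage(page)` right after `browser.newPage()` means every later `page.goto()` and `page.reload()` on that page inherits both settings.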

Frequently Asked Questions

Q: What should I do if I use a proxy and still get blocked?
A: Check whether the IP is clean, meaning it has not already been flagged. ipipgo's exclusive IP package is recommended, since each IP is assigned to a single customer.

Q: What can I do if collection becomes slow?
A: ipipgo offers a dedicated high-speed line; remember to switch to "Extreme Mode" in the console.

Q: How can I tell if a proxy is in effect?
A: Add an IP check to the code:


const checkIP = await page.evaluate(() => {
  return fetch('https://api.ipipgo.com/checkip').then(res => res.json());
});
console.log('Currently using IP:', checkIP.ip);

A few honest words

Last year, when our team was doing competitive analysis, we had more than 20 IPs blocked in a row. We later switched to ipipgo's Dynamic Rotation Package and, combined with their intelligent routing feature, our collection efficiency doubled. A special reminder for newcomers: free proxies look tempting but are full of pitfalls in practice, so leave professional work to an established provider such as ipipgo.

This article was originally published or organized by ipipgo. https://www.ipipgo.com/en-us/ipdaili/33571.html
