Node.js web crawler development: Node.js proxy crawler code example

Crawler getting blocked by anti-scraping measures? Try this proxy IP trick

Lately a lot of fellow Node.js crawler developers have been complaining that sites' anti-scraping measures are getting more and more ruthless. The day before yesterday a friend told me that the crawler he wrote ran for less than half an hour before its IP was blocked for good. I know the feeling all too well: last year, when I was collecting e-commerce data, I had to change IPs every two or three days, until I found that proxy IPs were the real answer.

How exactly does a proxy IP help you?

In a nutshell, it is an invisibility cloak for your crawler. Let's say you want to collect the price of a product from a website:

const axios = require('axios');

// Normal request from your own IP (blocked in minutes)
async function normalRequest() {
  try {
    const response = await axios.get('Target site URL');
    console.log(response.data);
  } catch (error) {
    console.log('Damn, IP is blocked!');
  }
}

After switching to a proxy IP:

// Proxy request (the ipipgo API is recommended)
const proxyConfig = {
  host: 'ipipgo Dynamic Residential Proxy IP',
  port: 8080, // placeholder: replace with your own port number
  auth: {
    username: 'Your account number',
    password: 'Random password'
  }
};

async function proxyRequest() {
  try {
    const response = await axios.get('Target site URL', {
      proxy: proxyConfig,
      timeout: 5000 // It's important to set a timeout.
    });
    console.log('Data arrived!');
    return response.data;
  } catch (error) {
    console.log('Change IP and continue');
  }
}
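
The "change IP and continue" step in the catch block is only a log message. One way to turn it into behavior is a small retry loop that walks through a list of proxies. This is a minimal sketch, not ipipgo's API: `requestWithRotation` and the `requestFn` callback are hypothetical names introduced for illustration, and the proxy objects are assumed to have the same shape as `proxyConfig`.

```javascript
// Hypothetical helper: try the request through each proxy in turn until one succeeds.
// `requestFn(proxy)` is any async function that performs the request via that proxy.
async function requestWithRotation(proxies, requestFn, maxAttempts = proxies.length) {
  let lastError = new Error('No proxies supplied');
  for (let attempt = 0; attempt < maxAttempts && proxies.length > 0; attempt++) {
    // Wrap around the list if maxAttempts is larger than the number of proxies.
    const proxy = proxies[attempt % proxies.length];
    try {
      return await requestFn(proxy);
    } catch (error) {
      lastError = error;
      console.log(`Proxy ${proxy.host} failed, changing IP and continuing`);
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Possible usage with axios (assumes `proxyList` holds objects shaped like proxyConfig):
// const data = await requestWithRotation(proxyList, (proxy) =>
//   axios.get('Target site URL', { proxy, timeout: 5000 }).then((res) => res.data));
```

Passing the request as a callback keeps the rotation logic independent of axios, so the same loop could drive a Puppeteer page or any other client.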

How it looks in real-world code

I recommend ipipgo's API extraction method, which is ten times more convenient than maintaining a traditional proxy pool:

const { IpProxy } = require('ipipgo-sdk'); // official SDK
const puppeteer = require('puppeteer');

async function smartCrawler() {
  // Get the proxy IP dynamically (this is the key step!)
  const proxy = await IpProxy.getDynamicResidential({
    country: 'us',
    protocol: 'https'
  });

  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.ip}:${proxy.port}`]
  });

  // Remember to set the page timeout
  const page = await browser.newPage();
  await page.goto('Target URL', { timeout: 60000 });

  // Move the mouse and pause (to simulate a real person's action)
  await page.mouse.move(100, 100);
  // page.waitForTimeout was removed in recent Puppeteer versions; a plain timer works everywhere
  await new Promise((resolve) => setTimeout(resolve, 2000));

  // Read the price text from the rendered page
  const data = await page.evaluate(() => {
    return document.querySelector('.price').innerText;
  });

  await browser.close();
  return data;
}
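
The mouse move in the crawler above always goes to the same fixed point, which is not very random. Below is a rough sketch of how the movement and pauses could be randomized. `randomMousePath`, `randomDelay`, and `wanderMouse` are hypothetical helpers written for this example; only `page.mouse.move` is an actual Puppeteer call.

```javascript
// Hypothetical helper: build a jittered path of points from `from` to `to`.
function randomMousePath(from, to, steps = 10, jitter = 5) {
  const points = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    // No jitter on the final step, so the path ends exactly on `to`.
    const wobble = i === steps ? 0 : jitter;
    points.push({
      x: from.x + (to.x - from.x) * t + (Math.random() * 2 - 1) * wobble,
      y: from.y + (to.y - from.y) * t + (Math.random() * 2 - 1) * wobble,
    });
  }
  return points;
}

// Random pause length in milliseconds, between min and max inclusive.
function randomDelay(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Walk the mouse along the jittered path with a short random pause between moves.
async function wanderMouse(page) {
  for (const point of randomMousePath({ x: 0, y: 0 }, { x: 100, y: 100 })) {
    await page.mouse.move(point.x, point.y);
    await new Promise((resolve) => setTimeout(resolve, randomDelay(20, 80)));
  }
}
```

Calling `await wanderMouse(page)` in place of the single `page.mouse.move(100, 100)` would give each run a slightly different trajectory. Whether that is enough to pass a given site's bot detection is not something this sketch can promise.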

Be careful with concurrent processing

Use this routine when you need to run multiple crawlers at the same time:

const { Worker } = require('worker_threads');
const { IpProxy } = require('ipipgo-sdk'); // same SDK as in the previous example

function createWorker(proxy) {
  return new Promise((resolve) => {
    const worker = new Worker('./crawler.js', {
      workerData: { proxy }
    });

    worker.on('message', resolve);
    worker.on('error', () => {
      console.log(`${proxy.ip} hung, move to the next one`);
      resolve(null); // resolve anyway so Promise.all does not wait forever
    });
  });
}

// Batch create proxy instances (wrapped in a function because CommonJS has no top-level await)
async function main() {
  const proxyList = await IpProxy.batchGet(10); // take 10 IPs at a time
  const results = await Promise.all(proxyList.map(createWorker));
  return results;
}
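
`Promise.all` over the whole proxy list starts every worker at once. If you would rather cap how many run at the same time, a small limiter like the one below is one option. This is a sketch under assumptions: `runWithLimit` is a hypothetical helper and not part of any SDK. It takes an array of functions that each return a promise.

```javascript
// Hypothetical helper: run async task functions with at most `limit` in flight.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each runner keeps claiming the next unstarted task until none are left.
  async function runner() {
    while (next < tasks.length) {
      const index = next++; // safe: JavaScript does not interleave between awaits
      try {
        results[index] = await tasks[index]();
      } catch (error) {
        results[index] = null; // a failed task does not stop the others
      }
    }
  }

  // Start `limit` runners (or fewer if there are not that many tasks).
  const runners = Array.from({ length: Math.min(limit, tasks.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Possible usage with the worker routine from this section:
// const results = await runWithLimit(
//   proxyList.map((proxy) => () => createWorker(proxy)),
//   3 // at most three workers at a time
// );
```

Results come back in the same order as the tasks, with `null` marking a task that threw.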

Common pitfalls Q&A

Q: Why use a residential proxy?
A: Data center IPs have long been blacklisted by the major websites, while a residential IP looks like a real user. ipipgo's dynamic residential proxies run on real home broadband; in my own tests they ran steadily against two of the big e-commerce sites.

Q: What is the most cost-effective way to pay for proxy IPs?
A: Choose a package based on your business scenario. Save this price list:

Package type                     Applicable scenarios             Price
Dynamic residential (standard)   Routine data collection          7.67 Yuan/GB/month
Dynamic residential (business)   High-frequency access            9.47 Yuan/GB/month
Static residential               Scenarios requiring a fixed IP   35 Yuan/IP/month
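
To compare the two per-GB packages for your own workload, a back-of-the-envelope estimate can help. The sketch below assumes billing is simply traffic multiplied by the per-GB price from the list above; the actual billing rules may differ, so check with the provider. `estimateCost` and its parameters are hypothetical names, and the page count and average page size are numbers you would measure yourself.

```javascript
// Prices per GB taken from the price list above (Yuan/GB/month).
const PRICE_PER_GB = { standard: 7.67, business: 9.47 };

// Estimate traffic cost in Yuan, assuming cost = traffic in GB * price per GB.
// pageCount: pages fetched in the period; avgPageKB: average response size in KB.
function estimateCost(pageCount, avgPageKB, plan) {
  if (!(plan in PRICE_PER_GB)) {
    throw new Error(`Unknown plan: ${plan}`);
  }
  const gigabytes = (pageCount * avgPageKB) / (1024 * 1024); // KB -> GB
  return gigabytes * PRICE_PER_GB[plan];
}

// Example: compare both plans for 500,000 pages of roughly 200 KB each.
for (const plan of Object.keys(PRICE_PER_GB)) {
  console.log(plan, estimateCost(500000, 200, plan).toFixed(2), 'Yuan');
}
```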

Q: How do I prevent account linkage?
A: Three steps: ① use an IP from a different country for each request ② clear the browser fingerprint ③ use ipipgo's TK line for account isolation.

Why ipipgo?

I have used seven or eight proxy service providers and ended up staying with ipipgo for three reasons: ① their SERP API can scrape Google data directly (with other providers you have to build that yourself) ② customer service replied within seconds even at three o'clock in the morning ③ they support the SOCKS5 protocol, which is handy for hand-written scripts. I recently found that they can also set up custom hourly billing plans, which is especially friendly to short-term projects.

Finally, one nagging reminder: proxy IPs are great, but don't hammer other people's websites to death. I have seen someone crawl with 100 threads and bring the target's server down. That is the kind of thing we should not do.

This article was originally published or organized by ipipgo. https://www.ipipgo.com/en-us/ipdaili/41694.html
