
How to crawl websites with Node.js and proxy IPs
Recently, a lot of readers have asked me what to do when their Node.js scraper keeps getting its IP blocked. Let's talk about that today, and get straight to the point: proxy IPs are the most dependable way around IP blocking, especially from a professional provider like ipipgo, which has a very large IP pool and is smooth to use.
Why do I have to use a proxy IP?
Here is an example. If you go to the supermarket to grab discounted eggs 800 times a day, of course the security guards are going to stop you. Web servers work the same way. Using ipipgo proxy IPs is like wearing a different outfit on every shopping trip: each request arrives from a different IP address, so the server has a much harder time recognizing you.
const axios = require('axios');
const cheerio = require('cheerio');
// Replace this with your own ipipgo proxy address and credentials
const proxyConfig = {
  host: 'gateway.ipipgo.com',
  port: 9021,
  auth: {
    username: 'Your account',
    password: 'Your password'
  }
};

async function grabData(url) {
  try {
    // Send the request through the proxy
    const response = await axios.get(url, {
      proxy: proxyConfig
    });
    // Load the returned HTML so it can be queried with selectors
    const $ = cheerio.load(response.data);
    // Crawl logic is written here...
  } catch (error) {
    console.log('Crawl error:', error.message);
  }
}
Three essential Cheerio parsing moves
Once you have the page, you need to pull the data out of it. Cheerio works like scissors and paste for HTML, and it is very handy. There are three key techniques to remember:
// 1. Find an element by a fixed tag and class
const price = $('div.price-box span').text();
// 2. Locate by attribute
const stock = $('[data-type="inventory"]').attr('data-count');
// 3. Iterate through the list
$('ul.product-list li').each((index, element) => {
  const title = $(element).find('h3').text();
});
Practical ipipgo tips
Their proxy has one standout feature: automatic IP rotation. Add a random delay before each request in your code and the success rate goes up noticeably:
function randomDelay() {
  // Random wait between 1000 and 3999 milliseconds
  return Math.floor(Math.random() * 3000) + 1000;
}
async function safeGrab(url) {
  // Pause for a random interval before sending the request
  await new Promise(resolve => setTimeout(resolve, randomDelay()));
  return grabData(url);
}
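To show how the random delay might fit into a crawl over several pages, here is a minimal sketch. The names `randomDelayBetween`, `sleep`, and `crawlSequentially` are hypothetical helpers made up for this illustration, and `fetchPage` stands for any async function that takes a URL (for example, `grabData` from earlier). Treat it as one possible arrangement, not the only way to do it.

```javascript
// Hypothetical helper: random integer delay in [minMs, maxMs], both inclusive.
function randomDelayBetween(minMs, maxMs) {
  return Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
}

// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit URLs one at a time, pausing for a random interval before each request.
// `fetchPage` is an assumption: any async function that accepts a URL.
async function crawlSequentially(urls, fetchPage, minMs = 1000, maxMs = 4000) {
  const results = [];
  for (const url of urls) {
    await sleep(randomDelayBetween(minMs, maxMs));
    try {
      results.push({ url, ok: true, data: await fetchPage(url) });
    } catch (error) {
      // Record the failure and move on instead of aborting the whole run.
      results.push({ url, ok: false, error: error.message });
    }
  }
  return results;
}
```

Going through the pages one by one keeps the request rate low, and recording failures per URL makes it easy to retry only the pages that did not come through.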
Common failure scenarios: Q&A
Q: Why am I still blocked even though I use a proxy?
A: Eight times out of ten, the problem is IP quality. Free proxies are like roadside food stalls: you never know when they will make you sick. I recommend ipipgo's dedicated IPs, which are reserved for a single user rather than shared.
Q: What can I do if I can't capture all the data?
A: First check whether an anti-scraping mechanism has been triggered, then try adding these request headers:
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) the proper browser',
'Accept-Language': 'zh-CN,zh;q=0.9'
}
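The headers above are passed in the same options object as the proxy settings. The sketch below shows one way to combine them; `buildRequestOptions` is a hypothetical helper name, the 10-second timeout is an assumed value, and the User-Agent string is only a placeholder that you should replace with the full string from a real browser.

```javascript
// Hypothetical helper: builds the options object for axios.get(url, options).
function buildRequestOptions(proxy, extraHeaders = {}) {
  const defaultHeaders = {
    // Placeholder: substitute the complete User-Agent of a real browser.
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'zh-CN,zh;q=0.9'
  };
  return {
    proxy,
    timeout: 10000, // assumed value: give up after 10 seconds
    // Headers supplied by the caller override the defaults.
    headers: { ...defaultHeaders, ...extraHeaders }
  };
}
```

With this in place, the request in `grabData` could be written as `axios.get(url, buildRequestOptions(proxyConfig))`.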
Pitfall avoidance guide
| Pitfall | Solution |
|---|---|
| Requests sent too frequently | Add random delays of about 3-5 seconds between requests |
| HTML structure changes | Check your selectors regularly and wrap parsing in try-catch as a fallback |
| CAPTCHA interception | Use ipipgo's residential proxy IPs |
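For the try-catch fallback in the table, a small wrapper is one way to go. Keep in mind that when a selector matches nothing, Cheerio usually returns an empty string or `undefined` instead of throwing, so the sketch below also treats empty results as a miss. `safeExtract` is a hypothetical helper name made up for this example.

```javascript
// Hypothetical helper: run an extractor function and return a fallback
// if it throws or yields nothing useful.
function safeExtract(extractor, fallback = null) {
  try {
    const value = extractor();
    // Treat undefined, null, and empty strings as "selector matched nothing".
    if (value === undefined || value === null || value === '') {
      return fallback;
    }
    return value;
  } catch (error) {
    console.warn('Selector failed:', error.message);
    return fallback;
  }
}
```

Inside the crawl logic you could then write `const price = safeExtract(() => $('div.price-box span').text().trim(), 'N/A');`, and a fallback value showing up in your output is a hint that the page structure has changed.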
Finally, to put it in perspective, scraping data is a lot like fishing: patience and good tools are both indispensable. ipipgo is currently running a promotion that gives new users 10 GB of traffic, enough to experiment with for a while. If you run into specific problems, you can contact their technical support directly; they respond faster than a delivery courier.

