
Hands-on: Putting Puppeteer Behind a Proxy IP
Anyone who writes crawlers knows that Puppeteer is a pleasure to use, but scraping data with no protection will get you blocked sooner or later. That is when a proxy IP comes in as armor, and a reliable provider such as ipipgo can be a lifesaver against blocking.
Why do I have to use a proxy IP?
Say you scrape data every day over your own broadband connection. The target site sees the same IP again and again and blocks it outright. With ipipgo's dynamic proxy pool, every request goes out under a different identity, so the site has nothing to latch onto. Measured data show that keeping each IP to about one request per minute cuts the block rate by 80%.
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({
    // Chromium ignores credentials embedded in --proxy-server, so pass only host and port here.
    args: ['--proxy-server=http://ipipgo-proxy-server:port'],
  });
  try {
    const page = await browser.newPage();
    // Remember to replace these with your ipipgo account credentials.
    await page.authenticate({ username: 'username', password: 'password' });
    await page.goto('https://target-site.com');
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

run().catch(console.error);
Three Tips for Configuring Proxies
① Don't hard-code credentials: Store your ipipgo username and password in environment variables. The code stays clean and the credentials are less likely to leak.
② Keep timeouts flexible: Proxy nodes in different regions respond at different speeds. A timeout threshold of 5-10 seconds is a sensible starting point.
③ Switch automatically on failure: Build in a retry mechanism that moves to the next IP as soon as one fails. This can also be configured directly in the ipipgo management console.
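The three tips above can be sketched together roughly as follows. This is a minimal illustration, not ipipgo's or Puppeteer's prescribed setup: the `PROXY_*` environment variable names and the helpers `loadProxyConfig`, `withRetry`, and `fetchTitle` are hypothetical, and `puppeteer` is passed in as a parameter so the helpers can be exercised without launching a browser.

```javascript
// Sketch only: the PROXY_* variable names and helper names are illustrative assumptions.

// Tip 1: read proxy settings from the environment instead of hard-coding them.
function loadProxyConfig(env = process.env) {
  const { PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS } = env;
  if (!PROXY_HOST || !PROXY_PORT) {
    throw new Error('PROXY_HOST and PROXY_PORT must be set');
  }
  return {
    server: `http://${PROXY_HOST}:${PROXY_PORT}`,
    username: PROXY_USER || '',
    password: PROXY_PASS || '',
  };
}

// Tip 3: run an async task, moving on to the next proxy after each failure.
async function withRetry(proxies, task) {
  let lastError;
  for (const proxy of proxies) {
    try {
      return await task(proxy);
    } catch (err) {
      lastError = err; // this proxy failed; try the next one
    }
  }
  throw lastError ?? new Error('no proxies supplied');
}

// All three tips in one place: load a page title through the given proxy.
async function fetchTitle(puppeteer, proxy, url, timeoutMs = 10000) {
  const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy.server}`] });
  try {
    const page = await browser.newPage();
    if (proxy.username) {
      await page.authenticate({ username: proxy.username, password: proxy.password });
    }
    // Tip 2: cap navigation time so a slow node fails fast (5-10 s as suggested above).
    await page.goto(url, { timeout: timeoutMs });
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

With real proxies this might be called as `withRetry(proxyList, (p) => fetchTitle(require('puppeteer'), p, 'https://target-site.com'))`, where `proxyList` is whatever array of proxy configs you have assembled.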
A guide to common pitfalls
| Symptom | Fix |
| Browser hangs on startup | Check that the proxy address is formatted correctly, especially the scheme (http vs. https). |
| Elements missing from the loaded page | Try adding the --disable-web-security launch argument. |
| A large number of requests suddenly fail | Check the ipipgo dashboard to see whether your remaining traffic has run out. |
Q&A
Q: What should I do if using a proxy makes things slower?
A: Most likely the nodes you picked are too far away geographically. In ipipgo's control panel you can filter for servers with latency below 100 ms.
Q: How do I run multiple browser instances at the same time?
A: Assign a different proxy to each browser instance. ipipgo's API supports fetching IPs in batches, so a simple loop gets it done.
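As a rough sketch of that loop: the example below assumes you already hold an array of proxy addresses (how you fetch it from the provider's API is not shown here and depends on their documentation). The helper names `buildLaunchOptions` and `launchAll` are hypothetical, and `puppeteer` is passed in as a parameter.

```javascript
// Sketch only: `proxyServers` stands in for a list you have already fetched.

// Build one set of launch options per proxy address.
function buildLaunchOptions(proxyServers) {
  return proxyServers.map((server) => ({
    args: [`--proxy-server=${server}`],
  }));
}

// Launch one browser per proxy. If any launch fails, close the ones already open.
async function launchAll(puppeteer, proxyServers) {
  const browsers = [];
  try {
    for (const options of buildLaunchOptions(proxyServers)) {
      browsers.push(await puppeteer.launch(options));
    }
    return browsers;
  } catch (err) {
    await Promise.all(browsers.map((b) => b.close()));
    throw err;
  }
}
```

Each returned browser routes its traffic through its own proxy. Remember to close every instance when the job is done.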
Q: What should I do if a website throws up a CAPTCHA?
A: This is the time to use ipipgo's residential proxies. These IPs look no different from real users, and pairing them with more realistic mouse movement trajectories helps further.
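One way to make mouse movement look less mechanical is to move through a series of eased, slightly jittered points instead of jumping straight to the target. The sketch below is only an illustration: the smoothstep easing, the jitter size, and the helper names `humanPath` and `moveLikeHuman` are arbitrary choices made for this example, and nothing here guarantees a CAPTCHA will not appear.

```javascript
// Sketch only: the easing curve and jitter size are arbitrary illustrative choices.

// Generate `steps` points from `from` to `to`, eased and jittered, ending exactly at `to`.
function humanPath(from, to, steps = 25, jitter = 2, random = Math.random) {
  const points = [];
  for (let i = 1; i < steps; i++) {
    const t = i / steps;
    const eased = t * t * (3 - 2 * t); // smoothstep: slow start, slow finish
    points.push({
      x: from.x + (to.x - from.x) * eased + (random() - 0.5) * 2 * jitter,
      y: from.y + (to.y - from.y) * eased + (random() - 0.5) * 2 * jitter,
    });
  }
  points.push({ x: to.x, y: to.y }); // land precisely on the target
  return points;
}

// Walk the Puppeteer mouse along the generated path.
async function moveLikeHuman(page, from, to) {
  for (const { x, y } of humanPath(from, to)) {
    await page.mouse.move(x, y);
  }
}
```

For example, `await moveLikeHuman(page, { x: 0, y: 0 }, { x: 400, y: 300 })` moves the pointer to (400, 300) before a click.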
Why do you recommend ipipgo?
Their dynamic residential proxy pool really delivers: in a real-world test it ran for three days straight without triggering a verification check. The most thoughtful part is the pay-as-you-go pricing, which is easy on a small team's budget. One tip: new users should remember to claim the 3 GB of trial traffic when registering, which is enough to test a small project.
One last rant: crawling is about playing the long game, and a steady trickle outlasts a flood. Don't hammer a single site to death. Set a reasonable request interval and pair it with ipipgo's smart rotation strategy so you can keep collecting data safely over the long term. If your success rate suddenly plummets one day, remember to check whether it's time to renew your subscription (don't ask me how I know)...
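A reasonable request interval can be sketched as a base delay with some random jitter, so requests do not arrive on a perfectly regular beat. The 60-second default mirrors the one-request-per-minute figure mentioned earlier. The 20% jitter and the helper names `nextDelay`, `sleep`, and `crawlPolitely` are assumptions made for this example.

```javascript
// Sketch only: the base interval and jitter fraction are illustrative defaults.

// Return a delay in ms: `baseMs` plus or minus up to `jitterFraction` of it.
function nextDelay(baseMs = 60000, jitterFraction = 0.2, random = Math.random) {
  const offset = (random() * 2 - 1) * jitterFraction * baseMs;
  return Math.round(baseMs + offset);
}

// Promise-based pause.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit URLs one at a time on a Puppeteer page, pausing between requests.
async function crawlPolitely(page, urls, baseMs = 60000) {
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(nextDelay(baseMs));
    await page.goto(url);
  }
}
```

Tune `baseMs` to what the target site tolerates. The jitter only varies the rhythm and does not raise the overall request rate.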

