
A hands-on guide to using Puppeteer with a proxy IP
Anyone who writes web crawlers knows that anti-scraping defenses keep getting tougher. Last week I was scraping e-commerce data for a client and had more than a dozen IPs blocked; I was angry enough to nearly smash the keyboard. This is where proxy IPs come in handy, and paired with Puppeteer driving a headless browser they make an ideal combination.
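Let me start with a real case: a team running a price-comparison site had to scrape more than a thousand product pages a day. They started out on their local IP and were blacklisted by the target site in under 3 hours. After they switched to ipipgo's dynamic residential proxies, the request success rate jumped from 35% to 92%. That is the value of a proxy IP.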
Why do I have to use a proxy IP?
Websites are now fitted with intelligent risk control systems that look at three main indicators:
| Detection dimension | Risk with a local IP | Advantage of proxy IPs |
|---|---|---|
| Request frequency | High-frequency requests from a single IP get blocked | Load is spread across rotating IPs |
| Geographic location | A fixed region is easy to identify | Nodes around the world mask the origin |
| Behavioral characteristics | A single browser fingerprint | Separate, isolated environments |
This matters even more with Puppeteer: because it drives a real browser that loads JavaScript, it is more likely to trigger anti-scraping checks. Last week a client ran headless mode with no proxy and hit the site directly; the automation traits were detected within 10 minutes and the entire IP range was blocked.
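The "rotating IPs" idea from the table can be sketched as a simple round-robin picker. This is only an illustration: the gateway addresses are placeholders rather than real ipipgo endpoints, and `createProxyRotator` and `launchArgsForNextProxy` are hypothetical helper names. Many residential proxy services rotate on the gateway side, in which case client-side rotation like this may not be needed.

```javascript
// Round-robin proxy rotation. The addresses below are placeholders,
// not real provider endpoints.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('at least one proxy address is required');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around at the end of the list
    return proxy;
  };
}

const nextProxy = createProxyRotator([
  'http://gateway.example.com:9020',
  'http://gateway.example.com:9021',
  'http://gateway.example.com:9022',
]);

// Each new browser launch would take the next address in turn.
function launchArgsForNextProxy() {
  return [`--proxy-server=${nextProxy()}`];
}
```

The returned array is meant to be passed as `args` to `puppeteer.launch()`, so that each launched browser goes out through a different address.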
Step-by-step configuration (the key part)
Setting up a proxy in Puppeteer really takes just two steps:
1. Install the required library (avoid cnpm, which tends to cause problems):
npm install puppeteer --save
2. Launch the browser with the proxy argument (using ipipgo as an example):
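const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({
    // Chromium ignores user:pass embedded in --proxy-server, so pass host:port only
    args: ['--proxy-server=http://gateway.ipipgo.com:9020']
  });
  const page = await browser.newPage();
  // Supply the proxy credentials per page (environment variable names are placeholders)
  await page.authenticate({
    username: process.env.PROXY_USER,
    password: process.env.PROXY_PASS
  });
  // Follow up...
  await browser.close();
}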
One pitfall to note: ipipgo's proxy address format is gateway.ipipgo.com:port, and the authentication details are found in the console. Store the username and password in environment variables; don't hard-code them in your source.
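One way to follow the environment-variable advice is sketched below. The variable names `PROXY_HOST`, `PROXY_PORT`, `PROXY_USER` and `PROXY_PASS` and the helper `proxySettingsFromEnv` are assumptions made for this example, not something ipipgo or Puppeteer defines. The helper keeps the address and the credentials separate, because Chromium's `--proxy-server` flag takes only the scheme, host and port, while the username and password go to `page.authenticate()`.

```javascript
// Reads proxy settings from environment variables and fails loudly if any
// are missing. All variable names here are hypothetical.
function proxySettingsFromEnv(env = process.env) {
  const required = ['PROXY_HOST', 'PROXY_PORT', 'PROXY_USER', 'PROXY_PASS'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing environment variables: ${missing.join(', ')}`);
  }
  return {
    // Intended for puppeteer.launch({ args: [launchArg] })
    launchArg: `--proxy-server=http://${env.PROXY_HOST}:${env.PROXY_PORT}`,
    // Intended for page.authenticate(credentials)
    credentials: { username: env.PROXY_USER, password: env.PROXY_PASS },
  };
}
```

Failing at startup when a variable is missing is usually easier to debug than a proxy that silently rejects every request.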
Common pitfalls: Q&A
Q: What can I do if the proxy won't connect?
A: First check the whitelist settings. If you use terminal IP authorization, remember to bind your server's IP in the ipipgo dashboard. If you use username/password authentication, make sure any special characters in the credentials are URL-encoded.
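The URL-encoding point can be illustrated with a small sketch. It applies wherever credentials are embedded in a proxy URL of the form `http://user:pass@host:port`; values handed to Puppeteer's `page.authenticate()` are passed raw and need no encoding. The helper name `buildProxyUrl` and the sample password are made up for illustration.

```javascript
// Builds a proxy URL with percent-encoded credentials so that characters
// such as '@', ':' and '/' cannot be mistaken for URL delimiters.
function buildProxyUrl(username, password, host, port) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${host}:${port}`;
}

// Sample values only; the password is invented for this example.
const proxyUrl = buildProxyUrl('user', 'p@ss:word/1', 'gateway.ipipgo.com', 9020);
console.log(proxyUrl);
// prints "http://user:p%40ss%3Aword%2F1@gateway.ipipgo.com:9020"
```

Without the encoding, the `@` inside the password would end the credentials section early and the rest would be read as part of the host.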
Q: Why is the page loading slower?
A: Choose nodes by geographic location; for example, use ipipgo's North American residential proxies when scraping US sites. Don't pick a free proxy just to save money: they are slow and unstable.
Q: How can I prevent fingerprint tracking?
A: ipipgo's advanced package includes browser fingerprint camouflage. Combined with the stealth plugin for Puppeteer (puppeteer-extra-plugin-stealth), it bypassed Cloudflare detection in my own tests.
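For reference, a minimal sketch of wiring up the stealth plugin follows. It assumes the `puppeteer`, `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages are installed; `launchStealthBrowser` is a hypothetical helper name, and whether this setup gets past any particular detection system is not something the sketch can guarantee.

```javascript
// Assumes: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
let stealthRegistered = false;

async function launchStealthBrowser(proxyServer) {
  // Loaded inside the function so this file can be imported even where
  // the packages are not installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');

  // Register the plugin once, even if this function is called repeatedly.
  if (!stealthRegistered) {
    puppeteer.use(StealthPlugin());
    stealthRegistered = true;
  }

  // proxyServer is expected as scheme://host:port, without credentials.
  return puppeteer.launch({
    args: [`--proxy-server=${proxyServer}`],
  });
}
```

A call might look like `const browser = await launchStealthBrowser('http://gateway.ipipgo.com:9020');`, followed by `page.authenticate()` on each new page if the proxy requires a username and password.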
My personal configuration
Here is a battle-tested combination of parameters:
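const browser = await puppeteer.launch({
  headless: 'new', // use the new headless mode
  args: [
    // host:port only; Chromium ignores user:pass here, so call page.authenticate() on each page
    '--proxy-server=http://gateway.ipipgo.com:9020',
    '--disable-blink-features=AutomationControlled', // hide the navigator.webdriver automation hint
    '--no-sandbox' // often needed in containers or as root, at the cost of weaker isolation
  ],
  ignoreHTTPSErrors: true // skip certificate errors
});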
Remember to set the User-Agent on the page object; ipipgo's API can return a list of real UAs for each region. This configuration has run for two weeks without being blocked, so it suits jobs that need long-term, stable crawling.
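Setting the User-Agent on the page can be sketched as follows. The `USER_AGENTS` table, its sample strings and the helpers `pickUserAgent` and `preparePage` are all assumptions for illustration; in practice the list would come from your provider or your own data, and the call that fetches it is not shown because its shape is not given here. `page.setUserAgent()` is the Puppeteer method that applies the value.

```javascript
// Hypothetical per-region UA list with sample values.
const USER_AGENTS = {
  us: [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  ],
};

// Picks one UA for the region; `random` is injectable to make the choice testable.
function pickUserAgent(region, random = Math.random) {
  const list = USER_AGENTS[region];
  if (!list || list.length === 0) {
    throw new Error(`no user agents configured for region "${region}"`);
  }
  return list[Math.floor(random() * list.length)];
}

// Opens a new page and applies a UA that matches the proxy's region.
async function preparePage(browser, region) {
  const page = await browser.newPage();
  await page.setUserAgent(pickUserAgent(region));
  return page;
}
```

Keeping the UA's region consistent with the proxy node's region avoids an obvious mismatch between where the request appears to come from and what the browser claims to be.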
Which proxy package should I choose?
Choose based on your business needs:
- Short-term testing: ipipgo's pay-per-use plan, starting at $0.50/GB
- Long-term projects: enterprise-grade dynamic residential IPs with session persistence
- Hard-to-scrape websites: their customized fingerprint browser package
One last piece of honest advice: don't skimp on the proxy budget. One client went with a free proxy to save money and had the traffic hijacked by a man-in-the-middle; they scraped nothing from the site and leaked their own users' data instead, losing on both counts. A reputable provider such as ipipgo costs more, but it buys you peace of mind and security.

