
Hands-On: Fitting Scrapy with a Proxy Vest
Fellow crawler folks know the drill: scraping without a proxy is like going online naked, and the target site will block your IP within minutes. Today we crack open Scrapy and walk through how to fit it with a proper proxy vest. I'll use our own proxy service, ipipgo, as the example — personally tested, no exaggeration.
The Three Basic Moves of Scrapy Proxy Configuration
Let's start with the most straightforward method for newcomers:
Register the proxy middleware in settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 543,
}
Attach the proxy to a specific request
yield scrapy.Request(
    url,
    meta={'proxy': 'http://username:password@proxy.ipipgo.com:8000'}
)
This kind of hard-coding is fine for a quick test, but for anything long-term you need a smarter approach. In practice, I've found that baking a fixed proxy into your settings makes you an easy target for anti-bot mechanisms.
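As a small step up from hard-coding, you can read the proxy URL from the environment instead. This is a minimal sketch under my own conventions — the variable name `PROXY_URL` is an assumption, not anything Scrapy or ipipgo defines:

```python
import os

# Read the proxy URL from an environment variable instead of hard-coding it.
# PROXY_URL is an assumed name; set it in your shell or deployment config.
def get_proxy_from_env(default=None):
    return os.environ.get("PROXY_URL", default)

# Demo only: in real use the variable is set outside the process.
os.environ["PROXY_URL"] = "http://user:pass@proxy.ipipgo.com:8000"
print(get_proxy_from_env())
```

The returned value is what you'd put in `meta={'proxy': ...}`, so rotating credentials no longer means editing source code.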
Dynamic Proxy Pools Are King
Advanced players use rotating proxies. Here I recommend fetching them dynamically through ipipgo's API:
import random
from w3lib.http import basic_auth_header  # w3lib ships with Scrapy

class ProxyMiddleware:
    def process_request(self, request, spider):
        proxy_list = get_ipipgo_proxies()  # call the ipipgo API endpoint
        proxy = random.choice(proxy_list)
        request.meta['proxy'] = f"http://{proxy['ip']}:{proxy['port']}"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            proxy['user'], proxy['password']
        )
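For a custom downloader middleware like this to run at all, it has to be registered in settings.py. A sketch of the registration — the module path `myproject.middlewares` is an assumption, so adjust it to your project layout:

```python
# settings.py -- enable the custom proxy middleware.
# A priority below 543 means it runs before the built-in HttpProxyMiddleware,
# so the meta['proxy'] it sets is already in place.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 543,
}
```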
Pay attention to automatic switching on proxy failure: I suggest adding a retry mechanism in your exception handling. ipipgo's API responds very quickly — pulling a fresh proxy takes only milliseconds.
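That retry-on-failure idea can be sketched as a middleware `process_exception` hook. This is an illustrative sketch, not ipipgo's official client: `get_ipipgo_proxies()` is a stand-in for the real API call, and the retry cap and example IP are my own placeholders:

```python
import random

def get_ipipgo_proxies():
    # Hypothetical response shape; replace with a real ipipgo API request.
    return [{"ip": "203.0.113.10", "port": 8000}]

class ProxyRetryMiddleware:
    """On a download error, swap in a fresh proxy and re-issue the request."""
    MAX_PROXY_RETRIES = 3

    def process_exception(self, request, exception, spider):
        retries = request.meta.get("proxy_retries", 0)
        if retries >= self.MAX_PROXY_RETRIES:
            return None  # give up; fall through to Scrapy's own error handling
        proxy = random.choice(get_ipipgo_proxies())
        request.meta["proxy"] = f"http://{proxy['ip']}:{proxy['port']}"
        request.meta["proxy_retries"] = retries + 1
        request.dont_filter = True  # allow the same URL to be scheduled again
        return request  # returning a Request tells Scrapy to retry with it
```

Registering it in DOWNLOADER_MIDDLEWARES alongside the rotation middleware gives you basic failover without touching spider code.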
The Know-How Hidden in the Configuration File
Old drivers do their real tuning in settings.py. Recommended configuration:
| Setting | Recommended value |
|---|---|
| CONCURRENT_REQUESTS | Adjust to your proxy package (30-50 for dynamic proxies) |
| DOWNLOAD_TIMEOUT | 15-30 seconds is the safe range |
| RETRY_TIMES | 3 retries to stay on the safe side |
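The table above translates into settings.py roughly like this. The exact numbers within the recommended bands are my own picks, so tune them to your proxy package:

```python
# settings.py -- tuning values from the table above
CONCURRENT_REQUESTS = 40  # 30-50 works for dynamic proxy packages
DOWNLOAD_TIMEOUT = 20     # 15-30 s keeps slow proxies from hanging the crawl
RETRY_TIMES = 3           # three retries before a request is abandoned
```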
A Record of Real-World Pitfalls
The most maddening situation I hit: the proxy clearly works, yet the crawler just can't connect. It turned out to be an SSL/protocol mismatch. Adding these parameters to the request fixed it immediately:
request.meta['download_timeout'] = 30
request.meta['proxy'] = 'https://...'  # mind the protocol type (http vs https)
request.meta['dont_redirect'] = True   # keep a redirect from dropping the proxy
FAQ First-Aid Kit
Q: What do I do when a proxy suddenly fails?
A: Add exception capture in the middleware to automatically pull fresh proxies from ipipgo. I also recommend enabling proxy health checks so dead proxies get kicked out of the pool promptly.
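A minimal sketch of that health-check idea. The probe function is injectable so the pool logic is testable; in production it would be an HTTP GET through the proxy with a short timeout (that probe target and timeout are your choice, not anything ipipgo prescribes):

```python
def filter_healthy(proxies, probe):
    """Keep only proxies for which probe(proxy) returns True.

    probe is any callable that checks one proxy, e.g. an HTTP request
    through it with a short timeout; any exception counts as dead.
    """
    healthy = []
    for proxy in proxies:
        try:
            if probe(proxy):
                healthy.append(proxy)
        except Exception:
            pass  # timeout / refused connection: kick it out of the pool
    return healthy
```

Run this on a schedule (or lazily before each batch) and hand only the surviving list to your rotation middleware.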
Q: Crawling at a turtle's pace?
A: Check your proxy package type. Dynamic Residential (Enterprise Edition) runs about 30% faster than Standard; if the budget allows, go straight to Static Residential and the speed will fly.
Q: Keep hitting CAPTCHAs?
A: Switch to ipipgo's TK dedicated-line proxies — these residential IPs are far less likely to trigger verification. In my own tests, the CAPTCHA rate dropped about 70% after switching to this line.
How to Choose an ipipgo Package
My personal package comparison:
- Small-scale crawlers: Dynamic Residential (Standard) at 7.67 yuan/GB — frugal and more than enough
- Enterprise-level projects: go straight to Static Residential at 35 yuan/IP — stable, no fuss
- Special needs: cross-border lines for geo-restricted sites — those who've used them know
One last heartfelt note: proxy configuration is never a one-and-done job; adjust it flexibly to match the target site's anti-bot strategy. Brothers using ipipgo, remember to make good use of their customized services — their technical support can help you tune the parameters, which beats fumbling around on your own by a mile.

