Setting a Proxy IP in Scrapy: Proxy IP Configuration for Scrapy Crawler Projects

Hands-On: Putting a Proxy Vest on Scrapy

Every crawler developer knows that running without a proxy is like going online naked: the target site will block your IP within minutes. Today we take Scrapy as our example and show how to fit it with a proper proxy vest. We use our own proxy service, ipipgo, throughout; everything here has been tested by us and works, no exaggeration.

Scrapy's Three Basic Moves for Proxy Configuration

Let's start with the most straightforward configuration method, the one for newcomers:


# Add this to settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 543,
}

# Attach the proxy to a specific request (inside a spider callback)
yield scrapy.Request(
    url,
    meta={'proxy': 'http://username:password@proxy.ipipgo.com:8000'},
)

This hard-coded method is fine for temporary testing, but for long-term use you need something smarter. In practice, I found that hard-coding a single fixed proxy makes the crawler an easy target for anti-crawling mechanisms.
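A small first step away from hard-coding is to keep the credentials out of the source and build the proxy URL at runtime. The sketch below assumes the credentials live in environment variables; the variable names (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS) and both helper functions are made up for illustration. It also percent-encodes the username and password, which matters when they contain characters such as @ or :.

```python
import os
from urllib.parse import quote


def build_proxy_url(host, port, username, password, scheme="http"):
    """Return a proxy URL whose credentials are percent-encoded."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


def proxy_url_from_env(env=os.environ):
    """Read hypothetical PROXY_* variables and build the proxy URL."""
    return build_proxy_url(
        host=env.get("PROXY_HOST", "proxy.ipipgo.com"),
        port=int(env.get("PROXY_PORT", "8000")),
        username=env["PROXY_USER"],
        password=env["PROXY_PASS"],
    )
```

The result can be passed straight to a request, for example meta={'proxy': proxy_url_from_env()}.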

Dynamic Proxy Pools are King

Advanced users rotate their proxies. Here we recommend fetching them dynamically through ipipgo's API:


import random

from w3lib.http import basic_auth_header


class ProxyMiddleware:
    def process_request(self, request, spider):
        # get_ipipgo_proxies() is a helper you write yourself: it calls the
        # ipipgo API and returns a list of dicts with the keys used below
        proxy_list = get_ipipgo_proxies()
        proxy = random.choice(proxy_list)
        request.meta['proxy'] = f"http://{proxy['ip']}:{proxy['port']}"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            proxy['user'], proxy['password']
        )

Remember to handle automatic switching when a proxy fails; I suggest adding a retry mechanism to the exception handling. ipipgo's API responds quickly, so fetching a new proxy takes only milliseconds.
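Here is a minimal sketch of that automatic switching, assuming a downloader middleware whose process_exception hook reschedules the failed request through a different proxy. The class name, the fetch_proxies callable (standing in for whatever wraps the ipipgo API) and the proxy_switches meta key are all made up for illustration; process_request, process_exception and request.replace are Scrapy's standard hooks.

```python
import random


class RotatingProxyRetryMiddleware:
    """Sketch: swap in a different proxy when a request fails, up to a limit."""

    def __init__(self, fetch_proxies, max_switches=3):
        # fetch_proxies: hypothetical callable returning a list of proxy URLs
        self.fetch_proxies = fetch_proxies
        self.max_switches = max_switches

    def process_request(self, request, spider):
        # Only assign a proxy if the request does not carry one yet
        if 'proxy' not in request.meta:
            request.meta['proxy'] = random.choice(self.fetch_proxies())

    def process_exception(self, request, exception, spider):
        switches = request.meta.get('proxy_switches', 0)
        if switches >= self.max_switches:
            return None  # give up and let other middlewares handle the error
        failed = request.meta.get('proxy')
        candidates = [p for p in self.fetch_proxies() if p != failed]
        if not candidates:
            return None
        # dont_filter=True keeps the dupefilter from dropping the retry
        retry = request.replace(dont_filter=True)
        retry.meta['proxy'] = random.choice(candidates)
        retry.meta['proxy_switches'] = switches + 1
        return retry
```

Returning a request from process_exception tells Scrapy to reschedule it; returning None passes the failure on to the next middleware. In a real project you would construct the middleware in a from_crawler classmethod and register it in DOWNLOADER_MIDDLEWARES.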

Tricks in the Configuration File

Veterans do their tuning in settings.py. Recommended configuration:

Configuration item     Recommended value
CONCURRENT_REQUESTS    Adjust to your proxy package (30-50 recommended for dynamic proxies)
DOWNLOAD_TIMEOUT       15-30 seconds is safer
RETRY_TIMES            3 retries is a safe choice
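As a settings.py fragment, the table above might look like the sketch below. The concrete numbers are just one pick from the recommended ranges, not measured optimums; tune them to your own proxy package.

```python
# settings.py (illustrative values picked from the ranges above)
CONCURRENT_REQUESTS = 32   # within the 30-50 range suggested for dynamic proxies
DOWNLOAD_TIMEOUT = 20      # seconds; within the 15-30 range
RETRY_ENABLED = True       # Scrapy's built-in retry middleware (on by default)
RETRY_TIMES = 3            # retries in addition to the first attempt
```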

Real-World Pitfalls

The nastiest situation I ran into: the proxy clearly worked, but the crawler just could not connect. I later found that the culprit was SSL authentication. Adding these parameters to the request fixed it immediately:


request.meta['download_timeout'] = 30
request.meta['proxy'] = 'https://...'   # Note the protocol type
request.meta['dont_redirect'] = True    # Prevents redirects from dropping the proxy

FAQ First Aid Kit

Q: What should I do if the proxy suddenly fails?
A: Catch the exception in the middleware and automatically pull new proxies from ipipgo. It is also worth turning on a proxy health check and removing dead proxies from the pool promptly.
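The health check can be as simple as counting consecutive failures per proxy and evicting any proxy that crosses a threshold. The sketch below is one way to do it under that assumption; the ProxyPool class, its method names and the default threshold of 2 are illustrative, not part of Scrapy or ipipgo.

```python
import random


class ProxyPool:
    """Tracks consecutive failures per proxy and evicts unhealthy ones."""

    def __init__(self, proxies, max_failures=2):
        self.failures = {proxy: 0 for proxy in proxies}
        self.max_failures = max_failures

    def pick(self):
        if not self.failures:
            raise LookupError("proxy pool is empty; refill it from the provider API")
        return random.choice(list(self.failures))

    def report_success(self, proxy):
        # A successful response resets the failure streak
        if proxy in self.failures:
            self.failures[proxy] = 0

    def report_failure(self, proxy):
        if proxy not in self.failures:
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures:
            del self.failures[proxy]  # kick the dead proxy out of the pool

    def refill(self, proxies):
        # Add newly fetched proxies without resetting known ones
        for proxy in proxies:
            self.failures.setdefault(proxy, 0)
```

A middleware would call report_failure from its exception handler, report_success when a response arrives, and refill with fresh proxies whenever the pool runs low.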

Q: Crawling at a turtle's pace?
A: Check which proxy package you are on. Dynamic Residential (Enterprise) is 30% faster than Standard; if the budget allows, go straight to Static Residential and the speed takes off.

Q: Always running into CAPTCHAs?
A: Switch to ipipgo's TK dedicated-line proxy; these residential IPs are less likely to trigger verification. In our tests, the CAPTCHA rate dropped by 70% after switching to this dedicated line.

How to Choose an ipipgo Package

My personal package recommendations:

  • Small-scale crawlers: Dynamic Residential (Standard) at 7.67 yuan/GB, which is enough if you use it sparingly
  • Enterprise-level projects: go straight to Static Residential at 35 yuan/IP, stable and hassle-free
  • Special needs: the cross-border dedicated line for sites with geographic restrictions; those who have used it know how well it works

One last piece of candid advice: proxy configuration is not a one-off job; adjust it flexibly according to the target site's anti-crawling strategy. If you use ipipgo, remember to take advantage of their customization service. Their technical support can help tune the parameters, which beats fumbling around blindly on your own.

This article was originally published or compiled by ipipgo. https://www.ipipgo.com/en-us/ipdaili/43747.html
