Python Image Grabber: Batch Downloader

If your IP keeps getting blocked while crawling images, try this trick. It works well!

Anyone who writes web crawlers knows that the biggest headache when downloading images in bulk is getting your IP blocked. The script runs fine in the morning, then in the afternoon the site hands you a 403 Forbidden. That is when you pull out the proxy IP as a life preserver. Today we will build a Python image downloader with a shield, using ipipgo's proxy service as the escort.

Why do crawlers get blocked without a proxy IP?

Websites look at three main things to catch crawlers: request frequency, IP traces, and user characteristics. An ordinary crawler firing off requests from one fixed IP is like the same person pounding on the door 100 times a minute. Who else would security block? Using proxy IPs is like knocking on the door in a different disguise each time, so the guards have a much harder time recognizing you.
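Proxies take care of the IP trace. For the other two signals, request frequency and user characteristics, a rough sketch might look like this. The User-Agent strings, the delay range, and the helper names polite_headers and polite_pause are all illustrative assumptions, not values recommended by ipipgo:

```python
import random
import time

# Illustrative browser-like User-Agent strings (assumed values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def polite_headers():
    """Pick a random User-Agent so consecutive requests look less uniform."""
    return {"User-Agent": random.choice(USER_AGENTS)}

def polite_pause(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Sleep for a random interval between requests and return the delay used."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Call polite_pause() between downloads and pass polite_headers() as the headers argument of each request.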


# Example of core proxy IP configuration (replace username and password with your own)
proxies = {
    'http': 'http://username:password@gateway.ipipgo.com:9020',
    'https': 'http://username:password@gateway.ipipgo.com:9020'
}
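One thing to watch: the proxy URL above embeds the username and password directly, so a password containing characters such as @ or : will break the URL unless it is percent-encoded. A minimal sketch, assuming the same gateway host and port as above; build_proxies is a hypothetical helper name, not part of any ipipgo SDK:

```python
from urllib.parse import quote

def build_proxies(username, password, host="gateway.ipipgo.com", port=9020):
    """Return a requests-style proxies dict with percent-encoded credentials."""
    # safe="" also encodes "/", "@" and ":" so they cannot break the URL.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}"
    # The same proxy endpoint is used for both http and https targets.
    return {"http": proxy_url, "https": proxy_url}

proxies = build_proxies("username", "p@ss:word")
print(proxies["http"])
```

The resulting dict can be passed straight to the proxies argument of requests.get.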

Setting up the environment, step by step

Install these essential libraries first (installing from the Tsinghua mirror is faster):


pip install requests pillow retrying -i https://pypi.tuna.tsinghua.edu.cn/simple

Now for the key part of the ipipgo setup: log in to their dashboard and get the API extraction link. The long-lasting static IP package is the suggested choice, since these IPs stay alive for a long time and are particularly suitable for crawling tasks that need to run continuously.

Writing code that resists blocking

Straight to the good stuff. Here is the code with triple protection:


from retrying import retry
import requests

def download_img(url, save_path):
    # Send a browser-like User-Agent instead of the default requests one
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

    # Get the proxy IP dynamically from the ipipgo interface
    # (the response is expected to carry the proxy address under the 'proxy' key)
    proxy = requests.get("https://ipipgo.com/fetchproxy?type=json", timeout=15).json()

    # Retry the download up to 3 times before giving up
    @retry(stop_max_attempt_number=3)
    def _download():
        resp = requests.get(url, headers=headers,
                            # route both http and https traffic through the proxy
                            proxies={"http": proxy['proxy'], "https": proxy['proxy']},
                            timeout=15)
        resp.raise_for_status()
        with open(save_path, 'wb') as f:
            f.write(resp.content)

    try:
        _download()
    except Exception as e:
        print(f"Download failed: {e}, changing ipipgo's IP...")
        return False
    return True
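The function above handles a single image. For the batch part, a loop along these lines could drive it. This is only a sketch: filename_from_url and download_all are hypothetical helper names, and the fallback file name is an assumption. The downloader is passed in as download_fn so the loop works with any function that takes (url, save_path) and returns True or False:

```python
import os
from urllib.parse import urlparse

def filename_from_url(url, index):
    """Use the last path segment as the file name, or fall back to an index."""
    name = os.path.basename(urlparse(url).path)
    return name or f"image_{index}.jpg"

def download_all(urls, out_dir, download_fn):
    """Call download_fn(url, save_path) for each URL; return the URLs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for i, url in enumerate(urls):
        save_path = os.path.join(out_dir, filename_from_url(url, i))
        if not download_fn(url, save_path):
            failed.append(url)
    return failed
```

With the downloader from this article, usage would be failed = download_all(image_urls, "images", download_img), and the returned list can be fed back in for another round.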

Veteran's Q&A

Q: What should I do if the proxy IP suddenly stops working?
A: ipipgo's IP pool has a 5-second auto-switching mechanism, so just add a retry loop to the code. If you hit a dead IP, you can also refresh the node manually in their dashboard.
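A retry loop of that kind might be sketched as follows. The names fetch_with_rotation, get_proxy, and fetch are hypothetical; get_proxy stands for whatever call returns a fresh proxy address (for example, a request to your extraction link), and fetch is whatever performs the actual download:

```python
def fetch_with_rotation(url, get_proxy, fetch, max_attempts=3):
    """Try up to max_attempts times, asking for a fresh proxy on each attempt."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        proxy = get_proxy()  # assumed to return a new proxy address each call
        try:
            return fetch(url, proxy)
        except Exception as exc:  # with requests, catch requests.RequestException
            last_error = exc
            print(f"Attempt {attempt} failed via {proxy}: {exc}")
    raise RuntimeError(f"All {max_attempts} attempts failed") from last_error
```

With requests, fetch could be something like lambda url, proxy: requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15).content.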

Q: How do I know if the proxy is in effect?
A: Add a check to the code: before downloading, visit http://ip.ipipgo.com/checkip and see whether the returned IP is the proxy IP.
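One way to sketch that check is to request the URL once directly and once through the proxy, then compare the two responses. The exact response format of the check URL is not documented here, so this sketch only assumes the body differs when the visible IP differs. proxy_in_effect and fetch_text are hypothetical names:

```python
def proxy_in_effect(fetch_text, proxies, check_url="http://ip.ipipgo.com/checkip"):
    """Return True if the check URL reports something different through the proxy."""
    # fetch_text(url, proxies) must return the response body as a string;
    # passing None as proxies means a direct connection.
    direct_ip = fetch_text(check_url, None).strip()
    proxied_ip = fetch_text(check_url, proxies).strip()
    return direct_ip != proxied_ip
```

With requests, fetch_text could be lambda url, proxies: requests.get(url, proxies=proxies, timeout=15).text.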

Q: What if I want to use multi-threaded downloads?
A: ipipgo's Enterprise Package supports 500 concurrent IPs. Give each thread its own proxy, and remember to set the timeout to more than 30 seconds.
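A rough sketch of that setup with the standard library's thread pool follows. Here each job is paired with a proxy from a list in round-robin order, which approximates one proxy per thread; download_concurrently is a hypothetical name, and download_fn is assumed to accept (url, save_path, proxy) and return True or False:

```python
from concurrent.futures import ThreadPoolExecutor

def download_concurrently(jobs, proxy_list, download_fn, max_workers=8):
    """jobs is a list of (url, save_path); returns one True/False per job, in order."""
    def run(indexed_job):
        i, (url, save_path) = indexed_job
        proxy = proxy_list[i % len(proxy_list)]  # round-robin proxy assignment
        try:
            return download_fn(url, save_path, proxy)
        except Exception as exc:
            print(f"{url} failed via {proxy}: {exc}")
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of jobs in its results.
        return list(pool.map(run, enumerate(jobs)))
```

Keep max_workers at or below the number of concurrent IPs your package allows.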

Pitfall avoidance table

Pitfall | Solution
The IP gets blocked too quickly | Turn up the IP rotation frequency in the ipipgo dashboard
Images do not load fully | Render the page with selenium first, then download
The site triggers human verification | Enable data-center IP filtering in ipipgo

A few honest words

Don't trust free proxies: besides being slow, they may carry Trojans. I have used ipipgo for more than half a year, and the biggest benefit is that you can choose the IP's region, so if you want to grab images from a particular region, you can pick a node there. They are running a campaign right now: new users get 10G of traffic, and if you fill in the promo code IMG2024 when you sign up you get 5G more, enough to download tens of thousands of images.

One last nag: don't set the timeout too low! Some sites deliberately slow down their responses, and with a timeout of 10 seconds or less it is easy to mistake a slow response for a failure. If you are using ipipgo, a timeout of 15-20 seconds is recommended; the success rate can go up by 30%.

This article was originally published or organized by ipipgo: https://www.ipipgo.com/en-us/ipdaili/35928.html
