
Can't Get Past a Website's Anti-Scraping Defenses? Try Proxy IPs + Requests Authentication
When you scrape data with Python, the biggest headache is running into a website's anti-scraping mechanisms. This is where a proxy IP comes in: it works like a cloak for your crawler, and the authentication support in the requests library is what lets you wear that cloak properly. Today we'll use ipipgo's proxy service as an example and walk you through this combination step by step.
Basic gear: the requests authentication toolkit
First, get to know the authentication methods built into requests, the same way you'd learn a game's skill keys before you start playing:
# Basic authentication example
import requests
from requests.auth import HTTPBasicAuth

response = requests.get(
    'https://URL-that-requires-authentication',
    auth=HTTPBasicAuth('username', 'password')
)
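Besides HTTPBasicAuth, the requests.auth module also ships HTTPDigestAuth and HTTPProxyAuth. The sketch below prepares (without sending) a request with each helper so you can inspect the headers they add; the URL and credentials are placeholders. In general, HTTPProxyAuth's Proxy-Authorization header only reaches the proxy for plain-HTTP targets; for HTTPS targets, putting the credentials in the proxy URL is the more dependable route.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth, HTTPProxyAuth

URL = "https://example.com/protected"  # hypothetical placeholder URL

def build(auth):
    """Prepare (but do not send) a GET request so its auth headers can be inspected offline."""
    return requests.Request("GET", URL, auth=auth).prepare()

# Basic auth adds "Authorization: Basic <base64(username:password)>" up front.
basic = build(HTTPBasicAuth("username", "password"))
# A plain (username, password) tuple is shorthand for HTTPBasicAuth.
shorthand = build(("username", "password"))
# Digest auth adds nothing yet; requests answers the server's 401 challenge later.
digest = build(HTTPDigestAuth("username", "password"))
# Proxy auth adds a "Proxy-Authorization" header for proxies that require credentials.
proxy_req = build(HTTPProxyAuth("username", "password"))

print(basic.headers["Authorization"])
print(proxy_req.headers["Proxy-Authorization"])
```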
But that's not enough: many sites will still recognize you as a crawler. That's when it's time to bring out our secret weapon, ipipgo's dynamic proxy IPs.
Hands-on: giving requests a proxy disguise
There are two ways to plug ipipgo's proxies into requests; pick whichever fits your needs:
# Per-request configuration (flexible)
proxies = {
    'http': 'http://username:password@proxy.ipipgo.com:port',
    'https': 'http://username:password@proxy.ipipgo.com:port'
}
response = requests.get('https://target-url', proxies=proxies)
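The keys in the proxies dictionary refer to the scheme of the target URL, while each value's http:// prefix describes how requests talks to the proxy itself (for HTTPS targets it opens a CONNECT tunnel through that proxy). The helper requests uses internally for this lookup, requests.utils.select_proxy, makes the mapping easy to check offline. The port 8000 is a hypothetical placeholder for your real ipipgo port.

```python
from requests.utils import select_proxy

# Hypothetical proxy URL; 8000 stands in for your real ipipgo port.
PROXY_URL = "http://username:password@proxy.ipipgo.com:8000"

proxies = {
    "http": PROXY_URL,   # used for http:// targets
    "https": PROXY_URL,  # used for https:// targets (tunnelled via CONNECT)
}

# select_proxy only inspects the dict; no network traffic happens here.
print(select_proxy("https://example.com/page", proxies))
# Without an "https" key, HTTPS requests do not go through this proxy.
print(select_proxy("https://example.com/page", {"http": PROXY_URL}))  # prints None
```

Forgetting the "https" key is an easy way to leak your real IP on HTTPS sites while HTTP traffic looks fine.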
# Session-wide configuration (set once, reuse everywhere)
session = requests.Session()
session.proxies.update({
    'http': 'http://username:password@proxy.ipipgo.com:port',
    'https': 'http://username:password@proxy.ipipgo.com:port'
})
response = session.get('https://target-url')
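One gotcha with session-level proxies: while session.trust_env is True (the default), requests merges proxy environment variables such as HTTPS_PROXY into each request, and those values take precedence over session.proxies. A minimal sketch, assuming you want the session's ipipgo settings to always win, switches trust_env off. make_proxy_session and the port are hypothetical; note that trust_env=False also stops requests from reading .netrc credentials and CA-bundle environment variables.

```python
import requests

# Hypothetical proxy URL; replace 8000 with your real ipipgo port.
PROXY_URL = "http://username:password@proxy.ipipgo.com:8000"

def make_proxy_session(proxy_url):
    """Return a Session that always routes through proxy_url (hypothetical helper)."""
    session = requests.Session()
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    # Ignore HTTP_PROXY/HTTPS_PROXY so they cannot override session.proxies.
    # Side effect: .netrc credentials and REQUESTS_CA_BUNDLE are ignored too.
    session.trust_env = False
    return session

with make_proxy_session(PROXY_URL) as session:
    # merge_environment_settings shows the settings a request would use, without sending it.
    settings = session.merge_environment_settings("https://example.com/", {}, None, None, None)
    print(settings["proxies"]["https"])
```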
A hidden tip: the ipipgo proxy address must carry your account credentials in the format username:password@proxy-address:port. Don't get the order wrong, or it's like putting a key into the lock backwards: the door won't open.
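If the username or password contains characters such as @, : or /, the proxy URL becomes ambiguous and can be misparsed. Percent-encoding the credentials first avoids that; requests decodes them again before building the proxy's authorization header. build_proxy_url is a hypothetical helper, and the credentials and port below are placeholders.

```python
from urllib.parse import quote

def build_proxy_url(username, password, host, port, scheme="http"):
    """Assemble scheme://username:password@host:port with percent-encoded credentials."""
    # safe="" forces reserved characters such as "@", ":" and "/" to be encoded as well.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"

# Hypothetical credentials and port, for illustration only.
proxy_url = build_proxy_url("username", "p@ss:word/1", "proxy.ipipgo.com", 8000)
print(proxy_url)  # http://username:p%40ss%3Aword%2F1@proxy.ipipgo.com:8000
proxies = {"http": proxy_url, "https": proxy_url}
```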
Common pitfalls: Q&A
Q: Why do my proxy IPs stop working?
A: The IP has most likely been blocked. Consider switching to ipipgo's dynamic residential proxies: their IP pool refreshes automatically every hour, so they hold up much better than ordinary proxies.
Q: What if the website still detects me after I've set up the proxy?
A: Check whether your request headers look like they come from a real browser; the fake_useragent library can help you disguise them. ipipgo's high-anonymity proxies also strip revealing headers such as X-Forwarded-For on their own.
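Out of the box, requests identifies itself with a User-Agent like python-requests/2.x, which is an easy giveaway. A minimal sketch of swapping in browser-like headers on a session follows; the User-Agent strings are illustrative samples, and fake_useragent's UserAgent().random could supply fresh ones instead. browser_headers is a hypothetical helper.

```python
import random
import requests

# Illustrative browser User-Agent strings (hypothetical samples; fake_useragent's
# UserAgent().random can generate fresh ones if you install that library).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

def browser_headers():
    """Return headers resembling a desktop browser, with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }

session = requests.Session()
print(session.headers["User-Agent"])  # the default python-requests/<version> value
session.headers.update(browser_headers())
print(session.headers["User-Agent"])  # now one of USER_AGENTS
```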
Q: What should I do if my proxy is as slow as a snail?
A: Try ipipgo's dedicated-bandwidth packages, or check whether the target site itself is slow to load. Set the timeout parameter so a slow request can't hang your crawler.
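requests has no default timeout, so a stalled proxy can block a call indefinitely. Passing a (connect, read) tuple bounds the connection phase and the wait for response data separately. fetch and the 5/15-second values are hypothetical; tune them to your proxy.

```python
import requests

# (connect timeout, read timeout) in seconds; illustrative values only.
TIMEOUT = (5, 15)

def fetch(url, proxies=None, timeout=TIMEOUT):
    """Return the Response, or None if the proxy/target is too slow, unreachable, or errors out."""
    try:
        response = requests.get(url, proxies=proxies, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx status codes into exceptions
        return response
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        print(f"Timed out after {timeout}: {url}")
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc!r}")
    return None
```

Returning None instead of raising keeps a long crawl running; the caller can log the URL and retry it later through a different proxy.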
Level up: automatically rotating proxy pools
Veterans do it like this, pulling proxies dynamically from ipipgo's API:
import requests
from itertools import cycle

def get_ipipgo_proxies():
    # Placeholder: call the ipipgo API here to fetch the latest list of proxies
    return [
        'http://user1:password1@proxy1.ipipgo.com:port',
        'http://user2:password2@proxy2.ipipgo.com:port'
    ]

proxy_pool = cycle(get_ipipgo_proxies())

for _ in range(10):  # try at most 10 proxies
    current_proxy = next(proxy_pool)
    try:
        response = requests.get(
            'https://target-url',
            # route both http:// and https:// targets through the current proxy
            proxies={'http': current_proxy, 'https': current_proxy},
            timeout=10
        )
        print('Successfully fetched data')
        break
    except requests.exceptions.RequestException:
        print(f"{current_proxy} failed, switching to the next one")
This setup rotates IPs automatically. Paired with ipipgo's pay-per-use package it is especially cost-effective, because you don't waste proxy resources.
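itertools.cycle keeps handing out dead proxies forever. One possible refinement, sketched under the assumption that two consecutive failures mean a proxy is unusable, is a pool that evicts failing entries so you stop spending requests on them. ProxyPool and its methods are hypothetical, not part of requests or ipipgo's API. In the rotation loop above you would call pool.get() instead of next(proxy_pool), report_failure() in the except branch, and report_success() after a good response.

```python
from collections import deque

class ProxyPool:
    """Round-robin proxy pool that drops a proxy after max_failures consecutive failures."""

    def __init__(self, proxies, max_failures=2):
        self._queue = deque(proxies)
        self._failures = {p: 0 for p in proxies}
        self.max_failures = max_failures

    def __len__(self):
        return len(self._queue)

    def get(self):
        """Return the next proxy in round-robin order."""
        if not self._queue:
            raise RuntimeError("proxy pool is empty; fetch a fresh list")
        proxy = self._queue[0]
        self._queue.rotate(-1)  # move the returned proxy to the back of the line
        return proxy

    def report_success(self, proxy):
        self._failures[proxy] = 0

    def report_failure(self, proxy):
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures and proxy in self._queue:
            self._queue.remove(proxy)

pool = ProxyPool(["http://proxy-a", "http://proxy-b", "http://proxy-c"])
print(pool.get())  # http://proxy-a
```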
Ultimate Protection: SSL Certificate Validation
Some sites, or proxy setups, trigger SSL certificate verification errors. requests lets you get around them with a single parameter:
response = requests.get('https://target-site',
    proxies=proxies,
    verify=False  # skip SSL certificate verification
)
Be aware, though, that verify=False disables certificate checks and leaves the connection open to man-in-the-middle attacks, so only use it during testing. ipipgo's business proxy packages come with SSL-encrypted transmission, which makes them safer to use.
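A safer habit than scattering verify=False across calls is to decide the certificate policy once per session. verify=False (testing only) makes urllib3 emit an InsecureRequestWarning on every request, which can be silenced, while a path to a CA bundle keeps verification on for sites or proxies that use a custom certificate authority. make_ssl_session is a hypothetical helper.

```python
import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

def make_ssl_session(proxies, ca_bundle=None, insecure=False):
    """Return a proxied Session with an explicit certificate-verification policy."""
    session = requests.Session()
    session.proxies.update(proxies)
    if insecure:
        # Testing only: skip certificate checks and silence urllib3's warning about it.
        session.verify = False
        urllib3.disable_warnings(InsecureRequestWarning)  # note: process-wide
    elif ca_bundle:
        # Keep verification on, but trust the CA certificates in this PEM file.
        session.verify = ca_bundle
    # Otherwise session.verify stays True and requests uses its bundled CA store.
    return session
```

If REQUESTS_CA_BUNDLE or CURL_CA_BUNDLE is set in the environment, requests can substitute it for a session-level verify setting, so check those variables if the policy seems to be ignored.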
One last piece of advice: when choosing a proxy provider, look at long-term stability. I've been using ipipgo for half a year. Their customer service responds quickly, and when I run into a technical problem I can go straight to their support engineers for remote assistance, which is far more reliable than fly-by-night providers you can never reach.

