
Python crawler veterans, listen up! Use proxy IPs to avoid getting blocked!
Lately a lot of data-collection folks have been asking why their crawlers get blocked partway through a run. The reason is the same as getting banned for running a cheat bot in a game: if one IP fires off requests like crazy, who else is the site going to block? That's when you need proxy IPs as stand-ins. Today we'll use Python's requests library as an example.
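import requests

# Replace username/password with your ipipgo account; the port depends on your package.
proxies = {
    'http': 'http://username:password@gateway.ipipgo.com:9020',
    'https': 'http://username:password@gateway.ipipgo.com:9020',
}
response = requests.get('http://example.com', proxies=proxies)  # replace with your target site
print(response.text)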
One key point: replace username and password with the account you registered with ipipgo. Their proxy server address is gateway.ipipgo.com, and the port number varies by package, so check the official website for the latest configuration instead of guessing.
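If your password contains characters such as @, : or /, pasting it straight into the proxy URL breaks the URL structure. Here is a minimal sketch that percent-encodes the credentials first; the helper name build_proxies is hypothetical, and 9020 is only the example port from above (yours depends on your package).

```python
from urllib.parse import quote


def build_proxies(username: str, password: str, host: str, port: int) -> dict:
    """Return a requests-style proxies dict for an authenticated HTTP proxy."""
    # safe="" escapes every reserved character, including '@', ':' and '/'.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # The same HTTP proxy URL serves both schemes; HTTPS is tunnelled via CONNECT.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    print(build_proxies("alice", "p@ss:w/rd", "gateway.ipipgo.com", 9020))
```

requests decodes the percent-encoded username and password again when it builds the proxy authentication header, so the encoding only protects the URL structure.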
Three anonymity levels of proxy IPs
Many beginners don't know that proxies come in different anonymity levels, so here's a quick primer:
Transparent proxy (the site can see your real IP) → Anonymous proxy (hides your IP but reveals that a proxy is in use) → High-anonymity (elite) proxy (completely hidden). For crawlers you should use high-anonymity proxies. We recommend ipipgo's Diamond package, which in our tests cut the anti-crawler detection rate by 70%.
Practical tips for avoiding pitfalls
1. Don't skimp on timeout settings: requests has no timeout by default, so if you hit a sluggish proxy your program can hang indefinitely.
response = requests.get(url, proxies=proxies, timeout=10)
2. Randomize IP rotation: don't stubbornly stick to one fixed IP. ipipgo's API can fetch a pool of IPs dynamically, so each request can go out through a new IP!
3. Don't neglect exception handling: retry automatically when a connection fails, but don't retry forever.
try:
    response = requests.get(url, proxies=proxies, timeout=10)
except requests.exceptions.ProxyError:
    # The proxy refused or dropped the connection
    print("The proxy is acting up, try another IP")
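Tips 2 and 3 can be combined: pick a random proxy for each request, and on a proxy failure switch to a different one, but only a bounded number of times. This is a minimal sketch assuming you already hold a list of proxy URLs; how ipipgo's API delivers that pool isn't covered here, so the list format and the names ProxyRotator and fetch_with_retry are hypothetical. The getter parameter only exists so a fake can be swapped in for testing.

```python
import random

import requests


class ProxyRotator:
    """Hand out a random proxy per request, never the same one twice in a row."""

    def __init__(self, pool):
        if not pool:
            raise ValueError("proxy pool is empty")
        self._pool = list(pool)
        self._last = None

    def next_proxies(self):
        # Prefer any proxy other than the previous one; fall back if only one exists.
        candidates = [p for p in self._pool if p != self._last] or self._pool
        self._last = random.choice(candidates)
        return {"http": self._last, "https": self._last}


def fetch_with_retry(url, rotator, max_attempts=3, timeout=10, getter=requests.get):
    """GET url, switching proxies after each proxy failure.

    Gives up after max_attempts tries instead of retrying forever.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        proxies = rotator.next_proxies()
        try:
            return getter(url, proxies=proxies, timeout=timeout)
        except (requests.exceptions.ProxyError, requests.exceptions.ConnectTimeout) as exc:
            last_error = exc
            print(f"Attempt {attempt} via {proxies['http']} failed "
                  f"({type(exc).__name__}), trying another IP")
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Capping the attempts is what keeps a dead pool from turning into an endless loop; in a real crawler you might also sleep briefly between attempts.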
Real Case: E-commerce Price Monitoring
Last year I helped a friend build a price-comparison system for an e-commerce site using ipipgo's Business Edition package: polling every 5 minutes through a pool of 500 IPs, it ran steadily for 3 months without being banned. One tip: access different product pages with IPs from different regions, which makes the traffic look more like real users.
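The "different regions for different pages" tip could look something like this. It's a sketch, not the setup from that project: it assumes you have grouped your proxies by region yourself, and the region names and the function assign_regional_proxies are hypothetical. Regions are cycled in a fixed order so that consecutive pages come from different places.

```python
import itertools
import random


def assign_regional_proxies(urls, region_pools):
    """Pair each URL with a proxy, cycling through regions page by page.

    region_pools: hypothetical mapping of region name -> list of proxy URLs.
    Returns a list of (url, region, proxies_dict) tuples.
    """
    if not region_pools or any(not pool for pool in region_pools.values()):
        raise ValueError("every region needs at least one proxy")
    regions = itertools.cycle(sorted(region_pools))  # fixed, predictable order
    plan = []
    for url in urls:
        region = next(regions)
        proxy = random.choice(region_pools[region])  # any proxy within that region
        plan.append((url, region, {"http": proxy, "https": proxy}))
    return plan
```

For a 5-minute polling job you would rebuild the plan on every cycle, for example inside a loop that ends with time.sleep(300).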
Frequently Asked Questions (Q&A)
Q: What should I do if my proxy IP suddenly fails?
A: First check your account balance, then use the online testing tool provided by ipipgo to measure IP availability. It is recommended that you randomly select an IP from the IP pool before each request.
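ipipgo's online testing tool is the route mentioned above; if you'd rather check availability from your own machine, a local alternative is to probe every proxy concurrently and keep only the ones that answer. This is a sketch with hypothetical names (probe, filter_alive); the test URL reuses http://httpbin.org/ip from the next answer.

```python
from concurrent.futures import ThreadPoolExecutor

import requests


def probe(proxy, test_url="http://httpbin.org/ip", timeout=5):
    """Return True if a GET through proxy succeeds within timeout seconds."""
    try:
        resp = requests.get(test_url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
        return resp.ok
    except requests.exceptions.RequestException:
        return False


def filter_alive(pool, check=probe, max_workers=10):
    """Probe every proxy concurrently and return the ones that pass, in pool order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(check, pool))  # map() preserves input order
    return [proxy for proxy, ok in zip(pool, results) if ok]
```

You would then pick randomly from the surviving proxies before each request, as the answer suggests.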
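Q: How can I tell whether a proxy is high-anonymity?
A: Visit http://httpbin.org/ip through the proxy. If the IP it returns differs from your real IP, and the request headers (which you can inspect at http://httpbin.org/headers) contain no X-Forwarded-For header, it's a true high-anonymity proxy.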
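The check in that answer can be written down as a small classifier matching the three levels described earlier. It's a heuristic sketch: the list of proxy-revealing headers is an assumption rather than an exhaustive standard, and the test site's own front end may add or strip headers, so treat the result as a hint. The function name classify_anonymity is hypothetical.

```python
import re

# Headers that commonly reveal a proxy hop (assumed list, not exhaustive).
PROXY_HEADERS = ("x-forwarded-for", "via", "forwarded", "x-real-ip", "proxy-connection")


def _tokens(text):
    # Split values like "1.2.3.4, 5.6.7.8" or 'for="1.2.3.4";proto=http' into parts.
    return set(re.split(r'[\s,;="\[\]]+', text)) - {""}


def classify_anonymity(real_ip, seen_ip, headers):
    """Return 'transparent', 'anonymous' or 'elite'.

    real_ip: your own public IP, checked without the proxy.
    seen_ip: the IP string the test site reported for the proxied request.
    headers: the request headers the test site echoed back for that request.
    """
    leaked = [str(v) for k, v in headers.items() if k.lower() in PROXY_HEADERS]
    if real_ip in _tokens(seen_ip) or any(real_ip in _tokens(v) for v in leaked):
        return "transparent"  # the site can still see your real IP
    if leaked:
        return "anonymous"    # IP hidden, but the proxy announces itself
    return "elite"            # no trace of you or the proxy in these checks
```

To feed it, you could read origin from http://httpbin.org/ip once without the proxy (real_ip) and once through it (seen_ip), and pass the headers object returned by http://httpbin.org/headers fetched through the proxy.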
Q: How do I assign IPs to multiple crawlers running at the same time?
A: Use ipipgo's multi-threaded dedicated channel: each thread gets its own independent IP, which avoids resource conflicts.
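How ipipgo's dedicated channel works internally isn't described here, but the client-side idea of "one independent IP per thread" can be sketched with threading.local: the first time a worker thread asks, it claims the next proxy from the pool and keeps it for its whole lifetime. The class name PerThreadProxy and the pool contents are hypothetical.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadProxy:
    """Give each worker thread its own proxy, fixed for that thread's lifetime."""

    def __init__(self, pool):
        if not pool:
            raise ValueError("proxy pool is empty")
        self._next = itertools.cycle(pool)
        self._lock = threading.Lock()
        self._local = threading.local()  # attributes here are per-thread

    def get(self):
        # The first call from a thread claims the next proxy; later calls reuse it.
        if not hasattr(self._local, "proxy"):
            with self._lock:
                self._local.proxy = next(self._next)
        return {"http": self._local.proxy, "https": self._local.proxy}


if __name__ == "__main__":
    assigner = PerThreadProxy([f"http://203.0.113.{i}:8000" for i in range(1, 5)])
    with ThreadPoolExecutor(max_workers=4) as executor:
        for ident, proxy in executor.map(
                lambda _: (threading.get_ident(), assigner.get()["http"]), range(8)):
            print(ident, proxy)
```

With at least as many proxies as threads, no two threads share an IP. A common extra precaution is to give each thread its own requests.Session as well, stored the same way.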
A few words from the heart
When I first started using proxies I stepped into plenty of pits; the worst was when using free proxies got my server hacked. After switching to ipipgo's professional service, I realized that stable proxy IPs really do save a lot of fiddling around. Their smart routing feature, which automatically picks the fastest node, is especially nice.
A final reminder for newbies: don't hard-code account passwords in your code! Use environment variables or a configuration file instead; safety first. If anything is still unclear, contact ipipgo's technical support through their official website; they reply much faster than some big companies do.
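Following that advice, here is a minimal sketch that reads the proxy credentials from environment variables instead of the source code. The variable names PROXY_USER, PROXY_PASS, PROXY_HOST and PROXY_PORT are hypothetical; use whatever fits your deployment, and set them in your shell or process manager rather than committing them.

```python
import os
from urllib.parse import quote


def proxies_from_env():
    """Build a requests-style proxies dict from environment variables."""
    try:
        user = os.environ["PROXY_USER"]
        password = os.environ["PROXY_PASS"]
        port = os.environ["PROXY_PORT"]  # depends on your package
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None
    host = os.environ.get("PROXY_HOST", "gateway.ipipgo.com")
    # Percent-encode credentials so special characters don't break the URL.
    url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {"http": url, "https": url}
```

The result can be passed directly as proxies= to requests.get, and a missing variable fails loudly at startup instead of producing a half-built proxy URL.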

