
A hands-on guide to scraping web pages with a free API, without getting your IP blocked
Anyone who has done data crawling knows the biggest headache: the target site suddenly blocks your IP. Having a proxy IP at that moment is like carrying a revive item in a game, which puts you back on your feet at full health within a minute. Today we'll break down how to use a free API together with proxy IPs for web crawling.
Why do I have to use a proxy IP?
For example, if you hammer an e-commerce site for prices from your home broadband IP, you will almost certainly be blocked within half an hour. If you rotate your requests through a proxy IP pool instead, the website sees a new face each time. It's like playing hide-and-seek and changing clothes every time you go out: the seeker has a much harder time recognizing you.
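The rotation idea can be sketched in a few lines. This is a minimal illustration, not a production pool: the proxy addresses and the `proxy_rotation` helper are hypothetical names introduced here, and a real pool would come from your provider.

```python
import itertools

# Hypothetical proxy endpoints, for illustration only; substitute the
# addresses your provider actually gives you.
PROXY_POOL = [
    "http://user:pass@proxy-a.example.com:8000",
    "http://user:pass@proxy-b.example.com:8000",
    "http://user:pass@proxy-c.example.com:8000",
]


def proxy_rotation(pool):
    """Yield requests-style proxies dicts, cycling through the pool forever."""
    for address in itertools.cycle(pool):
        yield {"http": address, "https": address}


rotation = proxy_rotation(PROXY_POOL)
first = next(rotation)   # uses proxy-a
second = next(rotation)  # uses proxy-b
```

Each request then takes the next entry, for example `requests.get(url, proxies=next(rotation), timeout=10)`, so consecutive requests leave from different addresses.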
How to choose a reliable free API
Free APIs are a dime a dozen, but many of them have pitfalls. Focus on these three points:
1. Survival rate: avoid stale IP pools that go for ages without an update.
2. Anonymity: use high-anonymity proxies, which hide your real IP.
3. Rate limit: the quota should allow at least a couple hundred requests a day.
One decent option is ipipgo's free package for newcomers: signing up gets you an HTTP proxy quota of 500 per day. Their IP pool refreshes automatically every hour, the measured survival rate can exceed eighty percent, and, importantly, you don't have to link a credit card or anything like that.
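If you want to check a pool's survival rate yourself, a rough measurement might look like the sketch below. The function names are hypothetical, and using `https://httpbin.org/ip` as the probe URL is an assumption; any page you are allowed to request would do.

```python
from concurrent.futures import ThreadPoolExecutor


def survival_rate(proxies, is_alive, max_workers=8):
    """Return the fraction of proxies for which is_alive(proxy) is true."""
    if not proxies:
        return 0.0
    # Probe the proxies in parallel; one slow proxy should not stall the rest.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(is_alive, proxies))
    return sum(bool(result) for result in results) / len(proxies)


def check_with_requests(proxy, test_url="https://httpbin.org/ip", timeout=5):
    """Probe one proxy; a network error or non-200 status counts as dead."""
    import requests  # imported here so survival_rate works without it

    try:
        response = requests.get(
            test_url,
            proxies={"http": proxy, "https": proxy},
            timeout=timeout,
        )
        return response.status_code == 200
    except requests.RequestException:
        return False
```

Calling `survival_rate(my_proxies, check_with_requests)` gives a number between 0 and 1 that you can compare against the figure a provider advertises.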
Hands-on code
Here is the simplest example, using Python's requests library:
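import requests
# Route both HTTP and HTTPS traffic through the authenticated proxy gateway.
proxies = {
    'http': 'http://username:password@gateway.ipipgo.com:9020',
    'https': 'http://username:password@gateway.ipipgo.com:9020',
}
# Replace 'destination URL' with the page to fetch; the timeout keeps a dead proxy from hanging the script.
response = requests.get('destination URL', proxies=proxies, timeout=10)
print(response.text)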
Be sure to replace the username and password with your own; you can get the authentication details from the ipipgo backend. If changing the proxy by hand every time is too much trouble, you can write a middleware that switches automatically, but a full middleware is beyond what we will cover here.
Common problems and how to defuse them
Q: Will a free proxy leak my data?
A: Pick a provider like ipipgo that offers an HTTPS encrypted channel, which is much safer than random proxies of unknown origin. If you are really worried about sensitive data, consider their paid plan with dedicated IPs.
Q: What should I do if I keep running into CAPTCHAs?
A: It means the site suspects you are a bot. Two options: 1. lower your crawl frequency; 2. switch to a proxy package with higher anonymity. ipipgo's business package comes with automatic CAPTCHA solving and is aimed at professional users.
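Lowering the crawl frequency usually just means pausing between requests. The sketch below adds a randomized wait so the traffic looks less mechanical. The helper names and the default delays are assumptions chosen for illustration, not recommended values for any particular site.

```python
import random
import time


def polite_delay(base_seconds=2.0, jitter_seconds=1.5):
    """Return base_seconds plus a random extra of up to jitter_seconds."""
    return base_seconds + random.uniform(0, jitter_seconds)


def crawl_slowly(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No need to wait before the very first request.
            sleep(polite_delay())
        results.append(fetch(url))
    return results
```

Here `fetch` is whatever function performs the request, for instance a small wrapper around `requests.get` with your proxy settings. The `sleep` parameter exists only so the pause can be replaced in tests.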
Q: What happens when the free quota runs out?
A: Either register additional accounts (be careful not to violate the terms of service) or simply upgrade to a paid package. ipipgo's monthly packages start as low as 30 bucks, which is cheaper than buying milk tea.
Tips for avoiding pitfalls
1. Don't use public proxy pools; those IPs were blacklisted by major websites long ago.
2. Set a random User-Agent on each request so it looks like it comes from a browser.
3. Cache important data locally so you don't waste quota crawling it again.
4. Don't keep hammering after consecutive failures; switch to another IP right away and carry on.
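Tips 2 to 4 can be combined into one small helper. This is a sketch under stated assumptions: `fetch_with_retries` and `fetch_via_requests` are hypothetical names, the User-Agent strings are sample values, and a plain dict stands in for a real on-disk cache.

```python
import random

# Sample browser User-Agent strings (illustrative; keep your own list current).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def fetch_with_retries(url, proxy_pool, fetch, cache, max_attempts=3):
    """Return the page for url, trying a different proxy after each failure."""
    if url in cache:
        return cache[url]  # tip 3: never spend quota on a page we already have
    last_error = None
    attempts = min(max_attempts, len(proxy_pool))
    for proxy in random.sample(proxy_pool, attempts):
        headers = {"User-Agent": random.choice(USER_AGENTS)}  # tip 2
        try:
            text = fetch(url, proxy, headers)
        except Exception as error:  # broad on purpose: fetch is pluggable
            last_error = error
            continue  # tip 4: move straight on to another IP
        cache[url] = text
        return text
    raise RuntimeError(f"all proxies failed for {url}") from last_error


def fetch_via_requests(url, proxy, headers):
    """One possible fetch function, built on the requests library."""
    import requests  # imported here so the helper above works without it

    response = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        headers=headers,
        timeout=10,
    )
    response.raise_for_status()
    return response.text
```

A call such as `fetch_with_retries(url, my_proxies, fetch_via_requests, cache={})` would then try up to three different proxies before giving up. For a cache that survives restarts, the dict could be swapped for files on disk or a small database.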
One last honest word: free APIs are fine for playing around, but serious projects still need a dependable proxy service. A provider like ipipgo, which offers a free trial plus tiered pricing, is friendly to newcomers and veterans alike. If anything is unclear, just ping customer service on their official website; they reply faster than a food delivery rider.

