
A step-by-step guide to installing proxy IP tools with pip
Recently, a lot of friends who do data collection have asked Lao Zhang why the crawlers they write keep getting their IPs blocked. It isn't really complicated: the key is to give your program a layer of "protective clothing". Today we'll chat about how to use pip to install libraries that can handle proxy IPs automatically, and along the way I'll recommend a reliable proxy service provider.
What do you need to prepare before installing libraries?
First, make sure you have Python 3.6 or above. Press Win+R, type cmd and press Enter, then type python --version in the command window to see your version. If it's too old, we recommend downloading a new version directly from the official website.
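If you'd rather check the version from inside Python, for example at the top of a crawler script, here is a minimal sketch using the standard library's `sys.version_info`. The 3.6 threshold is the one stated above; `python_is_new_enough` is just a hypothetical helper name.

```python
import sys

# Minimum version suggested above; raise it if a library you install needs newer
MIN_VERSION = (3, 6)


def python_is_new_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info slices to a plain tuple, which compares element by element
    return sys.version_info[:len(minimum)] >= minimum


if __name__ == "__main__":
    print(sys.version)
    if not python_is_new_enough():
        sys.exit("Python %d.%d or above is required" % MIN_VERSION)
```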
Take installing the requests library as an example:
pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple
Note that this uses the Tsinghua mirror (the -i option), which can make downloads much faster, especially from mainland China. If pip warns that its version is outdated, run python -m pip install --upgrade pip to upgrade it.
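The same install can also be scripted, which is handy when setting up a new machine. This sketch assumes you want the Tsinghua mirror from the command above and runs pip as `sys.executable -m pip`, so the package goes into the same interpreter that runs the script. `build_pip_command` and `install` are hypothetical helper names.

```python
import subprocess
import sys

# Tsinghua mirror used in the command above
TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def build_pip_command(package, index_url=TSINGHUA_MIRROR):
    """Build the argument list for `python -m pip install <package> -i <index_url>`."""
    return [sys.executable, "-m", "pip", "install", package, "-i", index_url]


def install(package, index_url=TSINGHUA_MIRROR):
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(build_pip_command(package, index_url))
```

Calling `install("requests")` then runs the equivalent of the command shown above.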
Three handy libraries for working with proxy IPs in practice
Here are three libraries I've tested that work well; let's focus on the first one:
| Library | Strengths | Best suited for |
|---|---|---|
| requests-html | Built-in HTML parsing | Simple web page scraping |
| scrapy | Professional-grade framework | Large-scale projects |
| pyquery | jQuery-style syntax | Complex page parsing |
Here's a working snippet (remember to replace the credentials with your own proxy details):
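from requests_html import HTMLSession  # pip install requests-html

# Tunnel proxy from ipipgo: replace user/password with your own credentials
proxies = {
    'http': 'http://user:password@ipipgo-proxy.com:9020',
    # HTTPS targets go through the same HTTP proxy URL
    'https': 'http://user:password@ipipgo-proxy.com:9020'
}

session = HTMLSession()
# Replace with the target website you actually want to scrape
response = session.get('https://example.com', proxies=proxies)
# first=True returns the first matching element instead of a list
print(response.html.find('title', first=True).text)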
Pay attention to the proxies parameter: it uses the tunnel proxy format provided by ipipgo. With their tunnel proxies you don't have to switch IPs manually, which is especially friendly for beginners.
A guide to avoiding common pitfalls
Q: What should I do if I keep getting errors when installing a library?
A: First check whether you have a proxy turned on. Sometimes a global proxy actually prevents pip from connecting to the package source. Try turning off your proxy software temporarily and installing again.
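Before switching the proxy software off, you can check which proxy settings Python actually sees. This sketch uses the standard library's `urllib.request.getproxies()`, which reads `*_proxy` environment variables (and, on Windows, the system proxy settings). `find_proxy_vars` is a hypothetical helper that scans an environment mapping, so it can be tried out without touching your real settings.

```python
import os
import urllib.request


def find_proxy_vars(environ):
    """Return the non-empty proxy-related variables (e.g. HTTP_PROXY, no_proxy) in `environ`."""
    return {
        name: value
        for name, value in environ.items()
        if name.lower().endswith("_proxy") and value
    }


if __name__ == "__main__":
    # Proxies urllib will use: environment variables, plus system settings on Windows
    print("urllib.request.getproxies():", urllib.request.getproxies())
    # Raw environment variables, which a global proxy tool may have set
    print("proxy env vars:", find_proxy_vars(os.environ))
```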
Q: The code runs, but I can't get any data?
A: In about 80% of cases, the target website has recognized the proxy IP. In that situation, switch to high-quality proxies, such as ipipgo's dedicated IP packages, where every IP is a real residential IP that has been used by a real person.
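When data comes back empty, it helps to tell a block apart from a parsing bug. The sketch below is a rough heuristic of my own, not part of any library above: it treats HTTP 403 (Forbidden) and 429 (Too Many Requests) as likely blocks, and also looks for a few hypothetical keywords such as "captcha" in the page body. Tune the keywords for your target site.

```python
# Status codes that usually mean the request was refused rather than mis-parsed
BLOCK_STATUS_CODES = {403, 429}

# Hypothetical markers; real sites vary, so adjust these for your target
BLOCK_KEYWORDS = ("captcha", "access denied", "unusual traffic")


def looks_blocked(status_code, body):
    """Heuristically decide whether a response looks like an anti-crawler block."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    text = body.lower()
    return any(keyword in text for keyword in BLOCK_KEYWORDS)
```

With requests-html you could call it as `looks_blocked(response.status_code, response.text)`.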
Q: How can I tell if a proxy is in effect?
A: Add a test request to your code, session.get('http://httpbin.org/ip', proxies=proxies), and check whether the IP it returns is the proxy's IP rather than your own.
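The same check can be automated by comparing the IP httpbin sees with and without the proxy. This sketch swaps in the standard library's `urllib.request.ProxyHandler` instead of requests-html so it runs without extra installs; it assumes httpbin's reply has the shape `{"origin": "<ip>"}`. The helper names are hypothetical.

```python
import json
import urllib.request

CHECK_URL = "http://httpbin.org/ip"


def extract_origin(body):
    """Pull the caller IP out of httpbin's JSON reply, e.g. {"origin": "1.2.3.4"}."""
    return json.loads(body)["origin"]


def fetch_origin(proxies=None, timeout=10):
    """Ask httpbin which IP it sees, optionally routing through `proxies`."""
    # ProxyHandler({}) disables proxies entirely; a dict routes requests through them
    handler = urllib.request.ProxyHandler(proxies or {})
    opener = urllib.request.build_opener(handler)
    with opener.open(CHECK_URL, timeout=timeout) as resp:
        return extract_origin(resp.read().decode("utf-8"))


def proxy_in_effect(direct_ip, proxied_ip):
    """The proxy is working if httpbin sees a different IP through it."""
    return direct_ip != proxied_ip
```

To use it, call `fetch_origin()` and `fetch_origin(proxies)` with the proxies dict from the snippet above, then pass both results to `proxy_in_effect`.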
Why do I recommend ipipgo?
After more than three years of using proxy services, I settled on ipipgo, and not without reason:
- Self-built data centers in China, with latency kept within 50 ms
- Pay-as-you-go billing, and new users get a free 1 GB traffic trial
- An exclusive failure-retry mechanism that switches IPs automatically
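ipipgo handles retrying and switching on its side. If you work with a plain list of proxies instead, a client-side version of the same idea might look like this minimal sketch. `fetch_with_retry`, the `fetch` callback, and the proxy list are hypothetical; any exception raised by `fetch` is treated as a failure, and the next proxy is tried.

```python
import itertools
import time


def fetch_with_retry(fetch, proxy_list, max_attempts=3, delay=1.0):
    """Call fetch(proxy) with successive proxies until one succeeds.

    `fetch` is any callable that takes a proxy URL, returns a result,
    and raises an exception on failure. Proxies are cycled in order.
    """
    if not proxy_list:
        raise ValueError("proxy_list must not be empty")
    last_error = None
    proxies = itertools.cycle(proxy_list)
    for attempt in range(1, max_attempts + 1):
        proxy = next(proxies)
        try:
            return fetch(proxy)
        except Exception as exc:  # any error means: switch IP and retry
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay)  # brief pause before trying the next proxy
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```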
Their intelligent routing feature deserves a special mention: it automatically matches a proxy node in the region where the target website is located. For example, if you want to collect data from Japanese websites, the system automatically assigns an exit IP from a Tokyo data center.
Advanced tips
If you're working on a long-term collection project, I recommend putting the proxy configuration in a separate configuration file:
# config.py
PROXY_CONFIG = {
    'proxy_host': 'ipipgo-proxy.com',
    'proxy_port': 9020,
    'username': 'your_username',
    'password': 'your_password'
}
Then import this configuration in your main program, so switching proxy providers later is easy. Also, the ipipgo dashboard lets you view API calls in real time, which is especially helpful for troubleshooting.
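Building the proxies dict from that config in the main program might look like the sketch below. It assumes the `PROXY_CONFIG` keys shown above and the `http://user:password@host:port` format from the earlier snippet; `build_proxies` is a hypothetical helper. Credentials are URL-encoded with `urllib.parse.quote`, so characters like `@` or `:` in a password don't break the URL. The config is repeated inline so the example runs on its own; in a real project you would write `from config import PROXY_CONFIG`.

```python
from urllib.parse import quote

# In a real project: from config import PROXY_CONFIG
PROXY_CONFIG = {
    "proxy_host": "ipipgo-proxy.com",
    "proxy_port": 9020,
    "username": "your_username",
    "password": "your_password",
}


def build_proxies(config):
    """Turn a PROXY_CONFIG-style dict into a requests-style proxies mapping."""
    # Percent-encode credentials so reserved characters survive inside the URL
    user = quote(config["username"], safe="")
    password = quote(config["password"], safe="")
    url = f"http://{user}:{password}@{config['proxy_host']}:{config['proxy_port']}"
    # The same HTTP proxy URL serves both plain-HTTP and HTTPS targets
    return {"http": url, "https": url}


if __name__ == "__main__":
    print(build_proxies(PROXY_CONFIG))
```

Keeping the URL assembly in one function means that changing providers only touches config.py.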
Lastly, a reminder for newcomers: don't use free proxies just because they're cheap. A customer of mine once went for the cheap option, all the data collected turned out to be fake, and the whole job had to be redone. For professional work, a professional provider like ipipgo is more reliable, and the time you save is enough to take on a couple more projects.

