A hands-on guide to installing BeautifulSoup with pip
Anyone who writes crawlers knows that installing BeautifulSoup is as basic as eating with chopsticks. Recently, though, some readers have complained that installing the library keeps going wrong: either the download crawls along or the installation fails for no obvious reason. Today we'll walk through the install and show how to use ipipgo's proxy IPs to take care of these gremlins.
First, make sure pip itself is up to date:
python -m pip install --upgrade pip
Basic installation command (when the network is good):
pip install beautifulsoup4
Why would you need a proxy IP to install the library?
Some company intranets are tightly controlled, and sometimes your own network is simply flaky, so installing third-party libraries directly often times out. That is when you have to fall back on a proxy. Switching the exit IP through ipipgo's quality proxies roughly doubled the installation success rate in our experience. In our own test with their dynamic residential proxies, download speed improved by more than 40%.
Installing through a proxy, step by step
Take Windows as an example (same for Mac/Linux):
Command template for installing through a proxy:
pip install beautifulsoup4 --proxy http://username:password@gateway-address:port
An example for ipipgo (remember to replace the account with your own):
pip install beautifulsoup4 --proxy http://vipuser-123456@gateway.ipipgo.net:9020
| Common error | Fix |
|---|---|
| SSLError | Start the proxy address with http://, not https:// |
| Timeout | Switch to ipipgo's long-connection package |
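
The command template above embeds credentials in a URL, which breaks if a password contains characters such as @ or :. The following is a minimal sketch, not part of ipipgo's tooling, that percent-encodes the credentials and assembles the pip argument list. The helper names, the placeholder credentials, and the gateway.example.com host are all assumptions made for illustration.

```python
import sys
from urllib.parse import quote


def build_proxy_url(username, password, host, port):
    """Return an http:// proxy URL with percent-encoded credentials (hypothetical helper)."""
    # Percent-encode everything outside the unreserved set so characters
    # such as '@' or ':' in a password cannot break the URL.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    # The scheme is plain http://, matching the SSLError advice in the table.
    return f"http://{user}:{secret}@{host}:{port}"


def pip_install_command(package, proxy_url):
    """Return the argument list for a proxied 'pip install' (hypothetical helper)."""
    # Running pip as a module of the current interpreter avoids picking
    # up a pip that belongs to a different Python installation.
    return [sys.executable, "-m", "pip", "install", package, "--proxy", proxy_url]


if __name__ == "__main__":
    # Placeholder credentials and gateway; substitute your own values.
    url = build_proxy_url("user", "p@ss:word", "gateway.example.com", 9020)
    print(" ".join(pip_install_command("beautifulsoup4", url)))
```

The list returned by pip_install_command can be passed to subprocess.run if you want to script the install. It is kept separate here so the command can be inspected before anything runs.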
Crawler anti-blocking techniques
Once BeautifulSoup is installed, don't rush to start crawling: sending your requests through a proxy IP is the way to go. Here is an example that combines the two:
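import requests
from bs4 import BeautifulSoup

# Route both HTTP and HTTPS traffic through the proxy gateway
proxies = {
    'http': 'http://gateway.ipipgo.net:9020',
    'https': 'http://gateway.ipipgo.net:9020',
}
# Replace 'destination URL' with the page you want to fetch
resp = requests.get('destination URL', proxies=proxies, timeout=10)
soup = BeautifulSoup(resp.text, 'html.parser')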
With ipipgo's dynamically rotating proxies, the IP changes automatically on every request. In our own test, three consecutive days of crawling went by without a single block.
Q&A first-aid kit
Q: What should I do if the import fails after installation?
A: Eight times out of ten the name is written wrong. The import has to be exactly:
from bs4 import BeautifulSoup
Mind the upper and lower case.
Q: The proxy is configured but I still can't connect?
A: First check in the ipipgo dashboard whether your local IP has been added to the whitelist, and whether your package is still valid.
Q: How can I check the installed version?
A: Run this on the command line to see the version number and installation path:
pip show beautifulsoup4
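
The two questions about import errors and version checks can also be answered from inside Python. This is a small sketch using only the standard library; the function names are made up for illustration. It relies on the fact that the package is distributed as beautifulsoup4 but imported as bs4.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def can_import(module):
    """Return True if the named top-level module can be found."""
    return find_spec(module) is not None


if __name__ == "__main__":
    # The distribution is 'beautifulsoup4', but the import name is 'bs4'.
    print("beautifulsoup4 version:", installed_version("beautifulsoup4"))
    print("bs4 importable:", can_import("bs4"))
```

If the version prints as None, or bs4 is reported as not importable, the install most likely went into a different Python environment than the one running the script.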
Pitfalls to avoid
1. Don't use pip install BeautifulSoup. That name belongs to the obsolete Beautiful Soup 3; the current package is beautifulsoup4.
2. If the company network sits behind a firewall, remember to enable the enterprise-grade encrypted channel in the ipipgo dashboard.
3. For batch crawling, the lxml parser is recommended because it is faster:
pip install lxml --proxy http://gateway.ipipgo.net:9020
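
Since lxml is an optional extra, a crawler may run on machines where it is not installed. One possible approach, sketched here with a hypothetical pick_parser helper, is to check for the module and fall back to the html.parser that ships with Python.

```python
from importlib.util import find_spec


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return the preferred parser name if its module is installed."""
    # 'lxml' is both the module name and the parser name BeautifulSoup
    # accepts; 'html.parser' ships with Python and needs no extra install.
    return preferred if find_spec(preferred) is not None else fallback


if __name__ == "__main__":
    print(pick_parser())
```

The result can be passed straight to the constructor, for example BeautifulSoup(resp.text, pick_parser()).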
Lastly, if you use ipipgo, remember to set up an exception retry mechanism, so that even when a temporary IP fails the crawler can switch to another one automatically. Their dashboard shows real-time usage, and the traffic alert feature is very thoughtfully done, so you no longer have to worry about a task collapsing in the middle of the night.
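
The article does not say how ipipgo exposes its retry setting, so the following is only a client-side sketch of the same idea, not their feature. The fetch_with_retry name, its parameters, and the idea of passing in a list of proxy URLs are all assumptions. The fetch callable is injected so the retry logic does not depend on any particular HTTP library.

```python
import time


def fetch_with_retry(fetch, url, proxy_urls, attempts=3, delay=1.0):
    """Call fetch(url, proxy_url), switching proxy after each failure."""
    last_error = None
    for attempt in range(attempts):
        # Cycle through the pool so each retry uses a different proxy.
        proxy_url = proxy_urls[attempt % len(proxy_urls)]
        try:
            return fetch(url, proxy_url)
        except Exception as error:  # narrow this to network errors in real code
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

With requests, the fetch argument could be a small function that calls requests.get(url, proxies={'http': proxy_url, 'https': proxy_url}, timeout=10) and then resp.raise_for_status(), so that HTTP errors also trigger a retry.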