
When your crawler meets turtle-speed page loads: how can a proxy IP save the day?
Anyone who writes crawlers has probably lived through this maddening moment: the code starts running, then hangs on a page that never finishes loading. If the proxy IP is weak as well, it takes about a minute before you want to smash the keyboard. No fluff today, just the practical part: how to use Python + Selenium with a proxy IP to wait intelligently.

    from selenium import webdriver
    from selenium.webdriver.common.proxy import Proxy, ProxyType

    # ipipgo proxy configuration (remember to change to your own account)
    proxy_ip = "123.123.123.123:8888"
    proxy = Proxy({
        'proxyType': ProxyType.MANUAL,
        'httpProxy': proxy_ip,
        'sslProxy': proxy_ip,
    })

    options = webdriver.ChromeOptions()
    # Chrome picks the proxy up from this command-line switch
    options.add_argument("--proxy-server=http://{}".format(proxy_ip))
    # Capability-based alternative in Selenium 4: options.proxy = proxy
    driver = webdriver.Chrome(options=options)

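The configuration above can be wrapped in a small factory, so that a fresh browser can be started whenever the proxy changes. This is a minimal sketch rather than a prescribed setup: `proxy_argument` and `make_driver` are hypothetical helper names, and the 30-second page-load limit is an assumed value. `set_page_load_timeout` makes `driver.get()` raise `TimeoutException` instead of hanging on a page that never finishes loading.

```python
def proxy_argument(proxy_ip, scheme="http"):
    """Build Chrome's --proxy-server switch from an "ip:port" string."""
    host, sep, port = proxy_ip.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'ip:port', got {proxy_ip!r}")
    return f"--proxy-server={scheme}://{host}:{port}"


def make_driver(proxy_ip, page_load_timeout=30):
    """Start Chrome behind proxy_ip (hypothetical helper; needs Selenium 4 and Chrome)."""
    # Imported here so proxy_argument() stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy_ip))
    driver = webdriver.Chrome(options=options)
    # Assumed limit: abandon a page load after page_load_timeout seconds.
    driver.set_page_load_timeout(page_load_timeout)
    return driver


print(proxy_argument("123.123.123.123:8888"))
```

Validating the address before launching the browser means a malformed proxy string fails immediately, rather than showing up later as a mysterious slow page.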
A practical guide to the three ways to wait
Don't underestimate these three. Use the wrong one and your script falls over:
1. Forced wait (time.sleep): simple and crude, but fragile. It suits the proxy check: for example, pause for 3 seconds after an ipipgo proxy takes effect, to be safe.
2. Explicit wait (WebDriverWait): recommended together with proxy IP rotation. If the element has not appeared after 10 seconds, switch IP straight away.
3. Implicit wait (implicitly_wait): beginners tend to fall into traps here, so use it with caution when the network is unstable.
| Wait type | Applicable scenario | Recommended duration |
|---|---|---|
| Forced wait | Initial proxy IP connection | 3-5 seconds |
| Explicit wait | Loading of key elements | Up to 15 seconds |
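The difference between a forced wait and an explicit wait comes down to polling. The sketch below is plain Python, not Selenium code: `wait_until` is a hypothetical helper that mimics the idea behind `WebDriverWait(...).until(...)`, which re-checks a condition (every 0.5 seconds by default) and raises a timeout error if it never holds. The simulated "page" is an assumption made purely for illustration.

```python
import time


def wait_until(condition, timeout, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Simulated page: the "element" shows up on the third check.
checks = {"count": 0}


def element_present():
    checks["count"] += 1
    return "element" if checks["count"] >= 3 else None


start = time.monotonic()
result = wait_until(element_present, timeout=15, poll=0.1)
elapsed = time.monotonic() - start
print(result, checks["count"], round(elapsed, 2))
```

A `time.sleep(15)` would always cost the full 15 seconds. The polling version returns as soon as the condition holds and only spends the whole budget when the page really is stuck.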
Smart waiting: the advanced trick
Have you tried adjusting the wait time automatically when switching proxy IPs? With ipipgo's dynamic residential proxies, for example, you can do it like this:
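
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def smart_wait(driver, element_id):
        try:
            # Initially wait 8 seconds
            WebDriverWait(driver, 8).until(
                EC.presence_of_element_located((By.ID, element_id))
            )
        except TimeoutException:
            # Timed out: switch to a new ipipgo IP. Chrome reads --proxy-server
            # only at startup, so the browser has to be restarted.
            # get_new_ipipgo_proxy() and make_driver() are placeholders you
            # write yourself (fetch an "ip:port", build a driver that uses it).
            url = driver.current_url
            driver.quit()
            driver = make_driver(get_new_ipipgo_proxy())
            driver.get(url)
            # Extend the wait to 15 seconds
            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.ID, element_id))
            )
        return driver
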
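The article does not show how `get_new_ipipgo_proxy()` obtains an address. One hedged possibility, assuming you already hold a list of `ip:port` strings (the addresses below are placeholders, and ipipgo's real API may work quite differently), is a round-robin pool. `ProxyPool` is a hypothetical class name.

```python
import itertools


class ProxyPool:
    """Hand out proxies from a fixed list in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list is empty")
        self._cycle = itertools.cycle(list(proxies))
        self.current = next(self._cycle)

    def rotate(self):
        """Advance to the next proxy and return it."""
        self.current = next(self._cycle)
        return self.current


# Placeholder addresses from a documentation-only range, not real proxies.
pool = ProxyPool(["192.0.2.10:8888", "192.0.2.11:8888", "192.0.2.12:8888"])


def get_new_ipipgo_proxy():
    # Stand-in for the undefined helper used in smart_wait().
    return pool.rotate()


first = pool.current
second = get_new_ipipgo_proxy()
third = get_new_ipipgo_proxy()
wrapped = get_new_ipipgo_proxy()
print(first, second, third, wrapped)
```

After the last address the pool wraps around to the first one, so a long-running crawler never runs out of candidates. It makes no attempt to skip addresses that have already failed.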
Common failure scenarios: Q&A
Q: What should I do if pages load more slowly through the proxy IP?
A: Eight times out of ten the IP quality is to blame, so consider switching to an ipipgo exclusive proxy. A colleague who was scraping an e-commerce site saw pages load three times faster after switching to ipipgo IPs.
Q: How can I tell whether the problem is the site's anti-scraping measures or the proxy IP?
A: Turn the proxy off and run once. If everything works, the IP is the problem. ipipgo's pay-per-traffic IPs let you test first and buy in bulk afterwards, so nothing is wasted.
Q: What can I do if the page gets stuck halfway through loading?
A: Combine techniques: explicit wait plus automatic proxy IP switching. Wrap the wait in a try-except, and on timeout switch to a new ipipgo IP and retry.
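The proxy-versus-site check from the second question can be automated. A rough sketch, assuming a plain HTTP GET is enough to reproduce the problem (sites that block only real browsers need the same comparison done through Selenium). `probe` and `diagnose` are hypothetical names, and the verdict strings are illustrative.

```python
import time
import urllib.request


def probe(url, proxy_ip=None, timeout=10):
    """Return (ok, seconds) for one GET, direct or through proxy_ip ("ip:port")."""
    # An empty mapping disables proxies entirely, including environment settings.
    proxies = {}
    if proxy_ip:
        proxies = {"http": f"http://{proxy_ip}", "https": f"http://{proxy_ip}"}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
        ok = True
    except OSError:  # URLError, HTTPError, and socket timeouts derive from OSError
        ok = False
    return ok, time.monotonic() - start


def diagnose(direct_ok, proxy_ok):
    """Turn the two probe results into a verdict."""
    if direct_ok and not proxy_ok:
        return "proxy IP problem"
    if not direct_ok and proxy_ok:
        return "your own IP is blocked or your network is down"
    if not direct_ok and not proxy_ok:
        return "site problem or anti-scraping block"
    return "both paths work"


print(diagnose(direct_ok=True, proxy_ok=False))
```

In practice you would call `probe(url)` and `probe(url, "ip:port")` against the same page and pass the two `ok` flags to `diagnose`. Comparing the two timings also shows how much latency the proxy adds.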
Double insurance for your code
Finally, one more trick: package the proxy IP check and the wait strategy together:
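
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def safe_get(url):
        max_retry = 3
        for _ in range(max_retry):
            try:
                driver.get(url)
                # Core wait: treat the page as loaded once <main> is present
                WebDriverWait(driver, 15).until(
                    EC.presence_of_element_located((By.TAG_NAME, 'main'))
                )
                return True
            except WebDriverException:  # TimeoutException is a subclass
                # Automatically change ipipgo's IP. rotate_ipipgo_proxy() is
                # not defined here; it must point the global driver at a new IP.
                rotate_ipipgo_proxy()
        raise Exception("Failed to load 3 times in a row, check proxy configuration")
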
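The retry-and-rotate pattern in `safe_get` does not depend on Selenium, so it can be factored out and exercised without a browser. A sketch under that assumption: `retry_with_rotation`, `flaky_load`, and `fake_rotate` are hypothetical names used only for illustration.

```python
def retry_with_rotation(action, rotate, max_retry=3, retry_on=(Exception,)):
    """Run action(); on failure call rotate() and try again, up to max_retry times."""
    last_error = None
    for attempt in range(1, max_retry + 1):
        try:
            return action()
        except retry_on as error:
            last_error = error
            if attempt < max_retry:
                rotate()  # no point switching IP after the final attempt
    raise RuntimeError(
        f"Failed to load {max_retry} times in a row, check proxy configuration"
    ) from last_error


# Simulated run: the first two "loads" time out, the third succeeds.
calls = {"loads": 0, "rotations": 0}


def flaky_load():
    calls["loads"] += 1
    if calls["loads"] < 3:
        raise TimeoutError("page did not finish loading")
    return True


def fake_rotate():
    calls["rotations"] += 1


outcome = retry_with_rotation(flaky_load, fake_rotate)
print(outcome, calls)
```

With Selenium, `action` would call `driver.get(url)` followed by the explicit wait, and `retry_on` could be narrowed to `(WebDriverException,)` so that programming errors are not silently retried. Chaining the last error with `from` keeps the original cause visible in the traceback.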
Remember: a good horse deserves a good saddle, and a stable proxy IP is the foundation of smart waiting. When using ipipgo's proxy service, it is worth enabling their Automated Health Check feature. The system then removes unstable IPs automatically, so your wait strategy can actually do its job. Stop torturing yourself with free proxies; reliable proxy IPs can improve the accuracy of your wait-time settings by at least 60%!

