
When Crawler Meets Anti-Crawler: Hardcore Survival of Proxy IPs
Anyone who does automated testing knows that the biggest fear when using Selenium is getting your IP blocked. It's like being thrown into the penalty box mid-game, watching your program sit stuck on the CAPTCHA page. That's when a proxy IP becomes your resurrection armor, and a service like ipipgo that rotates IPs automatically is practically a programmer's second life.
Hands-on configuration of Selenium's proxy plugin
Don't let the official documentation intimidate you; in practice, configuring the proxy takes just two steps:

    from selenium import webdriver

    proxy = "123.123.123.123:8888"  # proxy address provided by ipipgo
    options = webdriver.ChromeOptions()
    options.add_argument(f'--proxy-server=http://{proxy}')
    # Account authentication (important!): Chrome has no command-line switch for
    # proxy credentials, so authorize your machine's IP in the ipipgo whitelist
    # instead of passing username:password as an argument.
    driver = webdriver.Chrome(options=options)

Note: if you use ipipgo's dynamic proxies, remember to refresh the IP pool every hour; otherwise the target website will recognize it easily.
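The hourly refresh can be automated with a small wrapper. What follows is only a sketch: `RotatingProxy` and its `fetch` callback are hypothetical names, and `fetch` stands in for whatever call returns a fresh `host:port` string from your provider.

```python
import time
from typing import Callable, Optional


class RotatingProxy:
    """Hold one proxy address and refresh it once it is older than max_age seconds."""

    def __init__(self, fetch: Callable[[], str], max_age: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical: returns a fresh "host:port" string
        self._max_age = max_age    # one hour by default, matching the advice above
        self._clock = clock        # injectable so the logic can be checked without waiting
        self._proxy: Optional[str] = None
        self._fetched_at = 0.0

    def current(self) -> str:
        """Return the cached proxy, fetching a new one if it is missing or stale."""
        now = self._clock()
        if self._proxy is None or now - self._fetched_at >= self._max_age:
            self._proxy = self._fetch()
            self._fetched_at = now
        return self._proxy
```

Call `current()` each time you build a new browser session; the wrapper decides on its own whether the cached address is still fresh enough to reuse.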
Precision Striking with CSS Selectors
Picking elements is like playing a sniper game. Here are a few sure-kill techniques:
| Scenario | Selector |
|---|---|
| Grab Login Button | button.login-btn |
| Get Price Data | div.price-box > span:first-child |
| Handling dynamic loading | div.lazy-content:not(.loaded) |
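
To show how selectors like these are typically applied, here is a minimal sketch of a helper that tries several selectors in order and returns the first hit. `first_match` is a hypothetical name; `driver` is assumed to be a Selenium WebDriver (or any object with the same `find_elements(by, value)` method), and the literal `"css selector"` is the string value behind Selenium's `By.CSS_SELECTOR`.

```python
from typing import Iterable, Optional

CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def first_match(driver, selectors: Iterable[str]) -> Optional[object]:
    """Return the first element matched by any selector, or None if none match.

    find_elements returns an empty list rather than raising when nothing
    matches, so a miss simply moves on to the next selector.
    """
    for selector in selectors:
        found = driver.find_elements(CSS, selector)
        if found:
            return found[0]
    return None
```

For example, `first_match(driver, ["button.login-btn", "form button[type=submit]"])` would fall back to the second selector (an illustrative guess, not one from the table) if the first no longer matches.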
Don't rush to change the code when an element can't be located. Try a new IP from ipipgo first; very often the real problem is that the IP has been blocked.
A practical guide to avoiding pitfalls
Recently, while helping a client with e-commerce data collection, I ran into a devil in the details: some sites detect browser fingerprints. When that happens, you need to:
- Change the User-Agent every time you start up.
- Use ipipgo's residential proxies (closer to real users)
- Randomize the time between operations (don't use fixed sleep!)
Here is an anti-detection example:

    import random
    from selenium.webdriver.common.action_chains import ActionChains

    # Simulate a human-like mouse slide by moving a random offset
    actions = ActionChains(driver)
    actions.move_by_offset(
        random.randint(10, 50), random.randint(10, 50)
    ).perform()

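
The User-Agent and random-interval tips from the list above can be sketched in the same spirit. Everything here is illustrative: the helper names are hypothetical, and the User-Agent strings are placeholders that you would replace with real, current browser values.

```python
import random
import time

# Placeholder strings: substitute real, current browser User-Agent values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0",
]


def pick_user_agent_argument(rng=random) -> str:
    """Build a Chrome --user-agent argument from a randomly chosen entry."""
    return f"--user-agent={rng.choice(USER_AGENTS)}"


def human_pause(low: float = 0.8, high: float = 2.5,
                sleep=time.sleep, rng=random) -> float:
    """Sleep for a random duration between low and high seconds and return it."""
    delay = rng.uniform(low, high)
    sleep(delay)
    return delay
```

At startup, pass the result to `options.add_argument(pick_user_agent_argument())`, and call `human_pause()` between page actions in place of a fixed `time.sleep(2)`.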
Frequently Asked Questions First Aid Kit
Q: What should I do if I can't connect to the proxy IP?
A: Check the whitelist settings first; ipipgo's console has real-time connection logs. If timeouts are frequent, consider switching to their dedicated high-speed lines.
Q: CSS selectors suddenly fail?
A: Most likely the page has been redesigned; use the developer tools to check the element structure. If the element exists but still can't be grabbed, the IP may be blocked, so add ipipgo's automatic IP rotation middleware to your code!
Q: How can I avoid being recognized as a robot?
A: Three golden rules: ① use ipipgo's dynamic residential IPs, ② randomize the interval between operations, ③ clear the browser cache regularly.
Add resurrection armor to the code
Finally, here is a life-saving code template that integrates with ipipgo's automatic IP rotation feature:
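
    from selenium import webdriver
    # ipipgo official SDK, as named by the author; mark_bad_proxy is assumed
    # to live in the same module.
    from ipipgo_api import get_new_proxy, mark_bad_proxy

    def init_browser(proxy):
        # Same proxy configuration as in the earlier section;
        # assumes proxy is a "host:port" string.
        options = webdriver.ChromeOptions()
        options.add_argument(f'--proxy-server=http://{proxy}')
        return webdriver.Chrome(options=options)

    def safe_visit(url):
        for _ in range(3):  # retry up to 3 times
            proxy = driver = None
            try:
                proxy = get_new_proxy(type='https')
                driver = init_browser(proxy)
                driver.get(url)
                # Normal operation flow...
                break
            except Exception:
                if proxy is not None:
                    mark_bad_proxy(proxy)  # report the problem IP back to ipipgo
            finally:
                if driver is not None:
                    driver.quit()  # always release the browser
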
In my tests, this setup raised the collection success rate from 53% to 98%. The key is ipipgo's QCI, which automatically filters out failed nodes.
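
To make the idea of filtering failed nodes concrete, here is a local sketch of the same principle. It is not how ipipgo implements it; `ProxyPool` and its methods are hypothetical names, and the failure threshold is an arbitrary choice.

```python
from collections import Counter
from typing import Iterable, List


class ProxyPool:
    """Track failures per proxy and stop handing out proxies that fail too often."""

    def __init__(self, proxies: Iterable[str], max_failures: int = 2) -> None:
        self._proxies: List[str] = list(proxies)
        self._failures: Counter = Counter()
        self._max_failures = max_failures

    def healthy(self) -> List[str]:
        """Return the proxies whose failure count is still below the limit."""
        return [p for p in self._proxies if self._failures[p] < self._max_failures]

    def report_failure(self, proxy: str) -> None:
        """Record one failed request for this proxy."""
        self._failures[proxy] += 1

    def report_success(self, proxy: str) -> None:
        """Forget earlier failures once a proxy works again."""
        self._failures.pop(proxy, None)
```

A retry loop would pick from `healthy()`, then call `report_failure` or `report_success` depending on how the visit went, so repeatedly failing addresses drop out of rotation on their own.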

