
Basic patterns for proxy API calls
Anyone who has done data collection for a while knows that proxy IP APIs, simple as they look, hide plenty of pitfalls. Let's walk through the most basic call pattern, using ipipgo's service as the example; their API documentation is noticeably cleaner than most.
For example, the simplest request in Python:
import requests

# Be careful to replace YOUR_KEY with your own account key
api_url = "https://api.ipipgo.com/get?key=YOUR_KEY&count=5"
resp = requests.get(api_url)
print(resp.json())  # returns the 5 available proxy IPs
But here's a big pitfall! Many newbies take the returned list and loop over the same IPs, only to find that they have long since expired. The correct approach is to fetch a fresh IP for each request, like this:
def get_fresh_proxy():
    return requests.get(api_url).json()['data'][0]
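The result returned above is one raw API entry, not yet in the shape requests expects. Here is a minimal sketch of converting it into a proxies dict; the response fields ("ip", "port") are assumptions, so check ipipgo's actual response schema before relying on them.

```python
# Turn one API result into the mapping that requests' `proxies`
# argument expects. Field names "ip" and "port" are assumed.
def build_proxies(entry):
    addr = f"http://{entry['ip']}:{entry['port']}"
    return {"http": addr, "https": addr}

# Example: build_proxies({"ip": "1.2.3.4", "port": 8080})
```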
The fine print in request parameters
Every proxy provider names its parameters differently; ipipgo's parameter design is fairly thoughtful. A few essential ones:
Validity period (timeout): 10-15 seconds is recommended; too short and you may fail to get a usable IP, too long and you may receive one that has already expired
Protocol type (protocol): choose http/https/socks5 to match the target website.
Geographic filtering (city_code): use when you need an IP from a specific city, e.g. for crawling region-specific websites.
An example with filter conditions:
# Want an https proxy in Shanghai
filter_url = "https://api.ipipgo.com/get?key=YOUR_KEY&protocol=https&city_code=310000"
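Gluing query strings together by hand gets error-prone once you have several filters. A small sketch of building the same URL with the standard library's urlencode, so keys and values stay properly escaped (the parameter names follow the examples above):

```python
from urllib.parse import urlencode

# Assemble the API URL from a key plus arbitrary filter parameters.
def build_api_url(key, **filters):
    params = {"key": key, **filters}
    return "https://api.ipipgo.com/get?" + urlencode(params)

# Example: build_api_url("YOUR_KEY", protocol="https", city_code="310000")
```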
Exception handling survival guide
I've seen too many crawlers crash because of proxy problems. Here are a few life-saving tips:
1. Double timeout settings: set both the API request timeout and the business request timeout
2. IP warm-up mechanism: after getting an IP, visit a test page first to verify it works.
3. Dynamic switching strategy: don't wait for an IP to expire before replacing it; proactively rotate every 5 requests
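Tips 2 and 3 above can be combined into one small holder object. This is a sketch, not ipipgo's API: the fetch_proxy callable and the warm-up test URL (httpbin here) are placeholders you'd replace with your own API call and test page.

```python
import requests

# Rotating proxy holder: fetches a fresh proxies dict every `max_uses`
# requests, with a warm-up check against a test page.
class RotatingProxy:
    def __init__(self, fetch_proxy, max_uses=5, test_url="https://httpbin.org/ip"):
        self.fetch_proxy = fetch_proxy   # callable returning a proxies dict
        self.max_uses = max_uses
        self.test_url = test_url
        self.uses = 0
        self.current = None

    def warm_up(self, proxies):
        # Tip 2: verify the IP actually works before real traffic
        try:
            r = requests.get(self.test_url, proxies=proxies, timeout=(3, 10))
            return r.status_code == 200
        except requests.RequestException:
            return False

    def get(self):
        # Tip 3: rotate proactively, before the IP has a chance to expire
        if self.current is None or self.uses >= self.max_uses:
            self.current = self.fetch_proxy()
            self.uses = 0
        self.uses += 1
        return self.current
```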
The exception handling code in practice looks like this:
try:
    proxy = get_fresh_proxy()  # should be a requests-style proxies dict, e.g. {"http": "http://ip:port"}
    resp = requests.get(target_url, proxies=proxy, timeout=(3, 10))
except requests.exceptions.ProxyError:
    mark_bad_proxy(proxy)  # mark the IP as invalid
    retry_count -= 1
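The try/except pattern above can be wrapped in a bounded retry loop. A sketch under the same assumptions: get_fresh_proxy returns a requests-style proxies dict, and mark_bad_proxy is your own bookkeeping hook; the do_get parameter is an addition of mine so the transport can be swapped out for testing.

```python
import requests

# Retry the request with a fresh proxy on each failure, up to `retries` times.
def fetch_with_retries(target_url, get_fresh_proxy, mark_bad_proxy,
                       retries=3, do_get=requests.get):
    last_error = None
    for _ in range(retries):
        proxy = get_fresh_proxy()
        try:
            return do_get(target_url, proxies=proxy, timeout=(3, 10))
        except (requests.exceptions.ProxyError,
                requests.exceptions.Timeout) as exc:
            mark_bad_proxy(proxy)   # record the dead IP, then try the next one
            last_error = exc
    raise last_error
```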
Real business scenarios
Here's a real case from our team: last year we built price monitoring for an e-commerce site whose anti-crawling strategy changed twice within three days. We eventually countered those dirty tricks with ipipgo's dynamic residential proxies plus this setup:
- Randomly switch UserAgent per request
- Important pages are accessed by mobile IP
- Switching to overseas IPs in the early morning hours
- Automatically switch city nodes when encountering CAPTCHA
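The first trick in the list above, rotating the User-Agent on every request, looks roughly like this. The UA strings are illustrative samples, not a vetted list:

```python
import random

# A few sample User-Agent strings; in practice you'd maintain a larger,
# up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15",
]

# Build fresh headers for each request.
def random_headers():
    return {"User-Agent": random.choice(USER_AGENTS)}
```

Pass the result as the headers argument to each requests call, alongside the proxies dict.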
As a result, the request survival rate shot up from 37% to 89%, and the project manager was stunned.
Frequently Asked Questions QA
Q: How are concurrent requests handled?
A: It is recommended to fetch a pool of IPs in bulk ahead of time and pick from it at random per request. ipipgo's enterprise version supports fetching 500+ IPs in a single batch.
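The bulk-pool idea can be sketched as a small class: fetch a batch up front, pick at random, and drop IPs that fail. The fetch_batch callable stands in for the bulk API call and is an assumption of this sketch:

```python
import random

# Pre-fetched proxy pool with random selection and refill-on-empty.
class ProxyPool:
    def __init__(self, fetch_batch):
        self.fetch_batch = fetch_batch   # callable returning a list of proxies
        self.pool = []

    def pick(self):
        if not self.pool:
            self.pool = list(self.fetch_batch())  # refill when exhausted
        return random.choice(self.pool)

    def discard(self, proxy):
        # Drop a failed IP so it is not picked again
        if proxy in self.pool:
            self.pool.remove(proxy)
```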
Q: What should I do if the returned IP expires immediately?
A: Contact customer service to enable a long-duration proxy package, or check whether your request frequency is too high
Q: What if I need a fixed IP?
A: Their static proxy service can bind IPs for up to 24 hours, which is suitable for scenarios that require logging in.
Q: How do I troubleshoot a 403 error?
A: First, visit the website directly without proxy to make sure it is not a problem with the target website. Then use the IP inspection tool provided by ipipgo to verify the proxy status.
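The 403 triage above (direct request first, then proxied) can be sketched as a comparison of status codes. The do_get parameter is my addition so the transport is swappable; it is not part of any ipipgo tool:

```python
import requests

# Compare a direct request with a proxied one to localize a 403.
def diagnose_403(url, proxies, do_get=requests.get):
    direct = do_get(url, timeout=(3, 10)).status_code
    proxied = do_get(url, proxies=proxies, timeout=(3, 10)).status_code
    if direct == 403:
        return "target site blocks you even without a proxy"
    if proxied == 403:
        return "proxy IP is likely flagged, rotate it"
    return "both requests succeeded"
```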
Finally, to be honest, choosing the right proxy provider is half the battle. ipipgo's support is very responsive, answering work orders within 10 minutes, which beats the providers that go silent for half a day. Their intelligent routing feature, which automatically matches the optimal node, is especially good.

