How do you actually use get() on JSON data in Python?
Anyone who writes crawlers has run into this: you fetch a JSON-formatted IP configuration from a proxy provider, and then you cannot pull the key information out of it no matter what you try. This is where get() saves the day. (Strictly speaking, the json module has no get() function: json.loads() returns a dict, and get() is a dict method.) Let's take a response from ipipgo's proxy interface as an example:
import json
response = '{"proxy_list": [{"ip": "1.1.1.1", "port":8000},{"ip": "2.2.2.2", "port":8080}], "status":200}'
data = json.loads(response)
# Indexing directly may blow up
first_ip = data['proxy_list'][0]['ip']  # raises an error if the key is missing or the list is empty
# The safe way to do it
proxy_list = data.get('proxy_list') or [{}]  # "or [{}]" also covers an empty list or a null value
first_ip = proxy_list[0].get('ip', 'default IP')
See? Using get() is like putting a bulletproof vest on your code: it won't crash on the spot even when a field is missing. This matters especially for proxy IP data, which comes from a third party whose interface may change structure at any time.
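The chained get() pattern can also be wrapped in a small helper. What follows is only a minimal sketch: the name `dig` and its signature are my own invention, not part of the json module or of any ipipgo SDK. It walks through dicts and lists step by step and returns a default as soon as a key is missing, an index is out of range, or a value has the wrong type.

```python
import json

def dig(obj, *path, default=None):
    """Follow keys (dicts) and indexes (lists); return default on any miss.

    Hypothetical helper for illustration, not a standard-library function.
    """
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list too short;
            # TypeError: the current value is None or not subscriptable this way.
            return default
    return obj

response = '{"proxy_list": [{"ip": "1.1.1.1", "port": 8000}], "status": 200}'
data = json.loads(response)

print(dig(data, "proxy_list", 0, "ip", default="default IP"))  # path exists
print(dig(data, "proxy_list", 5, "ip", default="default IP"))  # index out of range
print(dig({"proxy_list": []}, "proxy_list", 0, "ip", default="default IP"))  # empty list
```

The trade-off is that a single try/except hides which step failed, so this suits "give me a value or a fallback" lookups better than debugging a changed response format.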
Proxy IP Configuration
Let's take ipipgo's proxy configuration as an example. The JSON returned by their interface looks like this:
{
  "proxy": {
    "http": "socks5://user:pass@1.1.1.1:8888",
    "https": "socks5://user:pass@1.1.1.1:8888"
  },
  "expire_time": "2024-03-20 12:00:00"
}
This is where get() really shines:
http_proxy = data.get('proxy', {}).get('http', 'no proxy')  # nested field: chain two get() calls
expire = data.get('expire_time', 'unknown time')  # expire_time sits at the top level, not inside proxy
Double protection! Even if the entire proxy field is missing, no KeyError is raised. That is a lifesaver for crawlers that need to run 24/7.
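Once expire_time has been read safely, a long-running crawler will probably want to act on it. Here is a rough sketch of an expiry check. The function name `is_expired` is made up for this example, and it assumes the timestamp uses the "YYYY-MM-DD HH:MM:SS" format shown in the sample response and carries no time zone information.

```python
import json
from datetime import datetime

def is_expired(config, now):
    """Return True if expire_time is missing, malformed, or not later than now."""
    raw = config.get("expire_time")
    try:
        expires_at = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        # TypeError: the field is absent or null; ValueError: unexpected format.
        # Treating unreadable timestamps as expired is a conservative assumption.
        return True
    return now >= expires_at

config = json.loads(
    '{"proxy": {"http": "socks5://user:pass@1.1.1.1:8888"},'
    ' "expire_time": "2024-03-20 12:00:00"}'
)

print(is_expired(config, datetime(2024, 3, 20, 11, 0, 0)))  # one hour before expiry
print(is_expired(config, datetime(2024, 3, 21, 0, 0, 0)))   # after expiry
print(is_expired({}, datetime(2024, 3, 20, 11, 0, 0)))      # field missing
```

Passing `now` in as an argument, instead of calling datetime.now() inside the function, keeps the check easy to test.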
Handy tricks for real-world use
1. Convert types for peace of mind: the port number returned by ipipgo may sometimes be a string, so remember to convert it
port = int(data.get('port', '0'))  # the default '0' keeps int() from receiving None when the key is missing
2. Don't get lost in nested dictionaries: when the proxy configuration is nested several layers deep, chain the get() calls
auth = data.get('auth', {}).get('username', 'anonymous')
3. Get creative with default values: fall back to an alternate proxy automatically when no current IP is set
current_ip = data.get('current_ip') or ipipgo.get_backup_ip()
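The type-conversion trick in item 1 still fails if the port is present but null or not numeric, because int() then raises an error. A slightly more defensive version might look like the sketch below. `parse_port` is a hypothetical name, and treating anything outside 1 to 65535 as unusable is my own assumption.

```python
def parse_port(data, default=0):
    """Read data['port'] as an int; fall back to default if missing, null, or invalid."""
    raw = data.get("port")
    try:
        port = int(raw)
    except (TypeError, ValueError):
        # TypeError: raw is None; ValueError: raw is a non-numeric string.
        return default
    # Valid TCP ports are 1..65535; anything else is treated as unusable here.
    return port if 1 <= port <= 65535 else default

print(parse_port({"port": "8080"}))    # string port
print(parse_port({"port": 8000}))      # integer port
print(parse_port({"port": None}))      # null value
print(parse_port({}))                  # key missing
print(parse_port({"port": "eighty"}))  # not a number
```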
Q&A time (a must-read for newbies)
Q: Why not just index by key directly?
A: It's like picking up a package from a parcel locker. Entering the pickup code directly (square brackets) fails outright if the locker is empty. Using get() is like entering the code and, if no package is there, automatically being handed a substitute (the default value).
Q: What should I do if ipipgo's proxy IP suddenly fails to connect?
A: Pair get() with exception handling. For a field you cannot do without, index directly and catch the error:
try:
    ip = data['proxy']['http']
except KeyError:
    ip = ipipgo.get_new_ip()  # automatically get a new IP
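The same fallback idea can be written so that it also survives a null "proxy" field, which raises TypeError instead of KeyError. This is a sketch under assumptions: `pick_proxy` and `fake_fetch_new_ip` are invented names, and the fetch function is passed in as a parameter so the example does not depend on any particular ipipgo client call.

```python
def pick_proxy(data, fetch_new_ip):
    """Return data['proxy']['http'], or call fetch_new_ip() if the field is unusable."""
    try:
        return data["proxy"]["http"]
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: "proxy" is null or not a dict.
        return fetch_new_ip()

def fake_fetch_new_ip():
    # Hypothetical stand-in for a real "give me a fresh proxy" call.
    return "socks5://user:pass@2.2.2.2:8888"

good = {"proxy": {"http": "socks5://user:pass@1.1.1.1:8888"}}
print(pick_proxy(good, fake_fetch_new_ip))             # field present
print(pick_proxy({"proxy": None}, fake_fetch_new_ip))  # null proxy field
print(pick_proxy({}, fake_fetch_new_ip))               # proxy field missing
```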
Q: Do I need real-name verification to use the proxy IPs?
A: ipipgo strictly complies with cybersecurity laws, and all of its proxy services require enterprise real-name authentication to be completed first, so you can use them without stepping on any landmines.
Method comparison table
| Method | Advantages | Drawbacks |
|---|---|---|
| data['key'] | Direct and fast | Crashes when the key does not exist |
| data.get('key') | Safe and stable | Need to handle default value logic |
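The difference between the two rows, and what "default value logic" means in practice, is easy to see in a few lines. The sample dict below is made up for the demonstration. Note one subtlety: get() only falls back to its default when the key is absent, not when the key exists with a null value.

```python
data = {"ip": "1.1.1.1", "port": None}

# Bracket access raises KeyError for a key that does not exist.
try:
    data["region"]
    outcome = "no error"
except KeyError:
    outcome = "KeyError"
print(outcome)                   # KeyError

# get() without a default returns None for a missing key.
print(data.get("region"))        # None

# The key "port" exists (its value is None), so the default is NOT used.
print(data.get("port", 8080))    # None

# "or" covers both a missing key and a null (or otherwise falsy) value.
print(data.get("port") or 8080)  # 8080
```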
One last tip: when using ipipgo's proxy service, remember that the fields returned by their interface are all lowercase. Don't write 'proxy' as 'Proxy', because Python is case-sensitive! If you run into problems, their technical support is quick to respond; in my own test, a ticket submitted at 10 pm got a reply within seconds.
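If you would rather not depend on exact key casing at all, one option is to lowercase every key right after parsing. This is only a sketch: `lowercase_keys` is a hypothetical helper, and the mixed-case sample data is invented to show the effect, not something ipipgo is known to return.

```python
def lowercase_keys(obj):
    """Recursively return a copy of obj with all dict keys lowercased."""
    if isinstance(obj, dict):
        return {str(key).lower(): lowercase_keys(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [lowercase_keys(item) for item in obj]
    return obj  # strings, numbers, booleans, and None pass through unchanged

raw = {
    "Proxy": {"HTTP": "socks5://user:pass@1.1.1.1:8888"},
    "Expire_Time": "2024-03-20 12:00:00",
}
data = lowercase_keys(raw)

print(data.get("proxy", {}).get("http", "no proxy"))
print(data.get("expire_time", "unknown time"))
```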