
Getting CAPTCHA cracked? Figure out this windowpane first
Those free CAPTCHA recognition tools on the Internet, to put it bluntly, is image processing + machine learning. It is like teaching a three-year-old child to recognize numbers, you have to show him 100 pictures with numbers. Open source projects such as Tesseract this stuff, dealing with simple digital CAPTCHA okay, encountered distorted deformation of the blind.
Proxy IPs are half the battle in this case.
Hardcore CAPTCHA system with your own IP? Wait for it to be blocked into a sieve!Dynamic Residential Proxy for ipipgoIt allows you to change your "face" every time you request something, like playing Sichuan Opera, so the server can't figure out where you're really coming from. Here's the real-world data:
| Agent Type | recognition success rate | probability of banning |
|---|---|---|
| No Agent | 38% | 72% |
| General Agent | 55% | 41% |
| ipipgo dynamic homes | 82% | 9% |
Hands on to build a CAPTCHA killer
Here's a chestnut in Python, remember to install these libraries first:
pip install requests opencv-python pytesseract
The core code is written this way (remember to change to your own ipipgo proxy account):
import requests
from PIL import Image
proxies = {
'http': 'http://用户名:密码@gateway.ipipgo.com:9021',
'https': 'https://用户名:密码@gateway.ipipgo.com:9021'
}
resp = requests.get('captcha address', proxies=proxies)
with open('captcha.jpg', 'wb') as f.
f.write(resp.content)
Doing grayscale with OpenCV
img = cv2.imread('captcha.jpg', 0)
Pick up the Tesseract recognition code here...
Guide to avoiding pitfalls: five common mistakes made by novices
1. die to an IP:Use ipipgo's auto-switching feature, don't wait to get blocked before switching
2. Skip image preprocessing:No noise reduction, no binarization, direct recognition is blind.
3. Use of free proxy pools:Those public agents were blacked out by the CAPTCHA system a long time ago.
4. Ignoring time-out settings:Suggested to work with ipipgo's 5-second quick-switch feature
5. Rigid Complex CAPTCHA:If you come across Google reCAPTCHA, you should go around it.
Practical case: an e-commerce site automatic login
Recently helped a friend to get the case, using ipipgo's UK residential IP + self-training model, the recognition rate from 23% to 68%. the key point is here:
AutoSwitch after every failure
from ipipgo import AutoSwitchProxy
proxy = AutoSwitchProxy(region='uk')
headers = proxy.add_headers()
When captcha recognition fails
if 'captcha_error' in response.text: proxy.rotate_ip()
proxy.rotate_ip() seconds for a new IP
Five questions you definitely want to ask
Q: Do I have to use a paid proxy?
A: Nine out of ten free agents fail, ipipgo new users have 2G free traffic, enough to test the waters with the
Q: How to choose the agent area?
A: Look at the target web server location, domestic station with the province IP, overseas station is recommended to choose the United States / Germany residential
Q: What makes ipipgo better than the rest?
A: Their IP pool will be "self-healing", automatically remove the blocked IP, to maintain the availability of 95% or more!
Q: What should I do if I encounter a sliding captcha?
A: Simple slider can be simulated with selenium, the complexity of the proposal on the coding platform, do not engage in their own hard
Q: Why does my recognition rate go up and down?
A: Check the IP quality, use ipipgo's API to check the current IP'sShelf liferespond in singingcredit rating
Lastly: CAPTCHA recognition is not a serious way, it is recommended to use in their own system testing. If you really want to commercialize or go through the regular interface, don't get yourself into trouble. ipipgo has technicians who can consult with the compliance program, don't just mess around.

