
Hands-on: disguising wget as a real user
Anyone who does web data collection knows that many websites identify crawlers by their User-Agent header. This article explains, in plain language, how to put a "disguise" on the wget command-line tool and pair it with the ipipgo proxy IP service to reduce the chance of being detected.
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" https://targets.com
The command above spoofs the user agent so that wget presents itself as Chrome. This alone is not enough: if you keep accessing a site from the same IP for a long time, that IP will still get blocked. This is where the ipipgo dynamic proxy IP comes in.
Proxy IPs in practice: the combo
ipipgo's dynamic residential proxies are recommended. These IPs look the same as those of real users browsing the Internet, which makes them very hard to spot. Configure wget like this:
wget -e use_proxy=yes -e http_proxy=http://123.123.123.123:8888 -e https_proxy=http://123.123.123.123:8888 --user-agent="Spoofed UA" "TARGET_URL"
Just replace the IP address and port with the proxy provided by ipipgo. You can also set an automatic IP rotation interval in the ipipgo dashboard. Rotating every 5-10 minutes is recommended so that the website cannot pick up a pattern.
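As a rough sketch of this combo, the wrapper below bundles the spoofed UA and the proxy into one function. The names `fetch_via_proxy`, `PROXY`, `UA`, and `DRY_RUN` are made up for this example, and the proxy address is the article's placeholder rather than a working endpoint. wget reads `https_proxy` for HTTPS URLs, so the sketch sets both proxy variables.

```shell
#!/bin/sh
# Placeholder values: swap in the proxy address from your ipipgo account.
PROXY="http://123.123.123.123:8888"
UA="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"

# fetch_via_proxy URL [extra wget options...]
# With DRY_RUN=1 the command is printed instead of executed.
fetch_via_proxy() {
  url=$1
  shift
  # Rebuild the argument list as the full wget command line.
  set -- wget -e use_proxy=yes \
    -e "http_proxy=$PROXY" -e "https_proxy=$PROXY" \
    --user-agent="$UA" "$@" "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 fetch_via_proxy https://targets.com -q` prints the assembled command, which is a cheap way to check the options before sending any traffic.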
Anti-detection configuration cheat sheet
| Configuration item | Recommended value |
|---|---|
| User-Agent | UA of the latest Chrome version |
| Request interval | Random, 30-60 seconds |
| IP rotation frequency | Every 5 minutes |
| Proxy type | Residential proxy |
Remember to turn on IP rotation mode in the ipipgo dashboard. This feature automatically switches between IPs in different regions, much like the "shape-shifting" technique in martial arts novels, which makes it hard for the site to defend against.
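The random 30-60 second request interval from the table can be sketched in a few lines of shell. `random_delay` and `crawl_list` are hypothetical helper names, and the URL list file is assumed to hold one URL per line.

```shell
#!/bin/sh
# random_delay MIN MAX: print a whole number of seconds in [MIN, MAX].
# awk's srand() seeds from the clock (one-second resolution), which is fine
# here because consecutive calls are separated by the sleep itself.
random_delay() {
  awk -v min="$1" -v max="$2" \
    'BEGIN { srand(); print int(min + rand() * (max - min + 1)) }'
}

# crawl_list FILE: fetch each URL listed in FILE (one per line),
# pausing 30-60 seconds after every request.
crawl_list() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    wget -q "$url"
    sleep "$(random_delay 30 60)"
  done < "$1"
}
```

When a single wget invocation downloads several URLs (for example with `-i urls.txt`), wget's own `--wait` and `--random-wait` options achieve a similar effect without a loop.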
Common pitfalls: Q&A
Q: What should I do if my proxy IP suddenly fails?
A: ipipgo's IP pool is refreshed with 200,000+ IPs every day and switches automatically when an IP fails. It is also recommended to add the --retry-connrefused option so that wget retries automatically when a connection is refused.
Q: How do I verify that the proxy is in effect?
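A minimal sketch of layered retries: wget's own `--retry-connrefused`, `--tries`, `--waitretry`, and `--timeout` options handle retries within one run, and a small outer loop starts a fresh wget process if the whole run fails. The `retry` helper and the `RETRY_SLEEP` variable are made up for this example, and whether a fresh process gets a different exit IP depends on how your proxy plan rotates.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]: run CMD until it succeeds or MAX attempts are used.
# RETRY_SLEEP (seconds between attempts, default 5) is a made-up knob.
retry() {
  retry_max=$1
  shift
  retry_n=1
  while :; do
    "$@" && return 0
    [ "$retry_n" -ge "$retry_max" ] && return 1
    retry_n=$((retry_n + 1))
    sleep "${RETRY_SLEEP:-5}"
  done
}

# Usage (performs a real request, so it is left commented out):
# retry 3 wget --retry-connrefused --tries=5 --waitretry=10 --timeout=15 "TARGET_URL"
```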
A: Run this command with your proxy settings applied: wget -q -O - checkip.ipipgo.com. It prints the exit IP currently in use.
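One way to turn that check into a script is to compare the response with and without the proxy. This is only a sketch: `proxy_in_effect` is a hypothetical helper, the proxy address is a placeholder, and the exact format of the checkip.ipipgo.com response is assumed rather than documented here, so the helper compares the raw response bodies instead of parsing them.

```shell
#!/bin/sh
# proxy_in_effect DIRECT_IP PROXIED_IP
# Succeeds only when both values are non-empty and differ from each other.
proxy_in_effect() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Usage (performs real requests, so it is left commented out):
# PROXY=http://123.123.123.123:8888
# direct=$(wget -q -O - --no-proxy checkip.ipipgo.com)
# proxied=$(wget -q -O - -e use_proxy=yes -e "http_proxy=$PROXY" checkip.ipipgo.com)
# proxy_in_effect "$direct" "$proxied" && echo "proxy is in effect"
```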
Q: What if the site is still blocking requests?
A: Check three things: 1. whether the UA looks fake; 2. whether the request frequency is too high; 3. whether the proxy IP has been flagged. It is also recommended to turn on the IP health check feature in the ipipgo console.
Advanced configuration tips
Add these settings to the configuration file ~/.wgetrc to set them once and for all:
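user_agent = Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0
use_proxy = on
http_proxy = ipipgo proxy address:port
# HTTPS URLs use a separate proxy setting
https_proxy = ipipgo proxy address:port
retry_connrefused = on
# random_wait only varies an existing wait value (0.5x to 1.5x); 45 is an example
wait = 45
random_wait = on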
Lastly, a reminder: never pick a proxy service on price alone. ipipgo's high-anonymity proxies strip the X-Forwarded-For information from the request headers entirely, which is what real "stealth" means. If you run into a website that requires login, remember to send cookies along with your requests, which can raise the success rate by more than 70%.
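For the login case, a minimal sketch using wget's cookie options (`--save-cookies`, `--keep-session-cookies`, `--load-cookies`, `--post-data`) might look like this. The helper names, the cookie file name, and the URLs and form fields in the usage line are all placeholders; a real site's login form will differ.

```shell
#!/bin/sh
# The cookie file name is a placeholder.
COOKIE_JAR="cookies.txt"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# login LOGIN_URL POST_DATA: submit the login form and save the session cookies.
login() {
  run wget -q -O /dev/null \
    --save-cookies "$COOKIE_JAR" --keep-session-cookies \
    --post-data "$2" "$1"
}

# fetch_logged_in URL: request a page with the saved cookies attached.
fetch_logged_in() {
  run wget --load-cookies "$COOKIE_JAR" "$1"
}
```

A typical sequence would be `login https://targets.com/login 'user=me&password=secret'` followed by `fetch_logged_in https://targets.com/account`. Proxy and UA settings from ~/.wgetrc apply to both calls.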

