
Can't get IP rotation right? Hands-on with AWS
Anyone who does data collection knows that getting IPs blocked is an everyday occurrence. Today we'll show you how to run low-cost IP rotation with AWS Elastic IPs. Monthly costs can be kept under US$20, at most a third of what an off-the-shelf proxy service costs.
Why AWS?
There's a hidden perk with Amazon: new accounts get 750 free EC2 hours per month for the first 12 months under the Free Tier (check the current terms, which AWS has changed over time). That is enough to keep a t2.micro instance (plenty of power for this job) running around the clock. Pair it with dynamic Elastic IP switching: in our tests, changing the IP 50 times a day caused no trouble at all. The money-saving loophole is that an Elastic IP has traditionally been free while attached to a running instance and billed only once it is detached. Be aware, though, that since February 2024 AWS bills $0.005/hour for every public IPv4 address, attached or not.
| Item | Cost |
|---|---|
| EC2 instance (t2.micro, running) | $0.0116/hour |
| Elastic IP (idle) | $0.005/hour |
| Data transfer out | First 1 GB free |
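To sanity-check the "under US$20 a month" figure, here is a small shell sketch of the arithmetic. It assumes a 730-hour month and the rates in the table, and it ignores data transfer and any Free Tier hours; `monthly_cost` is a hypothetical helper name.

```bash
#!/bin/bash
# monthly_cost HOURS INSTANCE_RATE BILLED_EIPS EIP_RATE
# Prints the estimated monthly bill in USD, rounded to cents.
# Assumption: instance and every billed Elastic IP run for the full HOURS.
monthly_cost() {
  awk -v h="$1" -v inst="$2" -v n="$3" -v eip="$4" \
    'BEGIN { printf "%.2f\n", h * inst + h * n * eip }'
}
```

With one address attached and two idle, `monthly_cost 730 0.0116 2 0.005` prints 15.77. Counting all three addresses as billed, which matches AWS's per-address IPv4 charge introduced in February 2024, `monthly_cost 730 0.0116 3 0.005` prints 19.42.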
Four steps to build a rotating service
Step 1: Open an EC2 instance
Choose an Ubuntu AMI and the t2.micro instance type. In the security group, open port 3128 (for the Squid proxy installed later), ideally only to your own IP address, and close port 22 to the internet. Connecting through AWS's built-in options instead (for example, Systems Manager Session Manager, which needs no open inbound port) is more secure.
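If you prefer the CLI to the console, the security-group part of this step might look like the sketch below. The function name, the `DRY_RUN` switch, the security group ID, and the client IP are illustrative assumptions; the two `aws ec2` subcommands are standard AWS CLI calls. Restricting port 3128 to a single `/32` address keeps the server from becoming an open proxy.

```bash
#!/bin/bash
# run: print the command instead of executing it when DRY_RUN=1 (hypothetical convention)
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# harden_security_group SG_ID CLIENT_IP (both are placeholders you supply)
harden_security_group() {
  local sg="$1" client_ip="$2"
  # Allow the Squid port only from your own address, not from 0.0.0.0/0
  run aws ec2 authorize-security-group-ingress --group-id "$sg" \
    --protocol tcp --port 3128 --cidr "${client_ip}/32"
  # Remove a world-open SSH rule; this may report an error if no such rule exists
  run aws ec2 revoke-security-group-ingress --group-id "$sg" \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
}
```

Run `DRY_RUN=1 harden_security_group sg-0123456789abcdef0 203.0.113.10` first to print the commands, then drop `DRY_RUN=1` to apply them. An error from the revoke call only means there was no open SSH rule to remove.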
Step 2: Mount the Elastic IP
Allocate three Elastic IPs in the console (don't be greedy: the default quota is five per Region, and AWS watches for abuse). Use this script to switch automatically:
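```bash
#!/bin/bash
# Assumes INSTANCE_ID and NEW_ALLOC_ID (the next EIP's allocation ID) are already set
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
OLD_IP=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/public-ipv4)
OLD_ALLOC_ID=$(aws ec2 describe-addresses --public-ips "$OLD_IP" --query 'Addresses[0].AllocationId' --output text)
aws ec2 associate-address --instance-id "$INSTANCE_ID" --allocation-id "$NEW_ALLOC_ID"  # detaches the old EIP
# release-address returns the old EIP to AWS; drop this line to keep cycling through your own pool
aws ec2 release-address --allocation-id "$OLD_ALLOC_ID"
```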
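The switching script still needs to know which Elastic IP comes next. One way to pick it, assuming you keep cycling through the addresses you allocated rather than releasing them, is a simple round-robin over the account's allocation IDs. `next_alloc_id` and `list_pool` are hypothetical helpers; only `aws ec2 describe-addresses` is a real CLI call.

```bash
#!/bin/bash
# next_alloc_id CURRENT_ID ID1 ID2 ...: print the ID after CURRENT_ID, wrapping around
next_alloc_id() {
  local current="$1"; shift
  local ids=("$@") i
  for ((i = 0; i < ${#ids[@]}; i++)); do
    if [ "${ids[i]}" = "$current" ]; then
      echo "${ids[(i + 1) % ${#ids[@]}]}"
      return 0
    fi
  done
  # Current ID not in the pool (e.g., an auto-assigned IP): start from the first entry
  echo "${ids[0]}"
}

# list_pool: allocation IDs of every EIP in the current Region (needs AWS credentials)
list_pool() {
  aws ec2 describe-addresses --query 'Addresses[].AllocationId' --output text
}
```

For example, `NEXT_ALLOC_ID=$(next_alloc_id "$CURRENT_ALLOC_ID" $(list_pool))`, where `CURRENT_ALLOC_ID` is the allocation ID currently attached to the instance. Leaving `$(list_pool)` unquoted is deliberate, so the tab-separated IDs split into separate arguments.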
Step 3: Install Squid Proxy
Three commands get it done:
```bash
sudo apt update
sudo apt install -y squid
sudo systemctl start squid
```
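Then edit /etc/squid/squid.conf in two places:
1. Change `http_port 3128` to `http_port 0.0.0.0:3128`.
2. Add `acl localnet src <your-client-IP>/32` followed by `http_access allow localnet`, above the `http_access deny all` line, then restart with `sudo systemctl restart squid`. Avoid `0.0.0.0/0` here: it turns the server into an open proxy anyone can use.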
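The second change is easy to get wrong by hand, because Squid checks `http_access` rules from top to bottom and the first match wins, so an allow rule placed after `http_access deny all` never takes effect. The sketch below inserts the ACL and allow rule just above that line. `allow_client` is a hypothetical helper, the client address is a placeholder from the documentation range, and you should back up squid.conf before running it.

```bash
#!/bin/bash
# allow_client CONF_PATH CLIENT_CIDR
# Inserts "acl localnet src CIDR" + "http_access allow localnet" right before the
# first "http_access deny all" line. If that line is missing, nothing is inserted.
allow_client() {
  local conf="$1" cidr="$2" tmp
  tmp=$(mktemp)
  awk -v cidr="$cidr" '
    !done && /^http_access deny all/ {
      print "acl localnet src " cidr
      print "http_access allow localnet"
      done = 1
    }
    { print }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"   # cat keeps the original file permissions
  rm -f "$tmp"
}
```

After running it as root against /etc/squid/squid.conf with your own address (e.g. `203.0.113.10/32`), `sudo squid -k parse` checks the syntax and `sudo systemctl restart squid` applies the change.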
Step 4: Set up a scheduled task
Use crontab to run the switching script every 30 minutes so the IP refreshes automatically. Randomizing the interval is recommended so the target site can't spot a pattern.
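By default cron runs commands with /bin/sh (dash on Ubuntu, which has no `$RANDOM`) and treats an unescaped `%` in a crontab line as a newline, so the random delay is easier to put in a small bash wrapper. This sketch assumes a 15-minute jitter window and that the switching script is saved as /home/ubuntu/rotate_ip.sh; both values and the wrapper's name are placeholders.

```bash
#!/bin/bash
# rotate_with_jitter.sh (hypothetical name): cron starts it every 30 minutes and it
# sleeps a random delay first, so switches don't land on exact half-hours.
MAX_JITTER=${MAX_JITTER:-900}                              # assumed window, in seconds
ROTATE_SCRIPT=${ROTATE_SCRIPT:-/home/ubuntu/rotate_ip.sh}  # assumed path of the switch script

# jitter_seconds MAX: print a random integer in [0, MAX)
jitter_seconds() {
  # $RANDOM only covers 0..32767, so combine two draws to reach larger ranges
  echo $(( (RANDOM * 32768 + RANDOM) % $1 ))
}

# Do the real work only when called with --run, so the file can be sourced safely
if [ "${1:-}" = "--run" ]; then
  sleep "$(jitter_seconds "$MAX_JITTER")"
  "$ROTATE_SCRIPT"
fi
```

A matching crontab entry could be `*/30 * * * * /bin/bash /home/ubuntu/rotate_with_jitter.sh --run >> /home/ubuntu/rotate.log 2>&1`, which contains no `%` and needs no escaping.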
Three Tips for Maintenance
1. Check CloudWatch daily: focus on outbound network traffic (NetworkOut). A sudden surge most likely means your crawler has run into anti-scraping countermeasures.
2. Clear logs weekly: /var/log/squid/access.log grows quickly and eats disk space.
3. Prepare an Auto Scaling group: if IPs get blocked heavily, it can automatically launch a new instance to take over.
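The first two tips can be scripted. The sketch below creates a CloudWatch alarm on the instance's hourly NetworkOut total and empties the Squid access log. The function names, alarm name, and `DRY_RUN` switch are illustrative assumptions, while `aws cloudwatch put-metric-alarm`, `truncate`, and `squid -k rotate` are standard commands. The threshold is up to you.

```bash
#!/bin/bash
# run: print the command instead of executing it when DRY_RUN=1 (hypothetical convention)
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# network_out_alarm INSTANCE_ID THRESHOLD_BYTES
# Goes into ALARM when NetworkOut summed over one hour exceeds the threshold.
network_out_alarm() {
  local instance_id="$1" threshold="$2"
  run aws cloudwatch put-metric-alarm \
    --alarm-name "proxy-networkout-${instance_id}" \
    --namespace AWS/EC2 --metric-name NetworkOut \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --statistic Sum --period 3600 --evaluation-periods 1 \
    --threshold "$threshold" --comparison-operator GreaterThanThreshold
}

# clear_squid_log [LOG_PATH]: empty the access log, then signal Squid to rotate/reopen logs
clear_squid_log() {
  local log="${1:-/var/log/squid/access.log}"
  run truncate -s 0 "$log"
  run squid -k rotate
}
```

For example, `network_out_alarm i-0123456789abcdef0 5000000000` alarms above roughly 5 GB per hour. As written the alarm only changes state; add `--alarm-actions` with an SNS topic ARN if you want a notification. `clear_squid_log` needs root and can be scheduled weekly.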
Q&A time
Q: Why do you recommend ipipgo?
A: Their residential proxies can hook directly into the AWS API, so changing IPs is just a matter of calling their interface, which is less hassle than maintaining your own IP pool. For cross-border e-commerce sellers in particular, running store accounts on ipipgo's Static Residential IPs is much more stable than using data-center IPs.
Q: What if Elastic IP switching is slow?
A: In our tests, the AWS SDK was three times faster than the web console. Remember to add a 5-second delay in the script. If that still doesn't help, go straight to ipipgo's tech support; they can help tune the API integration.
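Instead of a single fixed sleep, the script can poll until the instance actually reports the new address, pausing 5 seconds between checks. This sketch reads the public IPv4 address from the instance metadata service using IMDSv2; `wait_for_ip`, `current_public_ip`, `POLL_INTERVAL`, and the default of six tries are assumptions for illustration.

```bash
#!/bin/bash
# Pause between checks, in seconds (5 s as suggested above; override if needed)
POLL_INTERVAL=${POLL_INTERVAL:-5}

current_public_ip() {
  # IMDSv2: fetch a short-lived session token, then read the public IPv4 address
  local token
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/public-ipv4"
}

# wait_for_ip EXPECTED_IP [MAX_TRIES]: return 0 once the IP matches, 1 on timeout
wait_for_ip() {
  local expected="$1" tries="${2:-6}" i
  for ((i = 1; i <= tries; i++)); do
    if [ "$(current_public_ip)" = "$expected" ]; then
      return 0
    fi
    sleep "$POLL_INTERVAL"
  done
  return 1
}
```

Call it right after `associate-address`, e.g. `wait_for_ip 198.51.100.7 || echo "IP did not switch in time" >&2`, so the crawler only resumes once the new address is live.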
Q: How can I recover quickly when an IP gets blocked?
A: Don't keep banging your head against it! Stop using that IP range immediately and claim a new IP pool from the ipipgo dashboard. They also have a little-known trick: rotating through mobile base-station IPs, a type of IP range that gets flagged far less often.
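For a self-built pool, "stop using that IP range" can be automated by dropping every pool address that shares a prefix with the blocked one. Treating a range as the /24 (first three octets) is an assumption, as are the helper names; the addresses below are documentation placeholders.

```bash
#!/bin/bash
# segment_of IP: print the /24 prefix, e.g. 198.51.100.23 -> 198.51.100
segment_of() {
  echo "${1%.*}"
}

# filter_pool BLOCKED_IP IP1 IP2 ...: print only the pool IPs outside the blocked IP's /24
filter_pool() {
  local blocked_seg ip
  blocked_seg=$(segment_of "$1"); shift
  for ip in "$@"; do
    if [ "$(segment_of "$ip")" != "$blocked_seg" ]; then
      echo "$ip"
    fi
  done
}
```

For example, `filter_pool 198.51.100.23 198.51.100.5 203.0.113.8 198.51.100.99 192.0.2.4` prints 203.0.113.8 and 192.0.2.4, one per line.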
Our team has run this setup for half a year. With the ipipgo API as a second verification layer, our collection success rate jumped from 63% to 91%. The key is that costs stay under control: when a large-scale block hits, we switch straight to their standby pool, which is more cost-effective than stubbornly sticking with self-built IPs.

