
Collecting LinkedIn Company Information with Proxy IPs
Recently, many friends in foreign trade have asked how to collect LinkedIn company information in bulk without getting their accounts banned. Put bluntly, it comes down to one thing: your proxy IPs have to be solid enough. Below, we use our own ipipgo service as an example and walk through how to set this up.
Why do I get blocked if I don't use a proxy IP?
LinkedIn's risk control is no pushover: high-frequency requests from a single IP get flagged right away. Last year, a friend who exports lighting products didn't believe it and crawled 200 company pages in a row from his office network; his account was permanently banned. He later switched to ipipgo dynamic residential IPs with randomized request intervals, and now collects 500+ company profiles a day without trouble.
import random
from time import sleep

import requests

# Replace USERNAME, PASSWORD and PORT with your own ipipgo credentials.
proxies = {
    'http': 'http://USERNAME:PASSWORD@gateway.ipipgo.com:PORT',
    'https': 'http://USERNAME:PASSWORD@gateway.ipipgo.com:PORT',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Custom UA'
}

# Example of scraping logic
def scrape_linkedin(url):
    try:
        response = requests.get(url, proxies=proxies, headers=headers, timeout=10)
        # Randomly wait 3-8 seconds before the next request
        sleep(random.uniform(3, 8))
        return response.text
    except Exception as e:
        print(f"Request failed, switching IP automatically: {e}")
        # Here you can call ipipgo's API to change the IP address automatically.
        return None
Three Make-or-Break Criteria for Choosing a Proxy IP
There are many proxy IP providers on the market, but one suitable for LinkedIn collection must meet three requirements:
1. Realistic user behavior: ipipgo's residential IPs come from real users on real networks, which makes them far more reliable than datacenter IPs.
2. Seamless switching: when a CAPTCHA appears, you need to change IPs within seconds. We built a smart switching API specifically for this.
3. Precise geolocation: for example, if you want to collect data on German companies, the IP must be located in Germany.
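The third criterion can be sketched as a small lookup that picks a proxy by target country. This is only an illustration: the hostnames, port, and the PROXY_BY_COUNTRY mapping below are hypothetical placeholders, not ipipgo's actual endpoints. How a given provider exposes country selection should be checked in its own documentation.

```python
# Hypothetical country -> proxy URL mapping; hosts and ports are placeholders.
PROXY_BY_COUNTRY = {
    "DE": "http://USERNAME:PASSWORD@de.proxy.example:8000",
    "US": "http://USERNAME:PASSWORD@us.proxy.example:8000",
}


def proxies_for(country_code):
    """Return a requests-style proxies dict for an ISO country code."""
    code = country_code.upper()
    if code not in PROXY_BY_COUNTRY:
        # Fail loudly rather than silently falling back to the wrong region.
        raise KeyError(f"No proxy configured for country {code!r}")
    url = PROXY_BY_COUNTRY[code]
    return {"http": url, "https": url}
```

The returned dict can be passed as the proxies argument of requests.get, so a crawl of German companies would use proxies_for("DE").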
A practical guide to avoiding pitfalls
Last week, a customer using our service still ran into trouble; after some troubleshooting, the cause turned out to be badly handled request headers. Here are a few common traps:
- Don't use the requests library's default User-Agent; it identifies your client as a script and gets you flagged immediately.
- Bind a fixed cookie to each IP where possible, and don't clear the cache too often.
- Collect during working hours in the target region; a burst of requests in the middle of the night is an obvious sign of a bot.
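The last two points above can be sketched with the standard library alone. The helper names (profile_for, in_working_hours), the 9:00 to 18:00 weekday window, and the use of a fixed UTC offset are all assumptions made for illustration; a fixed offset ignores daylight saving time, which a real scheduler may need to handle.

```python
import random
from datetime import datetime, timedelta, timezone

_profiles = {}  # proxy URL -> identity reused for every request via that proxy


def profile_for(proxy_url, user_agents):
    """Return the same User-Agent and cookie store each time for a proxy."""
    if proxy_url not in _profiles:
        _profiles[proxy_url] = {
            "headers": {"User-Agent": random.choice(user_agents)},
            "cookies": {},  # fill from responses and resend on later requests
        }
    return _profiles[proxy_url]


def in_working_hours(now_utc, utc_offset_hours, start_hour=9, end_hour=18):
    """True if now_utc is a weekday within [start_hour, end_hour) local time.

    now_utc must be timezone-aware; utc_offset_hours is the target region's
    offset from UTC (for example 1 for Germany in winter).
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and start_hour <= local.hour < end_hour
```

A collection loop could call in_working_hours(datetime.now(timezone.utc), 1) before each batch and pause when it returns False.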
Frequently Asked Questions
Q: Is it okay to use a free proxy?
A: We tested this last year. Free proxies survived less than 15 minutes on average, and 8 out of 10 collection attempts triggered verification. A pure waste of time.
Q: What if I want to collect data on the order of 100,000 records?
A: We recommend ipipgo's enterprise-level packages, which support multi-threaded concurrency plus automatic IP rotation. The highest throughput we have measured is 8,000 records per hour.
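As a rough sketch of what "multi-threaded concurrency plus IP rotation" can look like on the client side, the example below hands out proxies round-robin to a thread pool. ProxyRotator and collect_all are hypothetical names, and the fetch callable is supplied by the caller (for instance a thin wrapper around requests.get); none of this reflects ipipgo's own rotation mechanism.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class ProxyRotator:
    """Hand out proxies round-robin; safe to call from several threads."""

    def __init__(self, proxy_urls):
        proxy_urls = list(proxy_urls)
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxy_urls)
        self._lock = threading.Lock()

    def next_proxy(self):
        # The lock keeps two threads from advancing the cycle at once.
        with self._lock:
            return next(self._cycle)


def collect_all(urls, rotator, fetch, max_workers=4):
    """Fetch every URL concurrently, giving each request the next proxy.

    fetch(url, proxy) is provided by the caller. Results come back in the
    same order as urls.
    """
    def task(url):
        return fetch(url, rotator.next_proxy())

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, urls))
```

Keeping max_workers small and adding the random delay from the earlier example inside fetch helps avoid the high-frequency pattern that triggers blocking in the first place.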
Q: What do I do when I hit a CAPTCHA?
A: Three steps: 1. Immediately stop sending requests from the current IP. 2. Call ipipgo's IP replacement interface. 3. Change the User-Agent and retry.
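The three steps can be sketched as a retry loop. Everything here is illustrative: looks_like_captcha is a crude heuristic whose status code and marker string are assumptions, next_proxy stands in for the provider's IP replacement call (whose real interface is not shown in this article), and fetch is a caller-supplied function returning a status code and body.

```python
import random


def looks_like_captcha(status_code, body):
    """Heuristic only: the status code and marker string are assumptions."""
    return status_code == 429 or "captcha" in body.lower()


def fetch_with_captcha_retry(url, fetch, next_proxy, user_agents, max_attempts=3):
    """Retry a request, switching proxy and User-Agent after each CAPTCHA.

    fetch(url, proxy, user_agent) -> (status_code, body), supplied by caller.
    next_proxy() -> a fresh proxy URL (placeholder for the provider's API).
    Returns the body on success, or None if every attempt hit a CAPTCHA.
    """
    proxy = next_proxy()
    user_agent = random.choice(user_agents)
    for _ in range(max_attempts):
        status, body = fetch(url, proxy, user_agent)
        if not looks_like_captcha(status, body):
            return body
        # Steps 1 and 2: drop the flagged IP and obtain a replacement.
        proxy = next_proxy()
        # Step 3: pick a different User-Agent if another one is available.
        others = [ua for ua in user_agents if ua != user_agent]
        user_agent = random.choice(others or user_agents)
    return None
```

Returning None after max_attempts, rather than looping forever, leaves the caller free to back off or skip the URL.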
Why ipipgo?
We have optimized specifically for data collection scenarios:
1. A proprietary IP quality testing system: each IP is tested against real-user behavior before going online.
2. A global pool of 50 million+ residential IPs, with targeting at three levels: country, city, and carrier.
3. 24/7 technical support. Most recently, we helped a customer at 3:00 a.m. urgently resolve a blocked ASN.
One internal figure: for customers using our service, the LinkedIn collection success rate went from 38% to 91%, with the ban rate kept below 2%. In this line of work, details decide everything: if IP quality fails at any link, the whole chain collapses.

