
Why is LinkedIn's job data crawl always blocked?
Recently, many of my friends doing recruitment analytics have been complaining that LinkedIn job data is getting harder and harder to scrape. You may have tried lowering the request frequency or changing the User-Agent, only to find you still can't get the job data. Those tricks treat the symptoms, not the root cause. The core of the problem is that the platform's anti-crawling mechanisms can now accurately identify abnormal behavior coming from a single IP.
Take a real case: a headhunting firm crawled data from the fixed IP of its own office. For the first three days, 200 requests per hour went through normally; on the fourth day the IP was suddenly blocked outright. Worse still, the block also broke logins for the company's regular recruiting accounts, a classic double loss.
The right way to use proxy IPs
The key to solving this problem is **making each request look like a different person is operating**. Here is a tested and effective configuration to share:
```python
import requests
from itertools import cycle

# At least 50 IPs in rotation is recommended
proxies = [
    "http://user:pass@gateway.ipipgo.com:30002",
    # ... more proxy endpoints ...
]
proxy_pool = cycle(proxies)

for page in range(1, 10):
    current_proxy = next(proxy_pool)
    try:
        response = requests.get(
            url="https://www.linkedin.com/jobs/search/",
            proxies={"http": current_proxy, "https": current_proxy},
            headers={"User-Agent": "UA generated by random UA generator"},
            timeout=10,
        )
        # ... data-processing logic ...
    except Exception as e:
        print(f"Error using proxy {current_proxy}: {str(e)}")
```
The highlight here is **ipipgo's unique configuration**: their dynamic residential proxies come with browser-fingerprint emulation, where each IP is associated with real device information, making them harder to identify than ordinary proxies. In particular, their **intelligent session-holding technology** can maintain login state while switching IPs, which is especially important for job-detail pages that require login to view.
Anti-Blocking Strategy Checklist
Used together with proxy IPs, these details make the difference:
| Risk point | Countermeasure |
|---|---|
| Fixed request frequency | Random delays (0.5-3 s) plus different schedules for weekdays and weekends |
| Uniform header features | 11 randomly generated browser fingerprints per request |
| IP behavior correlation | At most 20 requests per IP, then replace it immediately |
| CAPTCHA interception | ipipgo's AI CAPTCHA auto-recognition module |
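The first and third rows of the checklist can be sketched in a few lines. This is a minimal illustration, and the gateway address and port range below are placeholders, not real ipipgo endpoints:

```python
import random
import time
from itertools import cycle

# Placeholder endpoints; substitute your real gateway addresses.
PROXIES = [f"http://user:pass@gateway.example.com:{30000 + i}" for i in range(50)]
MAX_REQUESTS_PER_IP = 20  # rotate the IP out after 20 requests

proxy_pool = cycle(PROXIES)

def proxy_stream():
    """Yield one proxy URL per request, switching after the per-IP cap."""
    while True:
        proxy = next(proxy_pool)
        for _ in range(MAX_REQUESTS_PER_IP):
            yield proxy

def polite_delay():
    """Sleep a random 0.5-3 s so requests never arrive on a fixed cadence."""
    time.sleep(random.uniform(0.5, 3.0))
```

In the crawl loop, call `polite_delay()` before each `requests.get` and pull the proxy from `proxy_stream()` instead of cycling the raw list directly.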
Special note: many people overlook **DNS leakage** when using proxies. It is recommended to include leak-detection logic in your code, or simply use ipipgo's **full-tunnel encrypted proxy**, which avoids this class of low-level mistakes at the source.
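One concrete form of that detection logic: with `requests` plus PySocks, a `socks5://` proxy URL resolves hostnames locally (the DNS query leaks outside the tunnel), while `socks5h://` pushes resolution to the proxy. A tiny pre-flight check over your configured URLs can catch this; the function below is an illustrative sketch, not part of any ipipgo SDK:

```python
from urllib.parse import urlparse

def dns_leak_risk(proxy_url: str) -> bool:
    """Return True if this proxy URL would resolve DNS locally.

    'socks4://' and 'socks5://' schemes resolve hostnames on your machine,
    leaking DNS queries; the 'h' variants (e.g. 'socks5h://') resolve them
    on the proxy. Plain HTTP(S) proxies receive the hostname inside the
    request itself, so resolution already happens proxy-side.
    """
    return urlparse(proxy_url).scheme.lower() in ("socks4", "socks5")
```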
Common pitfalls QA
Q: I'm clearly using proxy IPs, so why am I still getting blocked?
A: Check three things: 1. whether each request actually switches the exit IP; 2. whether your local clock is synchronized with the proxy server's time zone; 3. whether cookies are leaking across sessions.
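Point 1 is easy to verify empirically: route a request to an IP-echo service through each proxy and confirm that consecutive exit IPs actually differ. A sketch, using the public httpbin.org/ip endpoint as the echo service (any endpoint that returns the caller's address works):

```python
import requests

def exit_ip(proxy_url: str, timeout: float = 10.0) -> str:
    """Return the public IP the target sees when routing via proxy_url."""
    resp = requests.get(
        "https://httpbin.org/ip",
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()["origin"]

def rotation_ok(observed_ips):
    """True if no two consecutive requests shared an exit IP."""
    return all(a != b for a, b in zip(observed_ips, observed_ips[1:]))
```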
Q: Do I have to maintain ipipgo's IP pool myself?
A: No. Their backend automatically removes flagged IPs, and their **dynamic cleaning system** rotates in a fresh batch of IPs every 15 minutes, which is far more efficient than manual maintenance.
Q: What crawl speed can I expect?
A: With 50 IPs in rotation, a steady state of 800-1200 complete job records per hour (including company information and salary range) is achievable. For rush projects you can enable ipipgo's **rush mode**, but be careful to pair it with request-frequency control.
A worry-free option for developers
If you don't want to write your own code, you can use ipipgo's **LinkedIn data acquisition suite**. The pre-configured program includes:
- Automated job-keyword **subscription**
- Intelligent de-duplication of repeated postings
- Multi-format export (Excel / API / direct database write)
- Automatic circuit breaking on abnormal traffic
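The circuit-breaking idea in the last bullet can be approximated in your own crawler too: after a run of consecutive failures, stop sending requests for a cooldown period instead of burning through the IP pool. A hypothetical sketch, with illustrative class name and thresholds:

```python
import time

class CircuitBreaker:
    """Trip after `threshold` consecutive failures; stay open for `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 300.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        """True if the next request may be sent."""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: half-open, let a probe request through.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Report the outcome of a request; trip the breaker on a failure run."""
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

Wrap each `requests.get` in `if breaker.allow(): ...` and call `breaker.record(...)` with the outcome; a block page or CAPTCHA counts as a failure.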
They recently launched an **enterprise customization service** that supports training dedicated anti-anti-crawling models tailored to industry characteristics. For fields such as finance and IT, which use distinctive job-description formats, it can improve data-parsing accuracy by more than 40%.

