Java Proxy IP HTML Parser: Java Proxy IP Parser Library

I. Why use Java for proxy IP parsing?

Anyone who writes web crawlers knows the drill: hammer a site with requests from your own IP and you will be blacklisted within minutes. That is when you need proxy IPs to hide your real identity; it is like giving your crawler a million masks. But most proxy IP services on the market return their lists as HTML, and you can't copy and paste them by hand. That is why you need a parser to process them in batches.

II. Building the wheel yourself: a tutorial

We'll use Jsoup as the HTML parser and practice with ipipgo's proxy service. Suppose we want to extract the IP address and port number from a page fetched from ipipgo, where the page structure looks like this:


<div class="proxy-list">
  <span>101.202.3.4</span>
  <em>|</em>
  <span>8080</span>
</div>

The code looks like this (note the exception handling):


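import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Set up ipipgo's proxy (important!); the target URL is HTTPS, so the https.* properties are needed too
System.setProperty("http.proxyHost", "gateway.ipipgo.com");
System.setProperty("http.proxyPort", "9021");
System.setProperty("https.proxyHost", "gateway.ipipgo.com");
System.setProperty("https.proxyPort", "9021");

try {
    Document doc = Jsoup.connect("https://api.ipipgo.com/proxies")
            .timeout(10000) // give up after 10 seconds
            .get();

    // Each div.proxy-list holds one IP span and one port span
    for (Element proxy : doc.select("div.proxy-list")) {
        String ip = proxy.select("span:first-child").text();
        String port = proxy.select("span:last-child").text();
        System.out.println("Caught IP: " + ip + ":" + port);
    }
} catch (IOException e) {
    // Timeouts, HTTP error statuses and proxy failures all arrive as IOException
    System.err.println("Failed to fetch the proxy list: " + e.getMessage());
}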
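The loop prints whatever text the spans contain, even if the page layout changes and a span comes back empty. A minimal sketch of a sanity check you could run before trusting an entry, assuming the list only contains IPv4 addresses; the class and method names are hypothetical and not part of Jsoup or ipipgo's service:

```java
import java.util.regex.Pattern;

public class ProxyEntryValidator {
    // Four dot-separated groups of 1-3 digits; the 0-255 range is checked separately
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Returns true if ip is a dotted IPv4 address and port is a number in 1-65535. */
    public static boolean isValid(String ip, String port) {
        if (ip == null || port == null || !IPV4.matcher(ip.trim()).matches()) {
            return false;
        }
        for (String octet : ip.trim().split("\\.")) {
            if (Integer.parseInt(octet) > 255) {
                return false;
            }
        }
        try {
            int p = Integer.parseInt(port.trim());
            return p >= 1 && p <= 65535;
        } catch (NumberFormatException e) {
            return false; // empty or non-numeric port text
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("101.202.3.4", "8080")); // true
        System.out.println(isValid("101.202.3.4", ""));     // false
        System.out.println(isValid("300.1.1.1", "8080"));   // false
    }
}
```

Inside the Jsoup loop you would call `ProxyEntryValidator.isValid(ip, port)` and skip any entry that fails, so only well-formed addresses reach the crawler.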

III. A guide to avoiding three big pitfalls

Pitfall 1: Not handling dead IPs. We suggest ipipgo's packages with a 99% survival rate; their IPs refresh automatically every 15 minutes.
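Since the IPs rotate every 15 minutes, one way to avoid using stale entries is to cache the parsed list and re-fetch it once it is older than that. A minimal sketch, assuming a hypothetical `fetcher` that wraps the Jsoup parsing code from section II and returns "ip:port" strings; the clock is injectable so the refresh logic can be exercised without waiting:

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class ProxyPool {
    private final Supplier<List<String>> fetcher; // hypothetical: returns "ip:port" strings
    private final long ttlMillis;
    private final LongSupplier clock;             // current time in milliseconds
    private List<String> cached = List.of();
    private long fetchedAt;
    private boolean loaded = false;

    public ProxyPool(Supplier<List<String>> fetcher, Duration ttl, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns the cached list, re-fetching it first if it is missing or older than the TTL. */
    public synchronized List<String> current() {
        long now = clock.getAsLong();
        if (!loaded || now - fetchedAt >= ttlMillis) {
            cached = List.copyOf(fetcher.get());
            fetchedAt = now;
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        long[] fakeNow = {0};
        ProxyPool pool = new ProxyPool(
                () -> { calls[0]++; return List.of("101.202.3.4:8080"); },
                Duration.ofMinutes(15),
                () -> fakeNow[0]);
        pool.current();                                  // first call fetches
        fakeNow[0] = Duration.ofMinutes(10).toMillis();
        pool.current();                                  // still fresh, served from cache
        fakeNow[0] = Duration.ofMinutes(15).toMillis();
        pool.current();                                  // expired, fetches again
        System.out.println("fetches: " + calls[0]);      // fetches: 2
    }
}
```

In a real crawler you would pass `System::currentTimeMillis` as the clock; the fake clock in `main` only exists to show when a refresh happens.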

Pitfall 2: Getting banned for requesting too often. Add a random wait between requests:


Thread.sleep((long) (Math.random() * 3000)); // random pause of 0-3 seconds; throws InterruptedException
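`Math.random()` works, but a version with an explicit lower bound keeps requests from occasionally firing back to back, and it should restore the interrupt flag rather than swallow `InterruptedException`. A minimal sketch; the class name is hypothetical and the 1000-3000 ms range in `main` is an illustrative choice, not a figure from ipipgo:

```java
import java.util.concurrent.ThreadLocalRandom;

public class PoliteDelay {
    /** Picks a random delay in [minMillis, maxMillis], both ends inclusive. */
    public static long randomDelayMillis(long minMillis, long maxMillis) {
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1);
    }

    /** Sleeps for a random delay; returns false if the thread was interrupted. */
    public static boolean pause(long minMillis, long maxMillis) {
        try {
            Thread.sleep(randomDelayMillis(minMillis, maxMillis));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        }
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 3; i++) {
            System.out.println("request " + i); // a real crawler would fetch a page here
            if (!pause(1000, 3000)) {
                break; // stop early if interrupted
            }
        }
    }
}
```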

Pitfall 3: HTTPS certificate problems. Add this configuration during initialization:


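// ipipgoSSLContext() is a helper you write yourself that returns a configured javax.net.ssl.SSLContext
Connection connection = Jsoup.connect(url)
    .sslSocketFactory(ipipgoSSLContext().getSocketFactory());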
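`ipipgoSSLContext()` is not defined in the snippet, so here is one possible shape for such a helper, assuming the certificate problem is a CA your JVM does not already trust. It builds a TLS context that trusts the certificates in a given `KeyStore`, or the JVM's default trust store when passed `null`. The class name and the file `proxy-ca.pem` are hypothetical. It adds trust for a specific certificate instead of disabling verification, which would leave the traffic open to interception:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class ProxySsl {
    /** Loads one X.509 certificate (PEM or DER) into a fresh in-memory trust store. */
    public static KeyStore trustStoreWith(InputStream certificate)
            throws GeneralSecurityException, IOException {
        Certificate ca = CertificateFactory.getInstance("X.509").generateCertificate(certificate);
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        store.load(null, null); // initialise an empty store
        store.setCertificateEntry("proxy-ca", ca);
        return store;
    }

    /** Builds a TLS context trusting trustStore, or the JVM's default CAs when it is null. */
    public static SSLContext sslContext(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file: the CA certificate you want the crawler to trust
        try (InputStream in = new FileInputStream("proxy-ca.pem")) {
            SSLContext context = sslContext(trustStoreWith(in));
            System.out.println("TLS context ready: " + context.getProtocol());
        }
    }
}
```

The resulting context's `getSocketFactory()` is what the Jsoup `sslSocketFactory(...)` call above expects. Note that a store built this way trusts only the one certificate you loaded.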

IV. Frequently Asked Questions

Q: What should I do if parsing keeps timing out?
A: Raise the request timeout to 15000 ms (in Jsoup, `.timeout(15000)`). ipipgo's API averages only about 800 ms per response, so this leaves plenty of headroom.
Q: What if I need a high-anonymity proxy?
A: Use ipipgo's Enterprise Package, which automatically strips the X-Forwarded-For request header.
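For the timeout question: if you fetch the list with the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of Jsoup, both the proxy and the 15000 ms limit can be set in code. A minimal sketch; the gateway host and port are the ones used earlier in this article, the class name is hypothetical, and whether the ipipgo endpoint accepts this exact request is an assumption:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ProxyFetch {
    /** Builds a client that sends every request through the given HTTP proxy. */
    public static HttpClient client(String proxyHost, int proxyPort, Duration timeout) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .connectTimeout(timeout) // limit for establishing the connection
                .build();
    }

    /** Fetches a page body; the request timeout limits the wait for the response. */
    public static String fetch(HttpClient client, String url, Duration timeout)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        Duration timeout = Duration.ofMillis(15000);
        HttpClient client = client("gateway.ipipgo.com", 9021, timeout);
        String html = fetch(client, "https://api.ipipgo.com/proxies", timeout);
        System.out.println(html.length() + " characters received");
    }
}
```

When either limit is exceeded, `send` throws `HttpTimeoutException` (an `IOException`), so the same catch block as in section II covers it. The resulting HTML string can then be handed to `Jsoup.parse` for extraction.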

V. Performance Optimization Tips

1. Reuse one session so requests share their settings and can reuse kept-alive connections instead of repeating handshakes:


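// One session holds the proxy and timeout settings for every request made from it
Connection session = Jsoup.newSession()
    .proxy("gateway.ipipgo.com", 9021)
    .timeout(15000);

Connection.Response res = session.newRequest()
    .url(url)
    .execute();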

2. With ipipgo's exclusive IP pool, actual parsing speed is more than 3 times faster.

3. Remember to clean out invalid IPs regularly; you can use the API status-check interface they provide.
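The status-check interface is ipipgo's own and its details aren't covered here. As a provider-independent first pass, you can at least drop entries whose port no longer accepts TCP connections. A minimal sketch with hypothetical names; a successful connect only shows the port is open, not that the proxy will actually forward traffic:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class ProxyCleaner {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false; // refused, timed out, unresolvable, or port out of range
        }
    }

    /** Keeps only "host:port" entries that are currently reachable. */
    public static List<String> removeDead(List<String> proxies, int timeoutMillis) {
        List<String> alive = new ArrayList<>();
        for (String entry : proxies) {
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                continue; // skip malformed entries
            }
            int port;
            try {
                port = Integer.parseInt(entry.substring(colon + 1));
            } catch (NumberFormatException e) {
                continue; // skip entries with a non-numeric port
            }
            if (isReachable(entry.substring(0, colon), port, timeoutMillis)) {
                alive.add(entry);
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        List<String> proxies = List.of("101.202.3.4:8080"); // sample entry from section II
        System.out.println("still alive: " + removeDead(proxies, 3000));
    }
}
```

Checks run one after another here, so a long list with many dead entries can take up to `timeoutMillis` per entry; running them on an `ExecutorService` is the usual next step.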

VI. Straight talk

The most troublesome part of writing your own parser isn't the code; it's keeping up the quality of the proxy IPs. I used a couple of free services before, and 8 out of 10 IPs were dead. After I switched to ipipgo's Dynamic Residential IP, the parsing success rate went straight from 50% to 95%. What a relief: no more fiddling with retry mechanisms all day long.

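Our products may only be used in network environments outside China (except for the TikTok dedicated line). Any actions users take with IPIPGO do not represent IPIPGO's will or views, and IPIPGO assumes no legal liability for them.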
