Java Proxy IP HTML Parser: Java Proxy IP Parser Library

I. Why use Java to parse proxy IPs?

Anyone who has written a web crawler knows that hammering a site with requests from your own IP gets you blacklisted within minutes. The fix is to route requests through proxy IPs to hide your real identity, like giving the crawler an endless supply of masks. However, proxy IP services on the market often return their IP lists as HTML, and copying and pasting by hand is not practical. That is where a parser for batch processing comes in.

II. Building the parser by hand

We'll use Jsoup as the HTML parser and practice with ipipgo's proxy service. Suppose we want to extract the IP address and port number from a page returned by ipipgo, and the page structure looks like this:


<div class="proxy-list">
  <span>101.202.3.4</span>
  <em>|</em>
  <span>8080</span>
</div>

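The code looks like this (note the exception handling):


import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ProxyListParser {
    public static void main(String[] args) {
        // Route traffic through ipipgo's gateway; the https.* pair is needed because the target URL is HTTPS
        System.setProperty("http.proxyHost", "gateway.ipipgo.com");
        System.setProperty("http.proxyPort", "9021");
        System.setProperty("https.proxyHost", "gateway.ipipgo.com");
        System.setProperty("https.proxyPort", "9021");
        try {
            Document doc = Jsoup.connect("https://api.ipipgo.com/proxies")
                    .timeout(10000) // give up after 10 seconds
                    .get();
            // Each div.proxy-list holds one IP/port pair
            for (Element proxy : doc.select("div.proxy-list")) {
                String ip = proxy.select("span:first-child").text();
                String port = proxy.select("span:last-child").text();
                System.out.println("Caught valid IP: " + ip + ":" + port);
            }
        } catch (IOException e) {
            // Timeouts, connection failures, and HTTP errors all land here
            System.err.println("Failed to fetch proxy list: " + e.getMessage());
        }
    }
}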
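Text scraped out of HTML can contain stray whitespace or junk, so it may be worth checking each IP and port before using it. The following is a minimal sketch using only the Java standard library; the class and method names (ProxyEntryValidator, toEndpoint, isIpv4, isPort) are made up for illustration and are not part of Jsoup or ipipgo.

```java
import java.util.Optional;

/** Hypothetical helper: validates text pulled from the page before it is used as a proxy. */
final class ProxyEntryValidator {

    /** Returns "ip:port" if both parts are well-formed, otherwise an empty Optional. */
    static Optional<String> toEndpoint(String ip, String port) {
        String cleanIp = ip.trim();
        String cleanPort = port.trim();
        if (!isIpv4(cleanIp) || !isPort(cleanPort)) {
            return Optional.empty();
        }
        return Optional.of(cleanIp + ":" + cleanPort);
    }

    /** Four dot-separated decimal octets, each in the range 0..255. */
    static boolean isIpv4(String s) {
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (!part.matches("\\d{1,3}") || Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    /** A decimal port number in the range 1..65535. */
    static boolean isPort(String s) {
        if (!s.matches("\\d{1,5}")) {
            return false;
        }
        int value = Integer.parseInt(s);
        return value >= 1 && value <= 65535;
    }

    public static void main(String[] args) {
        // The first entry is accepted; the second has a malformed IP and is rejected.
        System.out.println(toEndpoint(" 101.202.3.4 ", "8080"));
        System.out.println(toEndpoint("101.202.3", "8080"));
    }
}
```

In the parsing loop, the ip and port strings would be passed through toEndpoint, and empty results skipped.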

III. Three major pitfalls to avoid

Pitfall 1: Expired IPs are not handled - Consider ipipgo's packages with a 99% survival rate; their IPs are refreshed automatically every 15 minutes.
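If IPs are rotated every 15 minutes, one way to avoid using stale ones on the client side is to remember when each proxy was fetched and discard anything older than the refresh window. This is only a sketch under that assumption; ExpiringProxyPool and its methods are hypothetical names, and the current time is passed in explicitly to keep the logic easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pool: stores each proxy with its fetch time and drops stale entries. */
final class ExpiringProxyPool {
    private final Duration maxAge;
    private final Map<String, Instant> fetchedAt = new LinkedHashMap<>();

    ExpiringProxyPool(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /** Records (or refreshes) a proxy such as "101.202.3.4:8080". */
    void add(String endpoint, Instant now) {
        fetchedAt.put(endpoint, now);
    }

    /** Removes entries whose age has reached maxAge and returns the remaining proxies. */
    List<String> fresh(Instant now) {
        fetchedAt.values().removeIf(t -> Duration.between(t, now).compareTo(maxAge) >= 0);
        return new ArrayList<>(fetchedAt.keySet());
    }

    public static void main(String[] args) {
        ExpiringProxyPool pool = new ExpiringProxyPool(Duration.ofMinutes(15));
        Instant start = Instant.now();
        pool.add("101.202.3.4:8080", start);
        // Still fresh after 5 minutes, gone after 16 minutes.
        System.out.println(pool.fresh(start.plus(Duration.ofMinutes(5))));
        System.out.println(pool.fresh(start.plus(Duration.ofMinutes(16))));
    }
}
```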

Pitfall 2: Requests sent too frequently get banned - Add a random wait time to the code:


Thread.sleep((long) (Math.random() * 3000)); // pause for a random 0 to 3 seconds

Pitfall 3: HTTPS certificate issues - Add this configuration during initialization:


// ipipgoSSLContext() is a helper you must supply; it should return a javax.net.ssl.SSLContext
Connection connection = Jsoup.connect(url)
    .sslSocketFactory(ipipgoSSLContext().getSocketFactory());
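The ipipgoSSLContext() helper is not defined in this article. As a hedged illustration of what such a helper could return, the sketch below builds a TLS context from the JVM's default key and trust managers using only the standard library. DefaultTls and defaultSslContext are hypothetical names, and this is not ipipgo's actual implementation. It keeps normal certificate verification switched on; if the gateway uses a private CA, you would load that CA into a trust store instead of relying on the defaults.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical stand-in for the undefined ipipgoSSLContext() helper. */
final class DefaultTls {

    /** Builds a TLS context backed by the JVM's default key and trust managers. */
    static SSLContext defaultSslContext() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        // Passing null for all three arguments selects the platform defaults.
        context.init(null, null, null);
        return context;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SSLSocketFactory factory = defaultSslContext().getSocketFactory();
        System.out.println("Supported cipher suites: " + factory.getSupportedCipherSuites().length);
    }
}
```

The resulting factory is the kind of object that Jsoup's sslSocketFactory(...) setting expects.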

IV. Frequently Asked Questions

Q: What should I do if parsing always times out?
A: Set the response timeout to 15000 ms. The average response time of ipipgo's API is only 800 ms, so this leaves plenty of headroom.

Q: What if I need a high-anonymity proxy?
A: Choose ipipgo's Enterprise Package, which handles the X-Forwarded-For request header automatically.

V. Performance Optimization Tips

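1. Reduce repeated setup by reusing one session for all requests:


// Create the session once; its proxy settings and cookies are shared by every request
Connection session = Jsoup.newSession()
    .proxy("gateway.ipipgo.com", 9021);
// Issue each request from the shared session
Connection.Response res = session.newRequest()
    .url(url)
    .execute();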

2. With ipipgo's exclusive IP pool, actual parsing speed is more than 3 times faster.

3. Remember to clean up invalid IPs regularly; you can use the API status detection interface that ipipgo provides.
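The article does not describe ipipgo's status detection interface, so the sketch below swaps in a generic technique instead: a plain TCP connect with a timeout to see whether a proxy endpoint still accepts connections. ProxyPruner, keepAlive, and tcpReachable are hypothetical names, and a successful TCP connect only shows that the port is open, not that the proxy will actually relay traffic.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: filters a proxy list down to endpoints that pass a liveness check. */
final class ProxyPruner {

    /** Returns only the endpoints for which the supplied check passes. */
    static List<String> keepAlive(List<String> endpoints, Predicate<String> isAlive) {
        List<String> alive = new ArrayList<>();
        for (String endpoint : endpoints) {
            if (isAlive.test(endpoint)) {
                alive.add(endpoint);
            }
        }
        return alive;
    }

    /** True if a TCP connection to "host:port" opens within timeoutMillis. */
    static boolean tcpReachable(String endpoint, int timeoutMillis) {
        int colon = endpoint.lastIndexOf(':');
        if (colon < 0) {
            return false;
        }
        try (Socket socket = new Socket()) {
            int port = Integer.parseInt(endpoint.substring(colon + 1));
            socket.connect(new InetSocketAddress(endpoint.substring(0, colon), port), timeoutMillis);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false; // unreachable, timed out, or malformed endpoint
        }
    }

    public static void main(String[] args) {
        List<String> pool = List.of("101.202.3.4:8080");
        // Keep only proxies that accept a TCP connection within 2 seconds.
        System.out.println(keepAlive(pool, e -> tcpReachable(e, 2000)));
    }
}
```

Passing the check in as a Predicate means the TCP probe could later be replaced with a call to the provider's status interface without touching the filtering logic.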

VI. An honest take

The most troublesome part of writing your own parser isn't the code; it's maintaining the quality of the proxy IPs. I previously used a couple of free services, and 8 out of 10 IPs were dead. After I switched to ipipgo's Dynamic Residential IP, my parsing success rate rose from 50% to 95%, which was a real relief: no more fiddling with retry mechanisms all day.

This article was originally published or compiled by ipipgo: https://www.ipipgo.com/en-us/ipdaili/37566.html
