Ruby web crawler: Ruby proxy crawler development practice

What do you do when your Ruby crawler runs into anti-scraping measures? Try proxy IPs

Anyone who writes crawlers knows how ruthless sites can be about blocking IPs. Last week I wrote a script to scrape e-commerce prices. It ran happily at first, but by the next day it was dead: the target site had blacklisted my IP. This is when proxy IPs come to the rescue, so today let's use Ruby to walk through how to build a crawler that works through proxies.

How exactly do you load a proxy IP into Ruby?

Using a proxy in Ruby is very easy; the details depend on which library you use. With HTTParty, for example, configuring a proxy takes only a few lines of code:


require 'httparty'

# Replace the placeholder values with the ones from your ipipgo account.
response = HTTParty.get('https://target-website.com',
  http_proxyaddr: 'proxy IP assigned by ipipgo',
  http_proxyport: 8080, # placeholder; use the port number assigned by ipipgo
  http_proxyuser: 'account',
  http_proxypass: 'password'
)

Caution! Here is a common pitfall: many beginners forget to set a timeout. Adding the parameter timeout: 30 is recommended; otherwise the program can hang and you will have no idea what happened.
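
To make the timeout advice concrete, here is a minimal sketch. The helper names proxy_options and with_timeout_guard, the host proxy.example.com, and port 8080 are assumptions for illustration; they are not part of HTTParty or ipipgo. The sketch relies on Net::OpenTimeout and Net::ReadTimeout, the standard-library errors raised when connecting or reading takes longer than the timeout.

```ruby
require 'net/http' # defines Net::OpenTimeout and Net::ReadTimeout

# Hypothetical helper: builds an options hash in the shape HTTParty.get
# accepts, with the proxy settings and a timeout (in seconds) together.
def proxy_options(addr:, port:, user:, pass:, timeout: 30)
  {
    http_proxyaddr: addr,
    http_proxyport: port,
    http_proxyuser: user,
    http_proxypass: pass,
    timeout: timeout
  }
end

# Hypothetical helper: runs the block and turns a timeout into a nil
# result plus a warning, so a stuck request is visible instead of silent.
def with_timeout_guard
  yield
rescue Net::OpenTimeout, Net::ReadTimeout => e
  warn "Request timed out: #{e.class}"
  nil
end

# Usage (needs the httparty gem and a working proxy):
#   require 'httparty'
#   opts = proxy_options(addr: 'proxy.example.com', port: 8080,
#                        user: 'account', pass: 'password')
#   response = with_timeout_guard { HTTParty.get('https://example.com', opts) }
```

A nil result tells the caller that the request timed out, so it can retry or switch to another proxy.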

How to choose between dynamic and static proxies? Depends on the scenario

ipipgo offers three packages, and which one you choose depends on your business needs:

Type                            Applicable scenarios         Price
Dynamic residential (standard)  Routine data collection      7.67 yuan/GB
Dynamic residential (business)  High-frequency access        9.47 yuan/GB
Static residential              Long-term fixed operations   35 yuan/IP

Last week I helped a friend with airfare price comparison and pushed through 2,000 requests in an hour on the dynamic residential (business) package; the IP pool was large enough that no IP was reused. If you are nurturing accounts, you need static proxies: one IP per account is the safe setup.
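
The "one IP per account" rule can be sketched as a fixed mapping. The proxy addresses, account names, and the helper assign_static_proxies below are hypothetical; in practice the list would come from your own static IP allocation.

```ruby
# Hypothetical static proxies (documentation-range addresses), for illustration only
STATIC_PROXIES = [
  { addr: '203.0.113.10', port: 8080 },
  { addr: '203.0.113.11', port: 8080 }
].freeze

# Pairs each account with exactly one static proxy and refuses to run
# when there are not enough IPs, so no two accounts ever share an IP.
def assign_static_proxies(accounts, proxies)
  if accounts.size > proxies.size
    raise ArgumentError, "need #{accounts.size} static IPs, only #{proxies.size} available"
  end

  accounts.zip(proxies).to_h
end

assignments = assign_static_proxies(%w[account_a account_b], STATIC_PROXIES)
assignments.each do |account, proxy|
  puts "#{account} -> #{proxy[:addr]}:#{proxy[:port]}"
end
```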

A practical guide to avoiding pitfalls

A real case: I once scraped data through a free proxy, and it returned fake content! Switching to ipipgo's TK line solved the problem. Here is a way to check whether a proxy is actually in effect:


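require 'httparty'

# proxy_params is the HTTParty proxy options hash
# (http_proxyaddr, http_proxyport, http_proxyuser, http_proxypass).
def check_proxy(proxy_params)
  origin_ip = HTTParty.get('http://ip-api.com/json').parsed_response["query"]
  proxy_ip = HTTParty.get('http://ip-api.com/json', proxy_params).parsed_response["query"]
  puts "Original IP: #{origin_ip} | proxy IP: #{proxy_ip}"
end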

If the two IPs are the same when you run this code, the proxy is not in effect, so check your configuration parameters right away. It is a good idea to build this check into the crawler and run it automatically every half hour.
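
One way to wire the half-hour check into a crawl loop is sketched below. ProxyMonitor and the lookup_ip call in the usage comment are hypothetical names; the IP lookup itself is passed in as a block, so the scheduling logic stays independent of any HTTP library.

```ruby
CHECK_INTERVAL = 30 * 60 # seconds, i.e. every half hour

# Hypothetical helper: remembers when the proxy was last verified.
class ProxyMonitor
  def initialize(interval: CHECK_INTERVAL)
    @interval = interval
    @last_checked_at = nil
  end

  # True when no check has run yet or the interval has elapsed.
  def due?(now = Time.now)
    @last_checked_at.nil? || (now - @last_checked_at) >= @interval
  end

  # The block must return [origin_ip, proxy_ip].
  # Returns true when the proxy is in effect, i.e. the two IPs differ.
  def check!(now = Time.now)
    origin_ip, proxy_ip = yield
    @last_checked_at = now
    !proxy_ip.nil? && origin_ip != proxy_ip
  end
end

# Usage inside a crawl loop (lookup_ip is an assumed helper that returns
# the "query" field from ip-api.com, with or without proxy options):
#   monitor = ProxyMonitor.new
#   if monitor.due?
#     ok = monitor.check! { [lookup_ip, lookup_ip(proxy_params)] }
#     abort 'Proxy is not in effect, check the configuration' unless ok
#   end
```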

Frequently Asked Questions

Q: What should I do if I keep running into CAPTCHAs?
A: Combine residential proxies with random User-Agent headers. ipipgo's client comes with a UA randomization feature. Also remember to set the request interval to a random value between 3 and 10 seconds.
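
If you are not using the client, the same idea can be sketched in plain Ruby. The User-Agent strings and the helper names random_headers and random_delay are illustrative assumptions; swap in whatever UA list suits your targets.

```ruby
# Hypothetical User-Agent strings; replace with a list you maintain
USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
  'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
].freeze

# Picks a random User-Agent for each request.
def random_headers
  { 'User-Agent' => USER_AGENTS.sample }
end

# Returns a random pause in seconds within the 3-10 second window.
def random_delay(range = 3.0..10.0)
  rand(range)
end

# Usage with HTTParty (proxy_params is your proxy options hash):
#   response = HTTParty.get(url, proxy_params.merge(headers: random_headers))
#   sleep(random_delay)
```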

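Q: What should I do if my proxy is slow?
A: Prefer nodes that are geographically close to the target. For example, use ipipgo's Tokyo data center when scraping Japanese sites. In testing, their SERP API dedicated line got latency down to under 200 ms.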

Q: How do I handle multithreading?
A: Use a connection pool to manage the proxy IP pool and assign each thread its own IP. Remember that the number of threads should not exceed the number of IPs, or the extra threads are wasted!
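
A minimal sketch of that pattern follows. It uses Ruby's built-in thread-safe Queue as a stand-in for a dedicated connection pool library; run_with_proxy_pool and the sample proxy names are hypothetical.

```ruby
# Hypothetical helper: runs jobs across threads, giving each thread
# its own proxy from the pool for the thread's whole lifetime.
def run_with_proxy_pool(proxies, jobs, thread_count:)
  # Enforce the rule that threads must not outnumber IPs
  raise ArgumentError, 'more threads than proxies' if thread_count > proxies.size

  pool = Queue.new
  proxies.each { |proxy| pool << proxy }

  work = Queue.new
  jobs.each { |job| work << job }
  work.close # pop returns nil once the closed queue is empty

  results = Queue.new
  threads = Array.new(thread_count) do
    Thread.new do
      proxy = pool.pop # exclusive to this thread
      while (job = work.pop)
        results << yield(job, proxy)
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

# Usage: the block receives a job (e.g. a URL) and the thread's proxy.
pages = run_with_proxy_pool(%w[proxy_a proxy_b proxy_c],
                            %w[/p/1 /p/2 /p/3 /p/4 /p/5],
                            thread_count: 3) do |path, proxy|
  "#{path} via #{proxy}" # a real crawler would issue the HTTP request here
end
puts pages
```

Results come back in completion order, not job order, so carry the job identifier in each result if the order matters.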

Why do you recommend ipipgo?

Their cross-border dedicated line is really good. The last time I helped a client scrape Southeast Asian e-commerce data, the success rate with ordinary proxies was only 40%; after switching to their Singapore line it jumped straight to 92%. One more insider tip: their technical support is online 24 hours a day. If you run into a problem, just send them the error logs and you can get a solution within ten minutes.

One last reminder: don't use free proxies just to save money. At best you get blocked and lose data; at worst you end up with a lawsuit. A legitimate business should use a properly qualified provider like ipipgo, because data security matters far more than a small proxy fee. Next time we'll talk about building a distributed crawler with proxies, and it will be more practical than the tutorials you find elsewhere!
