Tutorial: Building an Open-Source Laravel Crawler Application with IPIPGO Proxy IPs

I. Why does your crawler keep getting blocked by the site?

Anyone who has done data collection has run into this: a crawler that was running fine suddenly gets a 403 Forbidden response or a wall of CAPTCHAs. Put plainly, the website has flagged your IP. An ordinary crawler hammering a site from one fixed IP is like standing outside someone's house with a megaphone shouting "I'm here to steal data". Who else would they block?

This is where proxy IPs come to the rescue. With a specialized service such as ipipgo, each request can go out from a different IP address, which is like giving the crawler countless "face masks". For example, instead of sending 1,000 requests an hour from 1 IP, you rotate through 100 IPs so that each one sends only 10, a rate the site's risk-control system is much less likely to flag as abnormal.
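The arithmetic above can be checked with a minimal round-robin sketch. The function name spreadRequests and the placeholder proxy hostnames are assumptions made for illustration; they are not part of ipipgo's service, which may rotate IPs on its own side.

```php
<?php
declare(strict_types=1);

/**
 * Assign each request to a proxy in round-robin order.
 * Returns a map of proxy => number of requests it would carry.
 */
function spreadRequests(int $totalRequests, array $proxies): array
{
    if ($proxies === []) {
        throw new InvalidArgumentException('At least one proxy is required.');
    }
    $proxies = array_values($proxies); // make sure keys run 0..n-1
    $load = array_fill_keys($proxies, 0);
    $count = count($proxies);
    for ($i = 0; $i < $totalRequests; $i++) {
        $load[$proxies[$i % $count]]++;
    }
    return $load;
}

// Hypothetical pool of 100 proxy endpoints (placeholder hostnames).
$pool = [];
for ($n = 1; $n <= 100; $n++) {
    $pool[] = "http://proxy-{$n}.example.test:9020";
}

$load = spreadRequests(1000, $pool);
echo max($load), PHP_EOL; // each proxy carries 10 of the 1,000 requests
```

Round-robin is only one possible policy; picking a random proxy per request would spread the load similarly but less evenly.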

II. Building the Laravel crawler by hand

Don't rush into writing code. First get the prerequisites ready:

  1. Install PHP 7.4+ and Composer
  2. Create a new Laravel project: composer create-project laravel/laravel crawler
  3. Install Goutte, the crawler library: composer require fabpot/goutte

The core code is really just three pieces (don't let the technical jargon scare you):


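// Create a new CrawlCommand.php in app/Console/Commands.
// At the top of the file (assuming Goutte 4.x, which wraps Symfony HttpClient):
use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

public function handle()
{
    // ipipgo's proxy URL format; replace username and password with your own credentials
    $proxy = 'http://username:password@gateway.ipipgo.com:9020';

    // Goutte 4.x accepts the configured HTTP client as a constructor argument
    $goutte = new Client(HttpClient::create(['proxy' => $proxy]));

    // Specific capture logic...
}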

III. Practical proxy IP techniques

Knowing how to use a proxy is not enough. These life-saving techniques must be mastered:

Pitfall | Fix
Sudden IP failure | Use ipipgo's auto-switching API to switch to a new IP in seconds when one fails
Excessive request frequency | Add a randomized delay of 2-8 seconds to simulate a real person's behavior
Encountering CAPTCHAs | Connect to ipipgo's CAPTCHA recognition service
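The first two fixes can be sketched on the client side as follows. This is a minimal illustration, not ipipgo's auto-switching API: the function names randomDelaySeconds and fetchWithFailover are hypothetical, and $fetch stands for whatever code performs the actual request through a given proxy.

```php
<?php
declare(strict_types=1);

/** Pick a random pause between $min and $max seconds (inclusive). */
function randomDelaySeconds(int $min = 2, int $max = 8): int
{
    return random_int($min, $max);
}

/**
 * Try $fetch with each proxy in turn until one succeeds.
 * $fetch is any callable that takes a proxy URL and either returns the
 * response body or throws on failure.
 */
function fetchWithFailover(callable $fetch, array $proxies): string
{
    $lastError = null;
    foreach ($proxies as $proxy) {
        try {
            return $fetch($proxy);
        } catch (Throwable $e) {
            $lastError = $e; // this proxy failed; move on to the next one
        }
    }
    throw new RuntimeException('All proxies failed.', 0, $lastError);
}
```

In a crawl loop, calling sleep(randomDelaySeconds()) before each request would give the 2-8 second pause from the table.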

Special reminder: don't skimp on timeouts! It is recommended to set connect_timeout to 5 seconds and request_timeout to 30 seconds so that the whole program cannot hang on a single dead IP.
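One way these limits might be expressed is as an options array for Symfony HttpClient, the client that Goutte 4.x wraps. The mapping is an assumption: Symfony has no option literally named connect_timeout or request_timeout, so this sketch uses its 'timeout' (idle timeout) for the 5-second limit and 'max_duration' for the 30-second cap on the whole request. The helper name buildClientOptions is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build an options array for Symfony HttpClient.
 * Assumed mapping: 'timeout' (idle timeout) stands in for the 5-second
 * connection limit; 'max_duration' caps the whole request at 30 seconds.
 */
function buildClientOptions(
    string $proxy,
    float $connectTimeout = 5.0,
    float $requestTimeout = 30.0
): array {
    return [
        'proxy' => $proxy,
        'timeout' => $connectTimeout,
        'max_duration' => $requestTimeout,
    ];
}

$options = buildClientOptions('http://username:password@gateway.ipipgo.com:9020');

// With Goutte 4.x installed, the array could be passed along like this:
//   new \Goutte\Client(\Symfony\Component\HttpClient\HttpClient::create($options));
```

With Guzzle instead, the closest equivalents would be its 'connect_timeout' and 'timeout' request options.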

IV. Q&A: must-reads for beginners

Q: What should I do if the proxy IP often fails to connect?
A: In about 80% of cases the cause is a low-quality proxy. ipipgo's business-level proxy package is recommended: its IPs are checked for liveness, and the measured connection success rate reaches 99.2%.

Q: What can I do when collection is too slow?
A: Two tricks: ① use ipipgo's multithreaded proxy pool; ② enable HTTP persistent connections to reduce the number of TCP handshakes.

Q: How can I tell if a proxy is anonymous?
A: Visit http://httpbin.org/ip. If it returns the proxy IP rather than your real IP, ipipgo's high-anonymity proxy is in effect.
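The check can be automated with a small helper that inspects the JSON body returned by http://httpbin.org/ip. This is a sketch under stated assumptions: the function name hidesRealIp is hypothetical, the response is assumed to have the shape {"origin": "<ip>"}, and the field is treated as a possible comma-separated list in case an intermediary forwards the client address. The addresses used in the usage lines are documentation-range placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a proxy hides the real IP, given the JSON body
 * returned by http://httpbin.org/ip when requested through that proxy.
 */
function hidesRealIp(string $httpbinJson, string $realIp): bool
{
    $data = json_decode($httpbinJson, true);
    if (!is_array($data) || !isset($data['origin']) || !is_string($data['origin'])) {
        throw new InvalidArgumentException('Unexpected response body.');
    }
    // Split on commas so a leaked address next to the proxy IP is still caught.
    $seen = array_map('trim', explode(',', $data['origin']));
    return !in_array($realIp, $seen, true);
}

// Usage with placeholder addresses:
var_dump(hidesRealIp('{"origin": "198.51.100.7"}', '203.0.113.5'));              // bool(true)
var_dump(hidesRealIp('{"origin": "203.0.113.5, 198.51.100.7"}', '203.0.113.5')); // bool(false)
```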

V. Leveling up: distributed crawlers

When a single machine can no longer keep up, it is time to move to a distributed architecture. Use Redis as the task queue and run multiple servers at the same time, with each machine requesting a different IP segment from ipipgo. Collecting millions of records per day becomes realistic, and the crawler is harder for anti-crawling strategies to single out.
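The worker side of such a setup might look roughly like the sketch below. Everything named here (TaskQueue, ArrayQueue, runWorker) is hypothetical, and the in-memory queue is only a stand-in so the example runs without a Redis server; in a real deployment the same two methods would be backed by a shared Redis list that every server pushes to and pops from.

```php
<?php
declare(strict_types=1);

/** Minimal queue contract; a Redis list would sit behind the same two methods. */
interface TaskQueue
{
    public function push(string $url): void;
    public function pop(): ?string; // null when the queue is empty
}

/** In-memory stand-in so the sketch runs without a Redis server. */
final class ArrayQueue implements TaskQueue
{
    private array $items = [];

    public function push(string $url): void
    {
        $this->items[] = $url;
    }

    public function pop(): ?string
    {
        return array_shift($this->items); // array_shift returns null on an empty array
    }
}

/**
 * Drain the queue, handing each URL to $crawl together with this worker's proxy.
 * Returns how many tasks the worker processed.
 */
function runWorker(TaskQueue $queue, string $workerProxy, callable $crawl): int
{
    $done = 0;
    while (($url = $queue->pop()) !== null) {
        $crawl($url, $workerProxy);
        $done++;
    }
    return $done;
}
```

Each server would call runWorker with its own proxy endpoint, so the per-machine IP segments described above stay separate while the URL list is shared.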

One final word: don't go cheap when choosing a proxy service! Some free proxies steal response content or record your request data. With a legitimate provider like ipipgo, data security is guaranteed, and if something goes wrong you can reach technical support for real-time help.

This article was originally published or organized by ipipgo: https://www.ipipgo.com/en-us/ipdaili/31344.html
