Sending multiple Goutte requests asynchronously

Posted 2019-06-03 06:54

Question:

This is the code I am using:

require_once 'goutte.phar';
use Goutte\Client;
$client = new Client();
for($i=0;$i<10;$i++){
     $crawler = $client->request('GET', 'http://website.com');
     echo '<p>'.$crawler->filterXpath('//meta[@property="og:description"]')->attr('content').'</p>';
     echo '<p>'.$crawler->filter('title')->text().'</p>';
}

This works, but it takes a long time to process. Is there any way to make it faster?

Answer 1:

For starters, there is nothing asynchronous about your code sample. Your application sequentially performs a GET request, waits for the response, parses the response, and then loops back to do the same again, so the total time is roughly the sum of all ten round trips.

While Goutte uses Guzzle internally, it does not make use of Guzzle's asynchronous capabilities.

To make your code truly asynchronous, refer to the Guzzle documentation on:

  • Sending Requests within a Pool
  • Asynchronous Response Handling

Applied to the loop in your question, that would look something like:

require 'vendor/autoload.php'; // assuming Composer package management

$client = new GuzzleHttp\Client();

$requests = [
    $client->createRequest('GET', $url1),
    $client->createRequest('GET', $url2),
    $client->createRequest('GET', $url3),
    $client->createRequest('GET', $url4),
    $client->createRequest('GET', $url5),
    $client->createRequest('GET', $url6),
    $client->createRequest('GET', $url7),
    $client->createRequest('GET', $url8),
    $client->createRequest('GET', $url9),
    $client->createRequest('GET', $url10),  
];

$options = [
    'complete' => [
        [
            'fn' => function (GuzzleHttp\Event\CompleteEvent $event) {
                $response = $event->getResponse();
                $crawler = new Symfony\Component\DomCrawler\Crawler(null, $event->getRequest()->getUrl());
                // Cast the body stream to a string before handing it to the crawler.
                $crawler->addContent((string) $response->getBody(), $response->getHeader('Content-Type'));
                echo '<p>'.$crawler->filterXpath('//meta[@property="og:description"]')->attr('content').'</p>';
                echo '<p>'.$crawler->filter('title')->text().'</p>';
            },
            'priority' => 0,    // Optional
            'once'     => false // Optional
        ]
    ]
];

$pool = new GuzzleHttp\Pool($client, $requests, $options);

$pool->wait();
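
Note that createRequest() and the event-style 'complete' option used above belong to the older Guzzle 5 API; Guzzle 6 removed both. If you are on Guzzle 6 or 7, the same idea can be sketched roughly as follows. This is only a sketch under assumptions: fetchSummaries() is a hypothetical helper name, the concurrency of 5 is an arbitrary choice, and symfony/dom-crawler is assumed to be installed alongside Guzzle. It uses XPath for the title as well, so symfony/css-selector is not needed.

```php
<?php
require 'vendor/autoload.php'; // assumes Composer with guzzlehttp/guzzle ^6 or ^7 and symfony/dom-crawler

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Psr\Http\Message\ResponseInterface;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper (not part of Guzzle or Goutte): fetch all URLs
 * concurrently and return [index => ['title' => ..., 'description' => ...]].
 * A failed request is recorded as null at its index.
 */
function fetchSummaries(Client $client, array $urls, int $concurrency = 5): array
{
    $urls = array_values($urls); // make sure indexes are 0..n-1
    $results = [];

    // A generator lets the pool create request objects lazily.
    $requests = function () use ($urls) {
        foreach ($urls as $url) {
            yield new Request('GET', $url);
        }
    };

    $pool = new Pool($client, $requests(), [
        'concurrency' => $concurrency, // maximum number of requests in flight
        'fulfilled' => function (ResponseInterface $response, $index) use (&$results, $urls) {
            $crawler = new Crawler(null, $urls[$index]);
            $crawler->addContent((string) $response->getBody(), $response->getHeaderLine('Content-Type'));

            $title = $crawler->filterXPath('//title');
            $description = $crawler->filterXPath('//meta[@property="og:description"]');

            // Guard against pages that lack one of the elements.
            $results[$index] = [
                'title' => $title->count() ? $title->text() : null,
                'description' => $description->count() ? $description->attr('content') : null,
            ];
        },
        'rejected' => function ($reason, $index) use (&$results) {
            $results[$index] = null; // connection error, 4xx/5xx, timeout, ...
        },
    ]);

    // Start the transfers and block until every request has settled.
    $pool->promise()->wait();

    ksort($results); // responses arrive in completion order, not request order
    return $results;
}
```

You would call it with something like fetchSummaries(new Client(['timeout' => 10]), $urls) and then echo the title and description of each non-null entry, just as the loop in the question does. Because responses are handled as they complete, the total time should be closer to that of the slowest request in each batch than to the sum of all of them.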