This is the code I am using:
require_once 'goutte.phar';
use Goutte\Client;
$client = new Client();
for ($i = 0; $i < 10; $i++) {
    $crawler = $client->request('GET', 'http://website.com');
    echo '<p>'.$crawler->filterXpath('//meta[@property="og:description"]')->attr('content').'</p>';
    echo '<p>'.$crawler->filter('title')->text().'</p>';
}
This works, but it takes a long time to process. Is there any way to make it faster?
For starters, there is nothing asynchronous about your code sample. Your application sequentially performs a GET request, waits for the response, parses the response, and then loops back to start the next request. The total run time is therefore roughly the sum of all ten round trips.
While Goutte uses Guzzle internally, it does not make use of Guzzle's asynchronous capabilities.
To make your code truly asynchronous, refer to the Guzzle documentation on:
- Sending Requests within a Pool
- Asynchronous Response Handling
Rewritten to use a pool, your code sample would look something like this:
require 'vendor/autoload.php'; // assuming Composer package management
use GuzzleHttp\Event\CompleteEvent;
$client = new GuzzleHttp\Client();
// $url1 through $url10 stand in for the pages you want to fetch.
$requests = [
    $client->createRequest('GET', $url1),
    $client->createRequest('GET', $url2),
    $client->createRequest('GET', $url3),
    $client->createRequest('GET', $url4),
    $client->createRequest('GET', $url5),
    $client->createRequest('GET', $url6),
    $client->createRequest('GET', $url7),
    $client->createRequest('GET', $url8),
    $client->createRequest('GET', $url9),
    $client->createRequest('GET', $url10),
];
$options = [
    'complete' => [
        [
            // Runs once per request, as soon as that request's response arrives.
            'fn' => function (CompleteEvent $event) {
                $crawler = new Symfony\Component\DomCrawler\Crawler(null, $event->getRequest()->getUrl());
                $crawler->addContent((string) $event->getResponse()->getBody(), $event->getResponse()->getHeader('Content-Type'));
                echo '<p>'.$crawler->filterXpath('//meta[@property="og:description"]')->attr('content').'</p>';
                echo '<p>'.$crawler->filter('title')->text().'</p>';
            },
            'priority' => 0, // Optional
            'once' => false // Optional
        ]
    ]
];
$pool = new GuzzleHttp\Pool($client, $requests, $options);
$pool->wait(); // Blocks until every request in the pool has finished
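Note that createRequest() and the event system used above are no longer available as of Guzzle 6. If you are on Guzzle 6 or later, a rough equivalent might look like the sketch below. It assumes guzzlehttp/guzzle, symfony/dom-crawler and symfony/css-selector are installed through Composer. The function name fetchSummaries, its parameters and the shape of the returned array are made up for illustration and are not part of either library.

```php
<?php
require 'vendor/autoload.php'; // assumes a Composer install (see above)

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Psr\Http\Message\ResponseInterface;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: fetches every URL through a Guzzle pool and returns
 * [key => ['title' => ?string, 'description' => ?string]], using the keys
 * of $urls. A failed request is stored as null.
 */
function fetchSummaries(Client $client, array $urls, int $concurrency = 5): array
{
    $results = [];

    // The generator lets the pool create requests lazily, as slots free up.
    $requests = function () use ($urls) {
        foreach ($urls as $key => $url) {
            yield $key => new Request('GET', $url);
        }
    };

    $pool = new Pool($client, $requests(), [
        'concurrency' => $concurrency, // maximum number of requests in flight
        'fulfilled' => function (ResponseInterface $response, $index) use (&$results) {
            $crawler = new Crawler((string) $response->getBody());
            $title = $crawler->filter('title');
            $description = $crawler->filterXPath('//meta[@property="og:description"]');
            // Guard with count() so pages missing a tag do not throw.
            $results[$index] = [
                'title' => $title->count() ? $title->text() : null,
                'description' => $description->count() ? $description->attr('content') : null,
            ];
        },
        'rejected' => function ($reason, $index) use (&$results) {
            $results[$index] = null; // request failed; $reason holds the error
        },
    ]);

    // Start the transfers and block until all of them have settled.
    $pool->promise()->wait();

    return $results;
}
```

You would call it with something like fetchSummaries(new Client(), $urls) and then echo the title and description of each entry. The results are filled in as responses arrive, so look entries up by key instead of relying on their order. The concurrency value is a trade-off: a higher number finishes sooner but puts more load on the remote server.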