I made a very simple script which scrape a recipes website to get the title, time of preparation and the ingredients. Everything works fine except that the script is not able to scrape each page of my arrays. Sometimes i get 4 of them, sometimes 2, sometimes even 0 ...
It seems that the script doesn't wait the body to be fully loaded. I'm fully aware that cheerio doesn't understand javascript on website, but for all i know the information I scrape aren't generated from any script, it is pure HTML.
How can i ask cheerio to wait 1 second when a page is visited, or simply to wait for the html to be fully loaded.
Here is my code, it works so you can try it, and an example of the output :
pools = [
"http://www.marmiton.org/recettes/recette_salade-de-betteraves-a-l-orientale_16831.aspx",
"http://www.marmiton.org/recettes/recette_pain-d-epices-a-la-dijonnaise_16832.aspx",
"http://www.marmiton.org/recettes/recette_tarte-au-chocolat-et-creme-moka_16834.aspx",
"http://www.marmiton.org/recettes/recette_poulet-a-la-gaston-gerard_16836.aspx",
"http://www.marmiton.org/recettes/recette_assiette-paula_16837.aspx"]
var request = require("request");
var cheerio = require("cheerio");
var poolsLength = pools.length;
for (var i = 0 ; i < pools.length ; i++) {
var url = pools[i];
request(url, function (error, response, body) {
if (!error) {
var $ = cheerio.load(body,{
ignoreWhitespace: true
});
var name = [];
var address = [];
var website = [];
$('body').each(function(i, elem){
name = $(elem).find('.fn').text();
address = $(elem).find('.preptime').text();
website = $(elem).find('.m_content_recette_ingredients').text();
console.log(name+"±"+address+"±"+website);}
)}
})
};`
As you can see above, it only worked for 2 of 5 pages.