I've got a simple node.js script to capture screenshots of a few web pages. It appears I'm getting tripped up somewhere along the line with my use of async/await, but I can't figure out where. I'm currently using puppeteer v1.11.0.
const puppeteer = require('puppeteer');
//a list of sites to screenshot
const papers =
{
    nytimes: "https://www.nytimes.com/",
    wapo: "https://www.washingtonpost.com/"
};
//launch puppeteer, do everything in .then() handler
puppeteer.launch({devtools:false}).then(function(browser){
    //create a load_page function that returns a promise which resolves when screenshot is taken
    async function load_page(paper){
        const url = papers[paper];
        return new Promise(async function(resolve, reject){
            const page = await browser.newPage();
            await page.setViewport({width:1024, height: 768});
            //screenshot on first console message
            page.once("console", async console_msg => {
                await page.pdf({path: paper + '.pdf',
                    printBackground:true,
                    width:'1024px',
                    height:'768px',
                    margin: {top:"0px", right:"0px", bottom:"0px", left:"0px"}
                });
                //close page
                await page.close();
                //resolve promise
                resolve();
            });
            //go to page
            await page.goto(url, {"waitUntil":["load", "networkidle0"]});
        })
    }
    //step through the list of papers, calling the above load_page()
    async function stepThru(){
        for(var p in papers){
            if(papers.hasOwnProperty(p)){
                //wait to load page and screenshot before loading next page
                await load_page(p);
            }
        }
        //close browser after loop has finished (and all promises resolved)
        await browser.close();
    }
    //kick it off
    stepThru();
    //getting this error message:
    //UnhandledPromiseRejectionWarning: Error: Navigation failed because browser has disconnected!
});
The "Navigation failed because browser has disconnected" error usually means that the Node script that launched Puppeteer ended without waiting for the Puppeteer actions to complete. So, as you suspected, it is a problem with what gets awaited. I made some changes to your script to make it work:
1 - First of all, you're not awaiting the (async) end of the stepThru function, so the stepThru() call has to be awaited, and the function passed to puppeteer.launch(...).then() has to be declared async so that await is allowed inside it (I added async).
2 - I changed the way you manage the page.goto and page.once promises. The PDF promise now has a single responsibility: just the PDF creation.
3 - Then I managed both the page.goto and PDF promises with a Promise.all.
4 - I moved the page.close after the Promise.all.
And now it works; here is the full working script.
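A minimal sketch of a script that follows the four changes above could look like this. It is written under assumptions rather than as the definitive version: the names loadPage, screenshotAll and pdfDone are illustrative, and the browser is passed in as a parameter so the flow can be tried without launching Chromium.

```javascript
// Sketch only: `loadPage`, `screenshotAll` and `pdfDone` are illustrative names.
async function loadPage(browser, name, url) {
  const page = await browser.newPage();
  await page.setViewport({ width: 1024, height: 768 });

  // Single responsibility: settle once the PDF has been written.
  // The listener is registered before navigation starts, so the first
  // console message cannot be missed.
  const pdfDone = new Promise((resolve, reject) => {
    page.once('console', () => {
      page.pdf({
        path: name + '.pdf',
        printBackground: true,
        width: '1024px',
        height: '768px',
        margin: { top: '0px', right: '0px', bottom: '0px', left: '0px' }
      }).then(resolve, reject);
    });
  });

  // Wait for BOTH the navigation and the PDF before touching the page again.
  await Promise.all([
    page.goto(url, { waitUntil: ['load', 'networkidle2'] }),
    pdfDone
  ]);

  // Only now is it safe to close: nothing is still using the page.
  await page.close();
}

async function screenshotAll(browser, papers) {
  for (const [name, url] of Object.entries(papers)) {
    // One page at a time, as in the original loop.
    await loadPage(browser, name, url);
  }
  // Every page has finished, so the browser can be closed.
  await browser.close();
}

// Assumed usage with Puppeteer (not executed here):
// const puppeteer = require('puppeteer');
// puppeteer.launch({ devtools: false }).then(async function (browser) {
//   await screenshotAll(browser, {
//     nytimes: 'https://www.nytimes.com/',
//     wapo: 'https://washingtonpost.com/'
//   });
// });
```

As in the original script, a page that never logs a console message would leave pdfDone pending forever, so this sketch keeps that limitation.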
Please note that:
- I changed networkidle0 to networkidle2 because the nytimes.com website takes a very long time to reach a state with 0 network requests (because of the ads, etc.). You can of course wait for networkidle0, but that's up to you and out of the scope of your question (increase the page.goto timeout in that case).
- The www.washingtonpost.com site fails with a TOO_MANY_REDIRECTS error, so I changed it to washingtonpost.com, but I think you should investigate that further. To test the script I used the nytimes site several times, plus other websites. Again, it's out of the scope of your question.
Let me know if you need some more help.
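If you do want to keep networkidle0, the page.goto timeout mentioned in the first note can be raised through the timeout option, which is given in milliseconds (Puppeteer's default is 30 seconds, and 0 disables the timeout). A small sketch, where the helper name gotoIdle and the 60-second default are assumptions for illustration only:

```javascript
// Hypothetical helper: navigate and wait until there are no open network
// connections, allowing more time than Puppeteer's default 30 s timeout.
function gotoIdle(page, url, timeoutMs = 60000) {
  return page.goto(url, {
    waitUntil: ['load', 'networkidle0'],
    timeout: timeoutMs // milliseconds; 0 would disable the timeout entirely
  });
}
```

It would be called with await gotoIdle(page, url) in place of the plain page.goto call.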