I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText
of each element.
const tweets = await page.$$('.tweet');
From what I can tell, this returns a NodeList, just like the document.querySelectorAll() method in the browser.
How do I just loop over it and get what I need? I tried various stuff, like:
[...tweets].forEach(tweet => {
  console.log(tweet.innerText)
});
page.$$():
You can use a combination of elementHandle.getProperty() and jsHandle.jsonValue() to obtain the innerText from an ElementHandle obtained with page.$$():
const tweets = await page.$$('.tweet');
for (let i = 0; i < tweets.length; i++) {
  const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
  console.log(tweet);
}
If you are set on using the forEach() method, you can wrap the loop in a promise:
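const tweets = await page.$$('.tweet');
await new Promise((resolve, reject) => {
  // Count outstanding callbacks; the last one to start is not always the last to finish.
  let pending = tweets.length;
  // Nothing to wait for if the selector matched no elements.
  if (pending === 0) resolve();
  tweets.forEach(async (tweet) => {
    try {
      const text = await (await tweet.getProperty('innerText')).jsonValue();
      console.log(text);
    } catch (error) {
      reject(error);
    }
    pending -= 1;
    if (pending === 0) resolve();
  });
});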
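Since forEach() does not wait for async callbacks, a common alternative is to combine map() with Promise.all(). The following is only a sketch: the helper name getInnerTexts is hypothetical, and page is assumed to be a Puppeteer Page (or any object with a compatible async $$ method).

```javascript
// Hypothetical helper (not part of Puppeteer): returns the innerText of every
// element matching `selector`, in document order.
async function getInnerTexts(page, selector) {
  const handles = await page.$$(selector);
  // map() starts every lookup immediately; Promise.all() waits for all of them
  // and keeps the results in the same order as the handles.
  return Promise.all(
    handles.map(async (handle) => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}
```

Inside an async function this could then be called as const tweets = await getInnerTexts(page, '.tweet');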
page.evaluate():
Alternatively, you can skip using page.$$() entirely, and use page.evaluate():
const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));
tweets.forEach(tweet => {
  console.log(tweet);
});
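The function given to page.evaluate() runs inside the browser page, so it cannot see variables from the Node.js script; values have to be passed as extra arguments. A minimal sketch of that, assuming a hypothetical helper named getTextsBySelector and a Puppeteer Page object:

```javascript
// Hypothetical helper (not part of Puppeteer): reads innerText for every match
// of `selector` in a single round trip to the browser.
async function getTextsBySelector(page, selector) {
  return page.evaluate(
    // This arrow function is serialized and executed in the page context,
    // so `selector` must arrive as the argument `sel`, not via closure.
    (sel) => Array.from(document.querySelectorAll(sel), (el) => el.innerText),
    selector
  );
}
```

Only serializable values such as strings come back from the page, which is why the elements are mapped to their text before returning.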
According to the Puppeteer docs, page.$$ does not return a NodeList. Instead, it returns a Promise that resolves to an array of ElementHandle objects, which is quite different from a NodeList.
There are several ways to solve the problem.
1. Use page.$$eval, the built-in method for working with all matching elements
This method runs Array.from(document.querySelectorAll(selector)) within the page and passes the resulting array as the first argument to pageFunction.
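So you can get the innerText of every element like this:
// Find all .tweet elements and return the innerText of each one in an array.
// The callback receives the whole array of elements, so map over it.
const tweets = await page.$$eval('.tweet', elements => elements.map(element => element.innerText));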
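Because the callback given to page.$$eval receives the full array of matched elements, any clean-up of the text can happen in the same call. A small sketch under that assumption; the helper name getTrimmedTexts and the choice to drop empty strings are illustrative, not something the docs prescribe:

```javascript
// Hypothetical helper (not part of Puppeteer): returns trimmed, non-empty
// innerText values for every element matching `selector`.
async function getTrimmedTexts(page, selector) {
  // The callback runs in the page and is handed an array of elements.
  return page.$$eval(selector, (elements) =>
    elements
      .map((el) => el.innerText.trim())
      .filter((text) => text.length > 0)
  );
}
```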
2. Pass the ElementHandle to page.evaluate
Whatever you get from await page.$$('.tweet') is an array of ElementHandle objects. If you log one to the console, it will show up as JSHandle or ElementHandle depending on the type.
Forget the hard explanation, it's easier to demonstrate.
// let's just call them tweetHandles
const tweetHandles = await page.$$('.tweet');
// loop through all handles
for (const tweetHandle of tweetHandles) {
  // pass the single handle below
  const singleTweet = await page.evaluate(el => el.innerText, tweetHandle);
  // do whatever you want with the data
  console.log(singleTweet);
}
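Element handles keep their elements alive in the page until they are disposed, so for long-running scripts it may be worth releasing them once the text has been read. A hedged sketch of the same loop with clean-up; readAndDispose is a hypothetical name, and page is assumed to be a Puppeteer Page:

```javascript
// Hypothetical helper (not part of Puppeteer): reads innerText through each
// handle, then disposes every handle even if one of the reads throws.
async function readAndDispose(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  try {
    for (const handle of handles) {
      // The handle is resolved to the real DOM element inside the page.
      texts.push(await page.evaluate((el) => el.innerText, handle));
    }
  } finally {
    // Release the handles so the page can garbage-collect the elements.
    await Promise.all(handles.map((handle) => handle.dispose()));
  }
  return texts;
}
```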
Of course, there are multiple ways to solve this problem; Grant Miller also covered a few of them in the other answer.