I'm trying to scrape the element matched by $('a[href^="mailto:"]')
on this website: https://celsius.network/
When I go to the browser console and run that, I get a link so I know it's there.
The issue is that my request (using the Axios library) returns the HTML as it is before any JavaScript has run. I've set the User-Agent, but that doesn't seem to make a difference.
const axios = require("axios");

const axiosClient = () =>
  axios.create({
    headers: {
      "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4"
    },
    timeout: 10000
  });

axiosClient()
  .get("https://celsius.network")
  .then(({ data }) => {
    console.log("DATAAAAAAAA: ", data);
  })
  .catch((err) => console.error(err));
This is returning the original HTML, with the body:
<body>
<div id="app"> </div>
....
instead of the one that's fully loaded after all the javascript has manipulated the DOM.
P.S. I am doing this through firebase functions, so I think there are limits to what I can install.
UPDATE
const findEmail = url =>
  new Promise((resolve, reject) => {
    // here!
  });
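A plain HTTP request isn't enough to emulate what happens when you visit the page in your browser: axios only downloads the initial HTML and never runs the page's scripts, whatever User-Agent you send. There are a few options out there, but Puppeteer, which drives a headless Chrome, is a good candidate for the job.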
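To make the difference concrete, here is a minimal sketch that looks for mailto: links in a raw HTML string. `extractMailtoLinks` is a hypothetical helper of mine (not part of axios), and the "rendered" markup is made up to stand in for the DOM after the scripts have run; only the static shell comes from your question.

```javascript
// Minimal sketch: pull mailto: addresses out of a raw HTML string.
// extractMailtoLinks is a hypothetical helper, not an axios or Puppeteer API.
function extractMailtoLinks(html) {
  // Capture everything after "mailto:" up to the closing quote or a "?" query.
  const pattern = /href\s*=\s*["']mailto:([^"'?]+)/gi;
  const emails = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    emails.push(match[1]);
  }
  return emails;
}

// The static shell that axios receives (taken from the question): no link yet.
const staticShell = '<body><div id="app"> </div></body>';
console.log(extractMailtoLinks(staticShell)); // []

// Assumed markup standing in for the DOM after client-side rendering.
const rendered = '<div id="app"><a href="mailto:presale@celsius.network">Contact</a></div>';
console.log(extractMailtoLinks(rendered)); // [ 'presale@celsius.network' ]
```

Running this kind of check on the `data` you get back from axios should confirm that the link simply isn't in the response, so no header change on the request side will make it appear.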
Most things that you can do manually in the browser can be done using Puppeteer!
Check out the following...
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://celsius.network/');
  // Runs inside the page once the load event has fired.
  const textContent = await page.evaluate(() => document.querySelector('a[href^="mailto:"]').textContent);
  console.log(textContent); // presale@celsius.network
  // close() returns a promise, so wait for the browser to shut down.
  await browser.close();
})();
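One caveat: the snippet above reads the link right after navigation finishes, and `querySelector` returns null if the link has not been rendered yet, which would throw. A slightly more defensive sketch, assuming you are happy to wait a few seconds for the link, could look like this. `readMailto` and `MAILTO_SELECTOR` are names I made up; `page` is assumed to be a Puppeteer Page, and only its `waitForSelector` and `$eval` methods are used.

```javascript
// Selector from the question.
const MAILTO_SELECTOR = 'a[href^="mailto:"]';

// Hypothetical helper: `page` is assumed to be a Puppeteer Page object.
async function readMailto(page, timeoutMs = 5000) {
  try {
    // Wait until client-side rendering has added the link to the DOM.
    await page.waitForSelector(MAILTO_SELECTOR, { timeout: timeoutMs });
  } catch (err) {
    // Treat any failure here (typically a timeout) as "no link found".
    return null;
  }
  // Read the href attribute inside the page context.
  const href = await page.$eval(MAILTO_SELECTOR, (el) => el.getAttribute("href"));
  // Drop the scheme and any "?subject=..." query.
  return href.replace(/^mailto:/i, "").split("?")[0];
}
```

You would call it as `const email = await readMailto(page);` after `page.goto(...)`. Reading the `href` instead of `textContent` is a design choice on my part: it still gives you the address when the link text is something like "Contact us".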
I'm not totally clear on your constraints...
"there are limits to what I can install"
If you were able to install axios, I'd assume you can install the puppeteer npm package as well?
Per your update, Puppeteer can also be used through its promise API, without async/await. The following should do it for you...
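const puppeteer = require('puppeteer');

const findEmail = url =>
  new Promise((resolve, reject) => {
    puppeteer.launch().then((browser) => {
      browser.newPage().then((page) => {
        // Navigate to the URL that was passed in rather than a hard-coded one.
        return page.goto(url).then(() => {
          return page.evaluate(() => document.querySelector('a[href^="mailto:"]').textContent).then((email) => {
            resolve(email);
            return browser.close();
          });
        });
      }).catch((err) => {
        // Report the failure to the caller and still shut the browser down.
        reject(err);
        browser.close().catch(() => {});
      });
    }).catch(reject);
  });

findEmail('https://celsius.network/').then((email) => {
  console.log(email); // presale@celsius.network
});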
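If async/await is available in your functions runtime, the same idea can be written more compactly. This is only a sketch under a couple of assumptions: `findEmailAsync` is a name I made up, and `launch` is a function you pass in that is assumed to behave like `() => puppeteer.launch()`. Passing it in keeps the function easy to exercise with a fake browser and makes sure the browser is closed even when something throws.

```javascript
// Hypothetical async/await variant of findEmail.
// `launch` is assumed to behave like () => puppeteer.launch().
async function findEmailAsync(url, launch) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Runs in the page context; returns null instead of throwing
    // when no mailto: link is present.
    return await page.evaluate(() => {
      const link = document.querySelector('a[href^="mailto:"]');
      return link ? link.textContent : null;
    });
  } finally {
    // Always release the browser, on success and on failure.
    await browser.close();
  }
}
```

With Puppeteer installed, usage would be along the lines of `findEmailAsync('https://celsius.network/', () => puppeteer.launch()).then(console.log)`. Since an async function already returns a promise, it drops into the `findEmail = url => new Promise(...)` shape from your update without the explicit wrapper.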