I posted this question to the PhantomJS mailing list a week ago, but have gotten no response. Hoping for better luck here...
I've been trying to use PhantomJS to scrape information from YouTube, but haven't been able to get it working.
Consider a YouTube video embedded in a web page via an iframe element. If you load the URL in the iframe's src attribute directly into a browser, you get a full-page version of the video, in which the video is wrapped in an embed element. The embed element is not present in the initial page content. Instead, script tags on the page run JavaScript that eventually adds the embed element to the DOM. I want to access this embed element once it appears, but it never appears when I load the page in PhantomJS.
Here's the code I'm using:
```javascript
var page = require("webpage").create();
// Present a desktop Firefox user agent instead of the PhantomJS default
page.settings.userAgent = "Mozilla/5.0 (X11; rv:24.0) Gecko/20130909 Firefox/24.0";

page.open("https://www.youtube.com/embed/dQw4w9WgXcQ", function (status) {
    if (status !== "success") {
        console.log("Failed to load page");
        phantom.exit();
    } else {
        // Give the page's scripts 15 seconds to insert the embed element
        setTimeout(function () {
            // Runs inside the page context; only the return value comes back
            var size = page.evaluate(function () {
                return document.getElementsByTagName("EMBED").length;
            });
            console.log(size);
            phantom.exit();
        }, 15000);
    }
});
```
I only ever see "0" printed to the console, no matter how long I set the timeout. If I count "DIV" elements instead I get 3, and "SCRIPT" elements gives 5, so the code itself seems sound. I just never find any "EMBED" tags, even though when I load the URL above in my browser, one appears soon after the page loads.
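One way to rule out the fixed 15-second wait as the cause is to poll for the element instead. The sketch below uses a hypothetical `waitFor` helper (it is not part of the PhantomJS API) that calls a check function every `intervalMs` milliseconds and reports whether the check succeeded before `timeoutMs` elapsed. It is written in ES5 so it should also run under PhantomJS 1.x. It only changes how long the script waits; it cannot make the embed element appear if the page never creates it.

```javascript
// Hypothetical helper: polls `check` every `intervalMs` ms until it returns
// true or `timeoutMs` ms have elapsed, then calls `done(found)` exactly once.
function waitFor(check, done, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (check()) {
            clearInterval(timer);
            done(true);
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            done(false);
        }
    }, intervalMs);
}

// Possible use in the PhantomJS script, replacing the setTimeout call in the
// "success" branch (left as a comment so this block runs outside PhantomJS):
//
// waitFor(function () {
//     return page.evaluate(function () {
//         return document.getElementsByTagName("EMBED").length > 0;
//     });
// }, function (found) {
//     console.log(found ? "EMBED appeared" : "Timed out without an EMBED");
//     phantom.exit();
// }, 15000, 250);
```

Polling every 250 ms lets the script exit as soon as the element shows up, while still capping the total wait at the same 15 seconds as before.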
Does anyone have any idea what the problem might be? Thanks in advance for any help.