I can't figure out why I can't retrieve a simple string with XPath using this very simple snippet:
var page = new WebPage();
page.open('http://free.fr', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        function getElementByXpath(path) {
            return document.evaluate(path, document, null, XPathResult.STRING_TYPE, null).stringValue;
        }
        console.log( getElementByXpath("//title/text()") );
    }
    phantom.exit();
});
It always prints nothing.
What am I missing to print the title value?
PhantomJS has two contexts. Only the page context (the DOM context) has access to the DOM, and it is sandboxed. You get access to it through page.evaluate(). But remember that:
Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.
Closures, functions, DOM nodes, etc. will not work!
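To get a feel for that rule of thumb, here is a small sketch in plain JavaScript (it does not need PhantomJS). The helper name roundTrip is made up for illustration, and it only approximates the boundary with a JSON round trip; PhantomJS's actual serialization may differ in details.

```javascript
// Hypothetical helper: approximates what survives crossing the
// page.evaluate boundary by doing a JSON round trip.
function roundTrip(value) {
    var json = JSON.stringify(value);
    // JSON.stringify returns undefined for functions and for undefined itself.
    return json === undefined ? undefined : JSON.parse(json);
}

var title = roundTrip("Some title");                  // strings survive intact
var data = roundTrip({ n: 1, list: [1, 2] });         // plain objects and arrays survive
var fn = roundTrip(function () { return 1; });        // a function is lost entirely
var withFn = roundTrip({ a: 1, f: function () {} });  // a function property is dropped

console.log(title, JSON.stringify(data), fn, JSON.stringify(withFn));
```

So a string such as the title text is safe to return, while a node or a helper function is not.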
This means that you cannot pass any DOM node that you find to the outer context. There is a document object outside of the page context, but it doesn't do anything useful. It's only a relic of the way PhantomJS is built on top of QtWebKit.
Here's an example fix:
var page = new WebPage();
page.onConsoleMessage = function(msg){
    console.log("remote: " + msg);
};
page.open('http://google.fr', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        page.evaluate(function(){
            function getElementByXpath(path) {
                return document.evaluate(path, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
            }
            console.log( getElementByXpath("//head/title/text()").textContent );
        });
    }
    phantom.exit();
});
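As a variation, since a string can be serialized, you could also return the title from page.evaluate instead of logging it inside the page. This is only a sketch: the function name readTitle is my own, and the typeof phantom guard is there just so the helper can be loaded outside PhantomJS without starting a page.

```javascript
// Runs inside the page context. It must be self-contained, because
// page.evaluate serializes the function and closures do not carry over.
function readTitle() {
    return document.evaluate("//head/title/text()", document, null,
                             XPathResult.STRING_TYPE, null).stringValue;
}

// Only open a page when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://google.fr', function (status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            // The string return value is copied back to the outer context.
            var title = page.evaluate(readTitle);
            console.log(title);
        }
        phantom.exit();
    });
}
```

With this approach you don't need page.onConsoleMessage, because the console.log call happens in the outer context.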