Can't figure out why I can't retrieve a si

Posted 2019-03-01 04:54

Question:

I can't figure out why I can't retrieve a simple string with XPath using this very simple snippet:

var page = new WebPage();
page.open('http://free.fr', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        function getElementByXpath(path) {
          return document.evaluate(path, document, null, XPathResult.STRING_TYPE, null).stringValue;
        }

        console.log( getElementByXpath("//title/text()") );
    }
    phantom.exit();
});

It always returns nothing.

What am I missing to print the title value?

Answer 1:

PhantomJS has two contexts. Only the DOM context (page context) has access to the DOM, but it is sandboxed. Your snippet calls document.evaluate in the outer context, which is why it returns nothing. You get access to the DOM context through page.evaluate. But remember that:

Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.

Closures, functions, DOM nodes, etc. will not work!

This means that you cannot pass any DOM node that you find to the outer context. There is a document object outside of the DOM context, but it doesn't do anything. It's only a relic of the way PhantomJS is written on top of QtWebKit.
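The JSON rule of thumb quoted above can be tried out in plain JavaScript. The sketch below is only an approximation of the boundary, not PhantomJS's actual mechanism, and crossBoundary is a hypothetical helper name: it round-trips a value through JSON to show roughly what would survive.

```javascript
// Hypothetical helper: approximates the page/outer boundary with a JSON round trip.
// This is an illustration of the rule of thumb, not PhantomJS internals.
function crossBoundary(value) {
    var text = JSON.stringify(value);
    // JSON.stringify returns undefined for values it cannot serialize (e.g. a bare function).
    return text === undefined ? undefined : JSON.parse(text);
}

// A plain string comes back unchanged.
console.log(crossBoundary('a title'));

// A plain object of primitives comes back as an equal copy.
console.log(crossBoundary({ title: 'a title', length: 7 }));

// A function property is dropped; only the serializable part remains.
console.log(crossBoundary({ title: 'a title', getTitle: function () {} }));

// A bare function does not survive at all.
console.log(crossBoundary(function () {}));
```

In practice this means extracting the string you need (for example textContent or stringValue) inside the page context and returning only that.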

Here's an example fix:

var page = new WebPage();
page.onConsoleMessage = function(msg){
    console.log("remote: " + msg);
};
page.open('http://google.fr', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
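        // Everything inside this function runs in the page (DOM) context.
        page.evaluate(function(){
            function getElementByXpath(path) {
              return document.evaluate(path, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
            }

            // This console.log fires in the page context and reaches the outer
            // context through page.onConsoleMessage above.
            console.log( getElementByXpath("//head/title/text()").textContent );
        });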
    }
    phantom.exit();
});
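As an alternative to logging through page.onConsoleMessage, the string can be returned from page.evaluate directly, since a string satisfies the JSON rule. The following is a minimal sketch along those lines, not a tested fix: readStringByXPath is a hypothetical helper name, the URL is the one from the question, and the typeof phantom guard is only there so the helper can be loaded outside PhantomJS.

```javascript
// Runs inside the page context. It takes a string and returns a string,
// so both the argument and the return value are JSON-serializable.
function readStringByXPath(path) {
    return document.evaluate(path, document, null, XPathResult.STRING_TYPE, null).stringValue;
}

// Guard (an assumption for illustration) so the helper above can also be
// loaded in an environment where the phantom global does not exist.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://free.fr', function (status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            // Extra arguments to page.evaluate are passed on to the function.
            var title = page.evaluate(readStringByXPath, '//title/text()');
            console.log(title);
        }
        phantom.exit();
    });
}
```

This keeps the original STRING_TYPE query from the question; the only change is where it runs.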