Get JSON page content with PhantomJS

Posted 2019-01-24 01:00

Question:

I would like to know how to parse JSON in PhantomJS. Any page content is wrapped in HTML (<html><body><pre>{JSON string}</pre></body></html>). Is there an option to remove the enclosing tags, or to request a different Content-Type such as "application/json"? If not, what is the best way to parse it? Should I use jQuery after loading it with includeJs?

Answer 1:

Since you are using PhantomJS, which is built on the WebKit browser engine, you have access to the native JSON object. There is no need to use page.evaluate; you can just read the plainText property of the page object.

http://phantomjs.org/api/webpage/property/plain-text.html

var page = require('webpage').create();
page.open('http://somejsonpage.com', function () {
    var jsonSource = page.plainText;
    var resultObject = JSON.parse(jsonSource);
    phantom.exit();
});
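The snippet above assumes the page loads and returns valid JSON. A minimal sketch of a more defensive variant follows; the helper name tryParseJson is hypothetical and not part of PhantomJS, and the URL is the same placeholder as above. The PhantomJS part is guarded so that the helper can also be tried outside PhantomJS.

```javascript
// Hypothetical helper (not a PhantomJS API): parse text as JSON without throwing.
function tryParseJson(text) {
    try {
        return { ok: true, value: JSON.parse(text) };
    } catch (e) {
        return { ok: false, error: e.message };
    }
}

// This part only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://somejsonpage.com', function (status) {
        // The callback receives 'success' or 'fail'.
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
            return;
        }
        var result = tryParseJson(page.plainText);
        console.log(result.ok ? JSON.stringify(result.value)
                              : 'Invalid JSON: ' + result.error);
        phantom.exit(result.ok ? 0 : 1);
    });
}
```

Checking the load status first avoids handing an error page or empty text to JSON.parse, and the try/catch keeps a malformed response from leaving the script hanging without reaching phantom.exit.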


Answer 2:

Here is what I did:

var obj = page.evaluate(function() {
    // eval runs the page text as code, so use this only with sources you trust
    return eval('(' + document.body.innerText + ')');
});

The returned obj is then the object described by the JSON that the page served.



Answer 3:

As already noted in the accepted answer, I would suggest using JSON.parse() to convert a JSON string into an object.

For example, your code could look like this:

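// The function passed to page.evaluate() runs inside the web page, where the
// PhantomJS page object is not defined, so read the text from the DOM instead.
var jsonObject = page.evaluate(function() {
  return JSON.parse(document.body.innerText);
});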


Answer 4:

If the JSON data contains HTML strings, the markup in them is stripped when the content is read through the suggested page.plainText property.
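If that markup needs to survive, one possible workaround is to start from page.content (the serialized HTML) instead, pull out the text between the <pre> tags, and undo the entity escaping before parsing. The sketch below is only an illustration: the function name parseWrappedJson is hypothetical, and it assumes the JSON sits in a single <pre> element with only &, < and > escaped, as in the wrapper shown in the question.

```javascript
// Hypothetical helper: recover JSON from the <pre> wrapper found in page.content.
function parseWrappedJson(html) {
    // Capture everything between the opening <pre ...> and the last </pre>.
    var match = /<pre[^>]*>([\s\S]*)<\/pre>/i.exec(html);
    if (!match) {
        throw new Error('No <pre> element found in page content');
    }
    // Assumption: only these three entities were escaped during serialization.
    // &amp; is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".
    var text = match[1]
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');
    return JSON.parse(text);
}

// Example input shaped like the wrapper described in the question.
var sampleHtml = '<html><head></head><body><pre>' +
    '{"title":"&lt;b&gt;Hello&lt;/b&gt;"}' +
    '</pre></body></html>';
var parsed = parseWrappedJson(sampleHtml);
console.log(parsed.title); // prints "<b>Hello</b>"
```

Inside a PhantomJS script this would be called as parseWrappedJson(page.content) in the page.open callback.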