How can I get a web page into a string using JavaScript?

Posted 2019-06-01 02:30

Question:

I need to get the HTML content of a page using JavaScript. The page may be on another domain, so this is similar to what wget does, but in JavaScript. I want to use it for a kind of web crawler.

Using JavaScript, how can I get the content of a page, given its URL, into a string?

Answer 1:

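Try this, which uses Yahoo's YQL service as a cross-domain proxy and jQuery's $.getScript to load the result as JSONP: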

// url is the address of the page to fetch; cbfunc receives the JSONP response.
function cbfunc(html) { alert(html.results[0]); }
$.getScript('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22' +
    encodeURIComponent(url) + '%22&format=xml&diagnostics=true&callback=cbfunc');




Answer 2:

The general way to load content over HTTP via JavaScript is to use the XMLHttpRequest object. This is subject to the same-origin policy, so to access content on other domains you have to work around it, for example with CORS headers sent by the target server or with a proxy on your own server.
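As a rough sketch of the XMLHttpRequest approach, something like the following would put a page's markup into a string. The helper name getPageAsString and its callback parameters are made up for illustration. It only works for same-origin URLs or for servers that explicitly allow cross-origin requests.

```javascript
// Browser sketch. getPageAsString is a hypothetical helper name.
// onSuccess receives the response body as a string; onError receives an Error.
function getPageAsString(url, onSuccess, onError) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true); // third argument true = asynchronous
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(xhr.responseText);
    } else {
      onError(new Error('HTTP status ' + xhr.status));
    }
  };
  // Fires on network failures and on requests blocked by the same-origin policy.
  xhr.onerror = function () {
    onError(new Error('Request failed (network error or blocked cross-origin request)'));
  };
  xhr.send();
}
```

You would call it as getPageAsString(someUrl, function (html) { ... }, function (err) { ... }), where someUrl is whatever page you want to read.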

This assumes you are running JS in a web browser (implied by "the page could be also on another domain"). If you are not, other options are open to you. For example, with Node.js you could use its built-in HTTP client.
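For the Node.js case, a minimal sketch with the built-in http and https modules might look like this. The helper name fetchPage is hypothetical, and the sketch does not follow redirects or check the status code, so treat it as a starting point only.

```javascript
// Node.js only: uses the built-in http/https modules, not a browser API.
const http = require('http');
const https = require('https');

// fetchPage is a hypothetical helper name used for illustration.
// callback follows the Node convention: callback(err, body).
function fetchPage(url, callback) {
  // Pick the client that matches the URL scheme.
  const client = url.startsWith('https:') ? https : http;
  client.get(url, function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; }); // accumulate chunks
    res.on('end', function () { callback(null, body); }); // full page as a string
  }).on('error', callback);
}
```

Since there is no same-origin policy outside the browser, this works for any domain the machine can reach.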



Answer 3:

If you also want to capture the html tags of the current page, you can concatenate them onto the inner HTML like this:

    function getPageHTML() {
        return "<html>" + $("html").html() + "</html>";
    }

See also: How do I get the entire page's HTML with jQuery?