How to get the entire document HTML as a string?

Published 2019-01-01 06:39

浪荡孟婆

Question:

Is there a way in JS to get the entire HTML within the html tags, as a string?

document.documentElement.??

Answer 1:

Microsoft introduced the outerHTML and innerHTML properties some time ago.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is in the DOM Parsing and Serialization specification.

See quirksmode for browser compatibility details to find out what will work for you. All browsers support innerHTML.

var markup = document.documentElement.innerHTML;
alert(markup);

Answer 2:

You can do

new XMLSerializer().serializeToString(document)

in IE 9 and later, as well as in other modern browsers.
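
As a rough sketch of how this could be made more defensive, the helper below uses an XMLSerializer when one is available and otherwise falls back to the outerHTML of the root element. The name serializeDocument and its second parameter are assumptions made for illustration; they do not come from any answer here. Keep in mind that XMLSerializer produces an XML-style serialization, so its output may differ from outerHTML (for example, it may add an xmlns attribute to the html element).

```javascript
// Hypothetical helper, not a standard API.
// `doc` is a Document; `Serializer` is an optional XMLSerializer-like
// constructor, defaulting to the global XMLSerializer if one exists.
function serializeDocument(doc, Serializer) {
    if (Serializer === undefined && typeof XMLSerializer !== 'undefined') {
        Serializer = XMLSerializer;
    }
    if (typeof Serializer === 'function') {
        // Serializes the whole document, including the doctype.
        return new Serializer().serializeToString(doc);
    }
    // Fallback: root element only, without the doctype.
    return doc.documentElement.outerHTML;
}
```

In a browser this would be called as serializeDocument(document).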

Answer 3:

I believe document.documentElement.outerHTML should return that for you.

The MSDN page on the outerHTML property notes that it is supported in IE 5+. Colin's answer links to the W3C quirksmode page, which offers a good comparison of cross-browser compatibility (for other DOM features too).

Answer 4:

I tried the various answers to see what is returned. I'm using the latest version of Chrome.

The suggestion document.documentElement.innerHTML; returned <head> ... </body>

Gaby's suggestion document.getElementsByTagName('html')[0].innerHTML; returned the same.

The suggestion document.documentElement.outerHTML; returned <html><head> ... </body></html>, which is everything apart from the doctype.

You can retrieve the doctype with document.doctype; this returns an object, not a string. If you need to extract its details as strings for all doctypes up to and including HTML5, that is described here: Get DocType of an HTML as string with Javascript

I only wanted HTML5, so the following was enough for me to create the whole document:

alert('<!DOCTYPE HTML>' + '\n' + document.documentElement.outerHTML);
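
The same idea can be wrapped in a small function. This is only a minimal sketch: the name html5DocumentToString is made up for illustration, and it assumes that any doctype present is the HTML5 one, so it would misreport older doctypes such as XHTML 1.0.

```javascript
// Hypothetical helper: builds the full markup of an HTML5 document.
// `doc` is expected to look like a Document (doctype, documentElement).
function html5DocumentToString(doc) {
    // document.doctype is null when the page has no doctype declaration.
    // Assumption: if a doctype exists, it is the HTML5 doctype.
    var prefix = doc.doctype ? '<!DOCTYPE html>\n' : '';
    return prefix + doc.documentElement.outerHTML;
}
```

In a browser this would be called as html5DocumentToString(document).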

Answer 5:

You can also do:

document.getElementsByTagName('html')[0].innerHTML

You will not get the doctype or the html tag itself, but you will get everything else.

Answer 6:

document.documentElement.outerHTML

Answer 7:

Probably IE only (this is a property of the .NET WebBrowser control, not a JavaScript API):

webBrowser1.DocumentText

For Firefox 1.0 and later:

// Serialize the current DOM tree, including changes/edits, into the variable ss
var ns = new XMLSerializer();
var ss = ns.serializeToString(document);
alert(ss.substr(0, 300));

This may work in Firefox. (It shows the first 300 characters of the serialized source text, which is mostly the doctype declaration.)

Be aware, however, that Firefox's normal "Save As" dialog might not save the current state of the page, but rather the originally loaded (X)HTML source text. (POSTing ss to a temporary file and redirecting to it might deliver a saveable source text that includes the changes and edits made beforehand.)

That said, Firefox recovers state surprisingly well on "Back", and "Save (as) ..." does include the states and values of input-like fields, textareas, and so on, though not those of elements in contenteditable or designMode.

If the document is not an XHTML or XML file (judged by MIME type, not just the filename extension), you can use document.open/write/close to set the appropriate content in the source layer, which is what gets saved through the save dialog from Firefox's File/Save menu. See http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite and

https://developer.mozilla.org/en-US/docs/Web/API/document.write

Regardless of whether the document is X(HT)ML, you could try using "view-source:http://..." as the value of the src attribute of a (possibly script-created) iframe. To access an iframe's document in Firefox, use:

<iframe-elementnode>.contentDocument. Search for "mdn contentDocument" to find the relevant members, such as 'textContent'. I worked this out years ago and would rather not dig for it again. If you still need it urgently, say so and I will look into it.

Answer 8:

document.documentElement.innerHTML

Answer 9:

I always use

document.getElementsByTagName('html')[0].innerHTML

It is probably not the right way, but I can understand it when I see it.

Answer 10:

Use document.documentElement.

The same question is answered here: https://stackoverflow.com/a/7289396/2164160

Answer 11:

To also get things outside the <html>...</html>, most importantly the <!DOCTYPE ...> declaration, you could walk through document.childNodes, turning each into a string:

const html = [...document.childNodes]
    .map(node => nodeToString(node))
    .join('\n') // could use '' instead, but whitespace should not matter.

function nodeToString(node) {
    switch (node.nodeType) {
        case node.ELEMENT_NODE:
            return node.outerHTML
        case node.TEXT_NODE:
            // Text nodes should probably never be encountered, but handling them anyway.
            return node.textContent
        case node.COMMENT_NODE:
            return `<!--${node.textContent}-->`
        case node.DOCUMENT_TYPE_NODE:
            return doctypeToString(node)
        default:
            throw new TypeError(`Unexpected node type: ${node.nodeType}`)
    }
}

I published this code as document-outerhtml on npm.

Edit: note that the code above depends on a function doctypeToString. Its implementation could be as follows (the code below is published on npm as doctype-to-string):

function doctypeToString(doctype) {
    if (doctype === null) {
        return ''
    }
    // Checking with instanceof DocumentType might be neater, but how to get a
    // reference to DocumentType without assuming it to be available globally?
    // To play nice with custom DOM implementations, we resort to duck-typing.
    if (!doctype
        || doctype.nodeType !== doctype.DOCUMENT_TYPE_NODE
        || typeof doctype.name !== 'string'
        || typeof doctype.publicId !== 'string'
        || typeof doctype.systemId !== 'string'
    ) {
        throw new TypeError('Expected a DocumentType')
    }
    const doctypeString = `<!DOCTYPE ${doctype.name}`
        + (doctype.publicId ? ` PUBLIC "${doctype.publicId}"` : '')
        + (doctype.systemId
            ? (doctype.publicId ? `` : ` SYSTEM`) + ` "${doctype.systemId}"`
            : ``)
        + `>`
    return doctypeString
}

Answer 12:

The correct way is actually:

webBrowser1.DocumentText

Tags: javascript html document tostring
