
Restore exact innerHTML to DOM

Published 2020-08-09 09:17

Question:

I'd like to save the html string of the DOM, and later restore it to be exactly the same. The code looks something like this:

var stringified = document.documentElement.innerHTML
// later, after serializing and deserializing
document.documentElement.innerHTML = stringified

This works when everything is perfect, but when the DOM is not W3C-compliant, there's a problem. The first line works fine: stringified matches the DOM exactly. But when I restore from the (non-W3C-compliant) stringified, the browser does some magic and the resulting DOM is not the same as it was originally.

For example, if my original DOM looks like

<p><div></div></p>

then the final DOM will look like

<p></p><div></div><p></p>

since div elements are not allowed to be inside p elements. Is there some way I can get the browser to use the same html parsing that it does on page load and accept broken html as-is?

Why is the html broken in the first place? The DOM is not controlled by me.

Here's a jsfiddle to show the behavior http://jsfiddle.net/b2x7rnfm/5/. Open your console.

<body>
    <div id="asdf"><p id="outer"></p></div>
    <script type="text/javascript">
        var insert = document.createElement('div');
        var text = document.createTextNode('ladygaga');
        insert.appendChild(text);
        document.getElementById('outer').appendChild(insert);
        var e = document.getElementById('asdf')
        console.log(e.innerHTML);
        e.innerHTML = e.innerHTML;
        console.log(e.innerHTML); // This is different than 2 lines above!!
    </script>
</body>
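To see why the round trip changes the structure, here is a deliberately tiny model of two of the HTML parser's error-recovery rules. It is an illustrative sketch, not the real HTML5 tree-construction algorithm: the name `toyReparse` is made up, it only understands bare tags such as `<p>` and `<div>` (no attributes), and it models just two rules: (1) a `<div>` start tag closes an open `<p>`, and (2) a stray `</p>` with no open `<p>` turns into an empty `<p></p>`.

```javascript
// toyReparse: a hypothetical, heavily simplified model of how an HTML parser
// repairs <p><div></div></p>. Only bare tags (no attributes) are handled.
function toyReparse(html) {
  const tokens = html.match(/<\/?[a-z]+>|[^<]+/g) || [];
  const open = []; // stack of currently open element names
  let out = "";

  // Pop open elements up to and including `name`, emitting their end tags.
  // Passing null closes everything that is still open.
  function closeThrough(name) {
    while (open.length) {
      const top = open.pop();
      out += "</" + top + ">";
      if (top === name) break;
    }
  }

  for (const tok of tokens) {
    const m = /^<(\/?)([a-z]+)>$/.exec(tok);
    if (!m) {
      out += tok; // text is copied through unchanged
      continue;
    }
    const isEnd = m[1] === "/";
    const name = m[2];
    if (!isEnd) {
      // Rule 1: a <div> start tag implicitly closes an open <p>.
      if (name === "div" && open.includes("p")) closeThrough("p");
      open.push(name);
      out += tok;
    } else if (name === "p" && !open.includes("p")) {
      // Rule 2: a </p> with no open <p> produces an empty paragraph.
      out += "<p></p>";
    } else if (open.includes(name)) {
      closeThrough(name);
    }
    // Any other unmatched end tag is ignored.
  }
  closeThrough(null);
  return out;
}

console.log(toyReparse("<p><div></div></p>")); // "<p></p><div></div><p></p>"
console.log(toyReparse("<div><p></p></div>")); // "<div><p></p></div>" (unchanged)
```

Feeding the repaired output back into `toyReparse` returns it unchanged, which mirrors what the question describes: once the parser has split the `p`, re-serializing and re-parsing never brings the original nesting back.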

Answer 1:

If you need to be able to save and restore an invalid HTML structure, you could do it by way of XML. The code which follows comes from this fiddle.

To save, you create a new XML document to which you add the nodes you want to serialize:

var asdf = document.getElementById("asdf");
var outer = document.getElementById("outer");
var add = document.getElementById("add");
var save = document.getElementById("save");
var restore = document.getElementById("restore");

var saved = undefined;
save.addEventListener("click", function () {
  if (saved !== undefined)
    return; // Do not overwrite

  // Create a fake document with a single top-level element, as
  // required by XML.
  var parser = new DOMParser();
  var doc = parser.parseFromString("<top/>", "text/xml");

  // We could skip the cloning and just move the nodes to the XML
  // document. This would have the effect of saving and removing
  // at the same time, but I wanted to show what saving while
  // preserving the data would look like, so the clone's children
  // are moved rather than asdf's own.
  var clone = asdf.cloneNode(true);
  var top = doc.documentElement;
  var child = clone.firstChild;
  while (child) {
    top.appendChild(child);
    child = clone.firstChild;
  }
  // top belongs to an XML document, so innerHTML serializes as XML
  // and the invalid <p><div></div></p> nesting survives.
  saved = top.innerHTML;
  console.log("saved as: ", saved);

  // Perform the removal here.
  asdf.innerHTML = "";
});

To restore, you create an XML document to deserialize what you saved and then add the nodes to your document:

restore.addEventListener("click", function () {
  if (saved === undefined)
      return; // Don't restore undefined data!

  // We parse the XML we saved, re-adding the single top-level element.
  var parser = new DOMParser();
  var doc = parser.parseFromString("<top>" + saved + "</top>", "text/xml");
  var top = doc.documentElement;

  var child = top.firstChild;
  while (child) {
    asdf.appendChild(child);
    // Remove the xmlns attribute added during the XML round trip.
    // Only elements have attributes, so skip text and comment nodes.
    if (child.nodeType === Node.ELEMENT_NODE)
      child.removeAttribute("xmlns");
    child = top.firstChild;
  }
  saved = undefined;
  console.log("inner html after restore", asdf.innerHTML);
});

Using the fiddle, you can:

  1. Press the "Add LadyGaga..." button to create the invalid HTML.

  2. Press "Save and Remove from Document" to save the structure in asdf and clear its contents. This prints to the console what was saved.

  3. Press "Restore" to restore the structure that was saved.

The code above aims to be general. It could be simplified if some assumptions can be made about the HTML structure to be saved. For instance, blah is not a well-formed XML document, because XML requires a single top-level element; that is why the code above takes pains to add a top-level element (top). It is also generally not possible to parse an HTML serialization as XML, so the save operation serializes to XML instead.

This is a proof of concept more than anything. There could be side effects from moving nodes created in an HTML document to an XML document, or the other way around, that I have not anticipated. I've run the code above in Chrome and Firefox; I don't have IE at hand to run it there.
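The two XML rules this answer relies on (tags must nest properly, and there must be exactly one top-level element) can be illustrated with a toy checker. This is a hypothetical helper, not a real XML parser: the name `isWellFormedXmlFragment` is made up, and it only understands bare start, end and self-closing tags without attributes. Note that nothing in it closes a `p` when a `div` starts, which is why `<p><div></div></p>` passes, while two sibling roots fail until they are wrapped in `<top>`.

```javascript
// Toy check for two XML well-formedness rules (tags only, no attributes):
// every end tag must match the innermost open start tag, and there must be
// exactly one top-level element.
function isWellFormedXmlFragment(xml) {
  const tokens = xml.match(/<[^>]*>|[^<]+/g) || [];
  const open = []; // stack of open element names
  let roots = 0;   // number of top-level elements seen

  for (const tok of tokens) {
    if (!tok.startsWith("<")) {
      // Non-whitespace text outside the root element is not allowed.
      if (open.length === 0 && tok.trim() !== "") return false;
      continue;
    }
    const m = /^<(\/?)([A-Za-z][\w-]*)\s*(\/?)>$/.exec(tok);
    if (!m) return false; // attributes, comments etc. are outside this toy
    const isEnd = m[1] === "/";
    const name = m[2];
    const selfClosing = m[3] === "/";
    if (isEnd) {
      if (open.pop() !== name) return false; // mismatched end tag
    } else {
      if (open.length === 0) roots += 1;
      if (!selfClosing) open.push(name);
    }
  }
  return open.length === 0 && roots === 1;
}

console.log(isWellFormedXmlFragment("<p><div></div></p>"));        // true
console.log(isWellFormedXmlFragment("<p></p><p></p>"));            // false: two roots
console.log(isWellFormedXmlFragment("<top><p></p><p></p></top>")); // true
```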



Answer 2:

This won't work given your most recent clarification that you must have a string copy. I'm leaving it here, though, for others who may have more flexibility.


Since using the DOM seems to allow you to preserve, to some degree, the invalid structure, and using innerHTML involves reparsing with (as you've observed) side-effects, we have to look at not using innerHTML:

You can clone the original, and then swap in the clone:

var e = document.getElementById('asdf')
snippet.log("1: " + e.innerHTML);
var clone = e.cloneNode(true);
var insert = document.createElement('div');
var text = document.createTextNode('ladygaga');
insert.appendChild(text);
document.getElementById('outer').appendChild(insert);
snippet.log("2: " + e.innerHTML);
e.parentNode.replaceChild(clone, e);
e = clone;
snippet.log("3: " + e.innerHTML);

Live Example:

<div id="asdf">
  <p id="outer">
    <div>ladygaga</div>
  </p>
</div>

<!-- Script provides the `snippet` object, see http://meta.stackexchange.com/a/242144/134069 -->
<script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>

Note that just like the innerHTML solution, this will wipe out event handlers on the elements in question. You could preserve handlers on the outermost element by creating a document fragment and cloning its children into it, but that would still lose handlers on the children.


This earlier solution won't apply to you, but may apply to others in the future:

My earlier solution was to track what you changed and undo the changes one by one. In your example, that means removing the insert element:

var e = document.getElementById('asdf')
console.log("1: " + e.innerHTML);
var insert = document.createElement('div');
var text = document.createTextNode('ladygaga');
insert.appendChild(text);
var outer = document.getElementById('outer');
outer.appendChild(insert);
console.log("2: " + e.innerHTML);
outer.removeChild(insert);
console.log("3: " + e.innerHTML);

The example assumes this markup:

<div id="asdf">
  <p id="outer">
    <div>ladygaga</div>
  </p>
</div>
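The track-and-undo idea generalizes to a small undo log: every change registers the function that reverses it, and rolling back runs those functions newest first. The sketch below is only one way to structure it (`ChangeLog`, `apply` and `rollback` are made-up names, not part of any library). The class has no DOM dependency; in a browser the two callbacks would be the `appendChild`/`removeChild` pair from the example above.

```javascript
// Hypothetical undo log: record each change together with its inverse.
class ChangeLog {
  constructor() {
    this.undos = []; // inverse operations, most recent last
  }

  // Perform a change now and remember how to reverse it.
  apply(doChange, undoChange) {
    doChange();
    this.undos.push(undoChange);
  }

  // Reverse all recorded changes, newest first, and empty the log.
  rollback() {
    while (this.undos.length > 0) {
      const undo = this.undos.pop();
      undo();
    }
  }
}

// Browser usage (assumes `outer` and `insert` from the example above):
// var log = new ChangeLog();
// log.apply(function () { outer.appendChild(insert); },
//           function () { outer.removeChild(insert); });
// ...later...
// log.rollback();
```

Undoing in reverse order matters when later changes depend on earlier ones, for example when a node is appended and then modified.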



Answer 3:

Try using Blob and URL.createObjectURL to export the HTML, and include a script tag in the exported HTML that removes the extra <div></div><p></p> elements from the rendered document.

html

<body>
    <div id="asdf">
        <p id="outer"></p>
    </div>
    <script>
        var insert = document.createElement("div");
        var text = document.createTextNode("ladygaga");
        insert.appendChild(text);
        document.getElementById("outer").appendChild(insert);
        var elem = document.getElementById("asdf");
        var r = document.querySelectorAll("[id=outer] ~ *");
        // remove last `div` , `p` elements from `#asdf`
        for (var i = 0; i < r.length; ++i) {
            elem.removeChild(r[i])
        }
    </script>
</body>

js

var e = document.getElementById("asdf");   
var html = e.outerHTML;  
console.log(document.body.outerHTML);   
var blob = new Blob([document.body.outerHTML], {
    type: "text/html"
});   
var objUrl = window.URL.createObjectURL(blob);
var popup = window.open(objUrl, "popup", "width=300, height=200");

jsfiddle http://jsfiddle.net/b2x7rnfm/11/



Answer 4:

See this example: http://jsfiddle.net/kevalbhatt18/1Lcgaprc/ (see also the MDN documentation for cloneNode).

var e = document.getElementById('asdf');
console.log(e.innerHTML);
var backupElem = e.cloneNode(true);
// Your tinkering with the original
e.parentNode.replaceChild(backupElem, e);
// e still points at the detached original, so switch to the restored clone.
e = backupElem;
console.log(e.innerHTML);



Answer 5:

You cannot expect non-compliant HTML to be parsed as-is. But since the structure the browser builds from non-compliant HTML is very predictable, you can write a function that makes the HTML non-compliant again, like this:

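function ruinTheHtml() {
  // Work on a static copy: getElementsByTagName returns a live collection,
  // and this loop moves and removes nodes.
  var allElements = Array.prototype.slice.call(
    document.body.getElementsByTagName("*"));

  allElements.forEach(function (el) {
    if (el.tagName === 'SCRIPT' || el.tagName === 'STYLE')
      return;

    // Heuristic: an empty element followed by a sibling and then another
    // empty node is treated as a split <p></p><div></div><p></p>.
    if (el.textContent === '') {
      var next = el.nextSibling;
      var afterNext = next ? next.nextSibling : null;

      if (afterNext && afterNext.textContent === '') {
        el.parentNode.removeChild(afterNext);
        el.appendChild(next);
      }
    }
  });
}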

See the fiddle: http://jsfiddle.net/pqah8e25/3/



Answer 6:

You have to clone the node instead of copying the HTML: the HTML parsing rules force the browser to close the p when it sees a div.

If you really need to rebuild the DOM from that string and the string is well-formed XML, you can use the following code ($ is jQuery):

var html = "<p><div></div></p>";
var div = document.createElement("div");
var xml = $.parseXML(html);
// The parsed nodes are XML elements in no namespace, not HTML <p>/<div> elements.
div.appendChild(xml.documentElement);
div.innerHTML === html; // true


Answer 7:

You can use outerHTML; it preserves the original structure:

(based on your original sample)

<div id="asdf"><p id="outer"></p></div>

<script type="text/javascript">
    var insert = document.createElement('div');
    var text = document.createTextNode('ladygaga');
    insert.appendChild(text);
    document.getElementById('outer').appendChild(insert);
    var e = document.getElementById('asdf')
    console.log(e.outerHTML);
    e.outerHTML = e.outerHTML;
    console.log(e.outerHTML);
</script>

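Demo: http://jsfiddle.net/b2x7rnfm/7

Note that after the assignment, e still refers to the original element, which has been replaced in the document. To inspect what the parser actually produced, look the element up again (for example with document.getElementById('asdf')).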