I'd like to save the html string of the DOM, and later restore it to be exactly the same. The code looks something like this:
var stringified = document.documentElement.innerHTML
// later, after serializing and deserializing
document.documentElement.innerHTML = stringified
This works when everything is perfect, but when the DOM is not W3C-compliant, there's a problem. The first line works fine: stringified matches the DOM exactly. But when I restore from the (non-W3C-compliant) stringified, the browser does some magic and the resulting DOM is not the same as it was originally.
For example, if my original DOM looks like
<p><div></div></p>
then the final DOM will look like
<p></p><div></div><p></p>
since div elements are not allowed to be inside p elements. Is there some way I can get the browser to use the same html parsing that it does on page load and accept broken html as-is?
Why is the html broken in the first place? The DOM is not controlled by me.
Here's a jsfiddle to show the behavior http://jsfiddle.net/b2x7rnfm/5/. Open your console.
<body>
<div id="asdf"><p id="outer"></p></div>
<script type="text/javascript">
var insert = document.createElement('div');
var text = document.createTextNode('ladygaga');
insert.appendChild(text);
document.getElementById('outer').appendChild(insert);
var e = document.getElementById('asdf')
console.log(e.innerHTML);
e.innerHTML = e.innerHTML;
console.log(e.innerHTML); // This is different than 2 lines above!!
</script>
</body>
If you need to be able to save and restore an invalid HTML structure, you could do it by way of XML. The code which follows comes from this fiddle.
To save, you create a new XML document to which you add the nodes you want to serialize:
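The fiddle's code is not reproduced here, so what follows is only a minimal sketch of the save step, not the original code. The function name saveChildrenAsXml is hypothetical. It assumes a browser environment, and it copies the nodes with importNode instead of moving them, so the live document is left untouched.

```javascript
// Hypothetical helper: serialize the children of `container` as XML.
// Assumes a browser (document.implementation and XMLSerializer exist).
function saveChildrenAsXml(container) {
  // XML needs a single root element, so wrap everything in <top>.
  const xmlDoc = document.implementation.createDocument(null, "top", null);
  const top = xmlDoc.documentElement;
  for (const child of Array.from(container.childNodes)) {
    // Deep-copy each node into the XML document; the DOM tree is copied
    // as it is, without being reparsed by the HTML parser.
    top.appendChild(xmlDoc.importNode(child, true));
  }
  return new XMLSerializer().serializeToString(xmlDoc);
}

// Possible usage, with the element id from the question:
// const saved = saveChildrenAsXml(document.getElementById("asdf"));
```

Because the result is XML and not HTML, the p element may keep a div child in the string.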
To restore, you create an XML document to deserialize what you saved and then add the nodes to your document:
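Again as a sketch only, and not the fiddle's code: the restore step could look like the following. The name restoreChildrenFromXml is hypothetical, and the function assumes the string was produced by an XML serializer that wrapped the saved nodes in a single root element.

```javascript
// Hypothetical helper: rebuild the children of `container` from an XML string.
// Assumes a browser (DOMParser exists) and a single wrapping root element.
function restoreChildrenFromXml(container, xmlString) {
  const xmlDoc = new DOMParser().parseFromString(xmlString, "application/xml");
  // Browsers report XML parse errors as a <parsererror> element.
  if (xmlDoc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("saved string is not well-formed XML");
  }
  // Clear whatever is currently inside the container.
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }
  // Copy the saved nodes back; no HTML parsing happens at this point,
  // so a div inside a p stays where it was.
  const owner = container.ownerDocument;
  for (const child of Array.from(xmlDoc.documentElement.childNodes)) {
    container.appendChild(owner.importNode(child, true));
  }
}

// Possible usage, with the element id from the question:
// restoreChildrenFromXml(document.getElementById("asdf"), saved);
```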
Using the fiddle, you can:
Press the "Add LadyGaga..." button to create the invalid HTML.
Press "Save and Remove from Document" to save the structure in asdf and clear its contents. This prints to the console what was saved.
Press "Restore" to restore the structure that was saved.
The code above aims to be general. It would be possible to simplify the code if some assumptions can be made about the HTML structure to be saved. For instance blah is not a well-formed XML document because you need a single top element in XML. So the code above takes pains to add a top-level element (top) to prevent this problem. It is also generally not possible to just parse an HTML serialization as XML, so the save operation serializes to XML.
This is a proof-of-concept more than anything. There could be side effects from moving nodes created in an HTML document to an XML document or the other way around that I have not anticipated. I've run the code above on Chrome and FF. I don't have IE at hand to run it there.
You can use outerHTML, which preserves the original structure (based on your original sample):
Demo: http://jsfiddle.net/b2x7rnfm/7
This won't work for your most recent clarification, that you must have a string copy. Leaving it, though, for others who may have more flexibility.
Since using the DOM seems to allow you to preserve, to some degree, the invalid structure, and using innerHTML involves reparsing with (as you've observed) side-effects, we have to look at not using innerHTML:
You can clone the original, and then swap in the clone:
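The answer's snippet did not survive, so here is a minimal sketch of the clone-and-swap idea under my own assumptions. The helper names takeSnapshot and restoreSnapshot are hypothetical; cloneNode and replaceChild are the standard DOM methods.

```javascript
// Hypothetical helper: keep a deep copy of the element as it is right now.
function takeSnapshot(element) {
  return element.cloneNode(true);
}

// Hypothetical helper: replace `current` with a fresh copy of `snapshot`.
// The snapshot itself stays out of the document, so it can be reused.
function restoreSnapshot(current, snapshot) {
  const fresh = snapshot.cloneNode(true);
  current.parentNode.replaceChild(fresh, current);
  return fresh;
}

// Possible usage, with the element id from the question:
// const snapshot = takeSnapshot(document.getElementById("asdf"));
// ... the DOM changes ...
// restoreSnapshot(document.getElementById("asdf"), snapshot);
```

Nothing is serialized to a string here, so the HTML parser never gets a chance to rearrange the tree.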
Note that just like the innerHTML solution, this will wipe out event handlers on the elements in question. You could preserve handlers on the outermost element by creating a document fragment and cloning its children into it, but that would still lose handlers on the children.
This earlier solution won't apply to you, but may apply to others in the future:
My earlier solution was to track what you changed, and undo the changes one-by-one. So in your example, that means removing the insert element.
You can not expect HTML to be parsed as non-compliant HTML. But since the structure of compiled non-compliant HTML is very predictable, you can make a function which makes the HTML non-compliant again like this:
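The original function is only available in the fiddle, so the following is a sketch of the idea and not that code. It assumes the one pattern from the question, where the parser turns a div inside a p into an empty p, the div, and another empty p, and it repairs the DOM after innerHTML has been assigned. The name renestDivsInParagraphs is hypothetical.

```javascript
// Hypothetical helper: undo the parser's "<p></p><div>…</div><p></p>" split
// below `root` by moving each such div back into the preceding p.
function renestDivsInParagraphs(root) {
  for (const div of Array.from(root.querySelectorAll("p + div"))) {
    const before = div.previousElementSibling;
    const after = div.nextElementSibling;
    // Only touch the exact shape the parser produces: an empty p before
    // the div, and an empty, attribute-less p right after it.
    const wasSplit =
      before.childNodes.length === 0 &&
      after !== null &&
      after.tagName === "P" &&
      after.childNodes.length === 0 &&
      after.attributes.length === 0;
    if (!wasSplit) continue;
    before.appendChild(div); // moves the div inside the first p
    after.remove();          // drops the extra p the parser created
  }
}

// Possible usage, with the element id from the question:
// e.innerHTML = saved;
// renestDivsInParagraphs(e);
```

This is a heuristic: markup that really did contain an empty p, a div, and another empty p would be rewritten as well.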
See the fiddle: http://jsfiddle.net/pqah8e25/3/
Try utilizing Blob and URL.createObjectURL to export the html; include a script tag in the exported html which removes the <div></div><p></p> elements from the rendered html document.
jsfiddle http://jsfiddle.net/b2x7rnfm/11/
You have to clone the node instead of copying html. Parsing rules will force the browser to close p when seeing div.
If you really need to get html from that string and it is valid xml, then you can use the following code ($ is jQuery):
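The answer's code is missing, so this is only a sketch of how the XML route could look with jQuery. $.parseXML is a real jQuery function; the helper name nodesFromXmlString and the wrapping root element are my own assumptions. The wrapper declares the XHTML namespace so that the parsed p and div become real HTML elements when imported.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: parse `markup` as XML and return a document fragment
// holding the resulting nodes. Assumes jQuery is loaded as `$` and that
// `markup` is well-formed XML; $.parseXML throws otherwise.
function nodesFromXmlString(markup) {
  // A single root is required in XML; the xmlns puts children in the
  // XHTML namespace.
  const xmlDoc = $.parseXML('<root xmlns="' + XHTML_NS + '">' + markup + "</root>");
  const fragment = document.createDocumentFragment();
  for (const child of Array.from(xmlDoc.documentElement.childNodes)) {
    fragment.appendChild(document.importNode(child, true));
  }
  return fragment;
}

// Possible usage, with the markup from the question:
// const nodes = nodesFromXmlString('<p id="outer"><div>ladygaga</div></p>');
// document.getElementById("asdf").appendChild(nodes);
```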