How would I get the raw HTML of the selected content on a page using Javascript? For the sake of simplicity, I'm sticking with browsers supporting window.getSelection
.
Here is an example; the content between both |
represent my selection.
<p>
The <em>quick brown f|ox</em> jumps over the lazy <strong>d|og</strong>.
</p>
I can capture and alert the normalized HTML with the following Javascript.
var selectionRange = window.getSelection().getRangeAt(0);
selectionContents = selectionRange.cloneContents(),
fragmentContainer = document.createElement('div');
fragmentContainer.appendChild(selectionContents);
alert(fragmentContainer.innerHTML);
In the above example, the alerted contents would collapse the trailing elements and return the string <em>ox</em> jumps over the lazy <strong>d</strong>
.
How might I return the string ox</em> jumps over the lazy <strong>d
?
You would have to effectively write your own HTML serialiser.
Start at the selectionRange.startContainer
/startOffset
and walk the tree forwards from there until you get to endContainer
/endOffset
, outputting HTML markup from the nodes as you go, including open tags and attributes when you walk into an Element and close tags when you go up a parentNode
.
Not much fun, especially if you are going to have to support the very different IE<9 Range model at some point...
(Note also that you won't be able to get the completely raw original HTML, because that information is gone. Only the current DOM tree is stored by the browser, and that means details like tag case, attribute order, whitespace, and omitted implicit tags will differ between the source and what you get out.)
Looking at the API's, I don't think you can extract the HTML without it being converted to a DocumentFragment, which by default will close any open tags to make it valid HTML.
See Converting Range or DocumentFragment to string for a similar Q.