Possible Duplicates:
Jquery html() and self closing tags
Is it expected that jQuery $('span').html() turns XHTML br tag to html syntax?
I have a document like this
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script src="/js/core/jquery.js" type="text/javascript"></script>
</head>
<body>
<div id="content">
<img src="/some/image.gif" />
</div>
</body>
</html>
I want to retrieve the HTML of some nodes so that I can process it, but I see one problem with the IMG element.
$('#content').html();
This returns invalid XHTML:
<img src="/some/image.gif">
jQuery version is 1.4.2.
This is a duplicate, but none of the answers in the original question gives an actual workaround. (Probably rightly so. I can't but agree with what @Cletus says in his - as always excellent - answer.)
But if you're stuck with XHTML: there is a jQuery plugin providing an innerXHTML() function (poorly maintained, though: last updated in 2007, and it never made it out of beta), and a JavaScript tool named innerXHTML that promises to do what you need. If you need this, that may be your best bet.
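To give an idea of what such a tool has to do, here is a minimal sketch of an XHTML-style serialiser that walks the DOM instead of trusting innerHTML. It is not the code of either tool mentioned above; the names `toXHTML`, `innerXHTML`, `escapeXml` and `VOID_ELEMENTS` are made up for this illustration. It only reads `nodeType`, `nodeName`, `data`, `attributes` and `childNodes`, and it makes no attempt to treat script or style content specially.

```javascript
// Elements that XHTML 1.0 declares EMPTY; these are written as <name ... />.
var VOID_ELEMENTS = { area: 1, base: 1, br: 1, col: 1, hr: 1, img: 1,
                      input: 1, link: 1, meta: 1, param: 1 };

// Escape the characters that would break well-formedness.
function escapeXml(text, inAttribute) {
    var out = String(text).replace(/&/g, '&amp;')
                          .replace(/</g, '&lt;')
                          .replace(/>/g, '&gt;');
    return inAttribute ? out.replace(/"/g, '&quot;') : out;
}

// Serialise one node (and its descendants) as XHTML-style markup.
function toXHTML(node) {
    if (node.nodeType === 3) {           // text node
        return escapeXml(node.data, false);
    }
    if (node.nodeType !== 1) {           // comments etc. are skipped in this sketch
        return '';
    }
    var name = node.nodeName.toLowerCase();
    var markup = '<' + name;
    for (var i = 0; i < node.attributes.length; i++) {
        var attr = node.attributes[i];
        // Assumption: skip attributes the browser reports as not specified.
        if (attr.specified === false) { continue; }
        markup += ' ' + attr.name.toLowerCase() +
                  '="' + escapeXml(attr.value, true) + '"';
    }
    var isVoid = Object.prototype.hasOwnProperty.call(VOID_ELEMENTS, name);
    if (isVoid && node.childNodes.length === 0) {
        return markup + ' />';
    }
    markup += '>';
    for (var j = 0; j < node.childNodes.length; j++) {
        markup += toXHTML(node.childNodes[j]);
    }
    return markup + '</' + name + '>';
}

// Serialise only the children of an element, like innerHTML does.
function innerXHTML(element) {
    var markup = '';
    for (var i = 0; i < element.childNodes.length; i++) {
        markup += toXHTML(element.childNodes[i]);
    }
    return markup;
}

// In a browser with jQuery loaded, usage would look like:
//   innerXHTML($('#content')[0]);
```

For the document in the question this should produce the img element with its self-closing slash, but attribute order, case and whitespace still come from the DOM, not from your original source.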
innerHTML/html() won't give you an XHTML serialisation unless you've actually served the page as XHTML, under the application/xhtml+xml media type. (And that doesn't work on IE<9.) If you are serving your page as text/html, then your self-closing tags are nothing but ignored clutter to the browser when it's parsing your source into a DOM. You cannot expect the markup serialised from that DOM to come out in the same format as the markup you put in.
In fact, in some cases in IE, innerHTML won't even give you a valid HTML serialisation: it omits the quotes around attribute values in some cases where it shouldn't. In short, you cannot rely on innerHTML giving you any particular format of markup. It might re-order attributes, it might HTML-escape different characters, it might normalise attribute values, it might change whitespace. So doing string operations on the html() return value is a non-starter. All you can rely on is that you can assign the serialised markup back to the innerHTML of another element and the browser will be able to parse it.
What's your purpose in trying to retrieve XHTML? You may be able to achieve more using normal DOM-style manipulations.
ETA re comment:
Then XHTML validity is the least of your worries. It doesn't matter if the HTML isn't well-formed; you will still be able to write it back to html(). But:
What you can't reliably do with the html() return value is tell which sentences are in text content and which are in attribute values. For example, <img title="Hello, this is some description. Another sentence."> is markup, and if you start putting <span>s inside the title attribute, you're obviously going to have difficulties.
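To see the problem concretely, here is a small sketch of what happens when a sentence pattern is run over the markup string instead of over text nodes. The function name `wrapSentencesNaively` is made up for this illustration; it is the approach to avoid, not a recommendation.

```javascript
// A pattern for "things that look like sentences": anything up to
// a '.', '?' or '!' that is followed by whitespace.
var sentencePattern = /.*?[.?!]\s+?/g;

// Naive approach: wrap every match in the serialised markup in a span.
function wrapSentencesNaively(markup) {
    return markup.replace(sentencePattern, function (sentence) {
        return '<span>' + sentence + '</span>';
    });
}

var markup = '<img title="Hello, this is some description. Another sentence.">';
var broken = wrapSentencesNaively(markup);
// broken is:
// <span><img title="Hello, this is some description. </span>Another sentence.">
```

The span now starts before the tag and closes in the middle of the title attribute, so the result is no longer the markup you meant to produce.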
This is a text-processing task, so you should do it on text nodes, not markup. This is a bit tricky and jQuery doesn't give you any special tools to do it. But see the findText function from this answer; you could use it like this:
// Split each text node into things that look like sentences and wrap
// each in a span.
//
var element= $('#content')[0];
findText(element, /.*?[.?!]\s+?/g, function(node, match) {
    var wrap= document.createElement('span');
    // Split off the text after the match, then move the matched text into the span.
    node.splitText(match.index+match[0].length);
    wrap.appendChild(node.splitText(match.index));
    // Insert the span where the matched text used to be.
    node.parentNode.insertBefore(wrap, node.nextSibling);
});
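The findText function itself lives in the linked answer and is not reproduced here. As a rough idea of what such a helper has to do, here is a sketch written for this post; treat it as an assumption about its shape, not as the original code. It walks the element's descendants, runs the pattern over each text node's data, and calls the callback with the node and each match. Children and matches are visited back to front so that a callback which splits or inserts nodes doesn't shift the positions still waiting to be processed.

```javascript
// Sketch of a findText-style helper (hypothetical reconstruction).
// pattern must be a global RegExp that never matches the empty string.
function findText(element, pattern, callback) {
    if (!pattern.global) {
        throw new Error('findText needs a pattern with the g flag');
    }
    // Iterate children in reverse: the callback may add siblings.
    for (var childi = element.childNodes.length; childi-- > 0;) {
        var child = element.childNodes[childi];
        if (child.nodeType === 1) {          // element: recurse into it
            findText(child, pattern, callback);
        } else if (child.nodeType === 3) {   // text node: collect matches
            var matches = [];
            var match;
            pattern.lastIndex = 0;
            while ((match = pattern.exec(child.data)) !== null) {
                matches.push(match);
            }
            // Handle matches last to first so earlier indexes stay valid
            // after the callback splits the node.
            for (var i = matches.length; i-- > 0;) {
                callback(child, matches[i]);
            }
        }
    }
}
```

Note that the callback sees the matches of each text node in reverse order, and that the pattern only ever sees text content, never tags or attribute values.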
.html() doesn't return the HTML or XHTML markup that was provided to make the nodes; it returns a serialisation of the browser's internal representation of the relevant nodes. This internal representation is not XHTML-compliant and is not meant to be: technically, it's an internal implementation detail.
You can throw any HTML or XHTML at a browser and it will parse it into an internal DOM. The internal DOM does not differ if the original source was crappy HTML, good HTML, or perfect XHTML. The resulting DOM for equivalent source documents will be the same, but generating a new HTML document from that will not necessarily be exactly the same as the source.