For some reason DOMParser is adding an extra #text node for each newline (\n)
for this url
...as well as for many other RSS feeds I've tried. I checked the CNN and BBC feeds: they don't contain newlines, and DOMParser handles them nicely. So I have to add the following before parsing:
var xmlText = htmlText.replace(/\n[ ]*/g, "");
var xmlDoc = parser.parseFromString(xmlText, "text/xml");
The server is returning text/xml.
var channel = xmlDoc.documentElement.childNodes[0];
This returns the \n text node without my code above, and the channel element with the correction.
What is your question? Do you wish to not use the workaround? I think the workaround is necessary as the parser is working as expected.
Yes, that's what XML parsers are supposed to do by default. Get used to walking through child nodes checking to see whether they're elements (nodeType===1) or text nodes (3).
From Firefox 3.5 you get the Element Traversal API, giving you properties like firstElementChild and nextElementSibling. This makes walking over the DOM whilst ignoring whitespace easier. Alternatively you could use XPath (doc.evaluate) to find the elements you want.
If you want to remove whitespace nodes for good, it's a much better idea to do it on the parsed DOM than by using a regex hack.
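As a minimal sketch of the two DOM-based approaches described above (not necessarily the code this answer originally had in mind), the helpers below either skip non-element children while walking, or remove whitespace-only text nodes from the parsed tree. The names elementChildren and stripWhitespaceNodes are hypothetical; both rely only on the standard childNodes, nodeType, data and removeChild members.

```javascript
// Hypothetical helper: return only the element children of a node,
// skipping text nodes (including whitespace-only ones) and comments.
function elementChildren(node) {
  var result = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType === 1) { // 1 = element node
      result.push(node.childNodes[i]);
    }
  }
  return result;
}

// Hypothetical helper: remove whitespace-only text nodes from a parsed
// DOM subtree, in place.
function stripWhitespaceNodes(node) {
  // Iterate backwards so that removing a child does not shift the
  // indexes of the children still to be visited.
  for (var i = node.childNodes.length - 1; i >= 0; i--) {
    var child = node.childNodes[i];
    if (child.nodeType === 3 && !/[^ \t\r\n]/.test(child.data)) {
      // 3 = text node; this one holds nothing but XML whitespace.
      node.removeChild(child);
    } else if (child.nodeType === 1) {
      stripWhitespaceNodes(child);
    }
  }
}
```

With the question's variables, calling stripWhitespaceNodes(xmlDoc) right after parseFromString should make xmlDoc.documentElement.childNodes[0] the channel element without touching the source text. Unlike the regex, it leaves newlines inside real text content alone, although it does drop text nodes that consist of nothing but whitespace.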
That is standard behaviour. Only IE ignores whitespace between element nodes. (XML Whitespace Handling, Whitespace @ MSDN, Whitespace @ MDC)