Firefox DOMParser problem

Posted 2019-08-12 11:50

For some reason DOMParser adds an extra #text node for each newline (\n) in the feed at this URL:

http://rt.com/Root.rss

...as well as for many other RSS feeds I've tried. I checked the CNN and BBC feeds: they don't contain newlines, and DOMParser handles them nicely. So I have to add the following before parsing:

var parser = new DOMParser();
// strip each newline together with the spaces that follow it
var xmlText = htmlText.replace(/\n[ ]*/g, "");
var xmlDoc = parser.parseFromString(xmlText, "text/xml");

The server returns text/xml.

var channel = xmlDoc.documentElement.childNodes[0];

Without the code above, this returns a "\n" text node; with the correction, it returns the channel element.

3 answers
淡お忘
Post #2 · 2019-08-12 12:34

What is your question? Do you want to avoid the workaround? I think the workaround is necessary, because the parser is working as expected.

Evening l夕情丶
Post #3 · 2019-08-12 12:40

Yes, that's what XML parsers are supposed to do by default. Get used to walking through the child nodes and checking whether each one is an element (nodeType===1) or a text node (nodeType===3).
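A minimal sketch of that child-node check, written only against the standard node properties childNodes and nodeType. The helper name elementChildren is hypothetical, not part of any API; it simply keeps the children whose nodeType is 1 and skips text nodes (3), including the whitespace-only ones produced by the feed's newlines.

```javascript
// Hypothetical helper: return only the element children of `node`,
// skipping text nodes (nodeType 3), comments, and anything else.
function elementChildren(node) {
    var result = [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === 1) { // 1 === Node.ELEMENT_NODE
            result.push(child);
        }
    }
    return result;
}

// Usage, assuming xmlDoc came from DOMParser.parseFromString(..., "text/xml"):
// var channel = elementChildren(xmlDoc.documentElement)[0];
```

On an RSS document whose root is <rss> with <channel> as its first element, elementChildren(xmlDoc.documentElement)[0] is the <channel> element rather than the leading newline text node, without touching the raw text.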

From Firefox 3.5 onwards you get the Element Traversal API, which gives you properties like firstElementChild and nextElementSibling. These make it easier to walk over the DOM while ignoring whitespace. Alternatively, you could use XPath (doc.evaluate) to find the elements you want.
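A short sketch of both approaches mentioned above. forEachElementChild and findChannel are hypothetical helper names. The XPath part assumes an RSS 2.0 feed whose root is <rss> with a <channel> child and no default namespace; a namespaced feed would need a namespace resolver instead of null.

```javascript
// Visit each element child of `parent` via the Element Traversal API.
// firstElementChild / nextElementSibling never return text or comment
// nodes, so no whitespace check is needed here.
function forEachElementChild(parent, callback) {
    for (var el = parent.firstElementChild; el; el = el.nextElementSibling) {
        callback(el);
    }
}

// XPath alternative: select the <channel> element directly.
// Assumes an un-namespaced RSS 2.0 document (<rss><channel>...</channel></rss>).
function findChannel(xmlDoc) {
    var result = xmlDoc.evaluate(
        "/rss/channel",                      // XPath expression
        xmlDoc,                              // context node
        null,                                // no namespace resolver needed
        XPathResult.FIRST_ORDERED_NODE_TYPE, // ask for a single node
        null                                 // let evaluate create the result
    );
    return result.singleNodeValue;           // null if nothing matched
}

// Usage, assuming xmlDoc came from DOMParser:
// forEachElementChild(findChannel(xmlDoc), function (el) {
//     console.log(el.nodeName);
// });
```

The traversal loop never sees the newline text nodes at all, while the XPath version avoids walking the tree by hand.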

If you want to remove whitespace nodes for good, it's a much better idea to do it on the parsed DOM than by using a regex hack:

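function removeWhitespace(node) {
    // walk backwards so removing a child doesn't shift the indexes still to visit
    for (var i = node.childNodes.length - 1; i >= 0; i--) {
        var child = node.childNodes[i];
        if (child.nodeType === 3 && /^\s*$/.test(child.data))
            node.removeChild(child);   // whitespace-only text node
        else if (child.nodeType === 1)
            removeWhitespace(child);   // recurse into element children
    }
}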
孤傲高冷的网名
Post #4 · 2019-08-12 12:52

For some reason DOMParser adds an extra #text node for each newline (\n) in the feed at this URL

That is standard behaviour; only IE ignores whitespace between element nodes. (See XML Whitespace Handling, Whitespace @ MSDN, Whitespace @ MDC.)
