how to escape xml entities in javascript?

2020-01-26 05:12发布

In JavaScript (server side nodejs) I'm writing a program which generates xml as output.

I am building the xml by concatenating a string:

str += '<' + key + '>';
str += value;
str += '</' + key + '>';

The problem is: What if value contains characters like '&', '>' or '<'? What's the best way to escape those characters?

or is there any javascript library around which can escape XML entities?

标签: javascript
9条回答
别忘想泡老子
2楼-- · 2020-01-26 05:19

if something is escaped from before, you could try this since this will not double escape like many others

function escape(text) {
    return String(text).replace(/(['"<>&'])(\w+;)?/g, (match, char, escaped) => {
        if(escaped) 
            return match

        switch(char) {
            case '\'': return '&quot;'
            case '"': return '&apos;'
            case '<': return '&lt;'
            case '>': return '&gt;'
            case '&': return '&amp;'
        }
    })
}
查看更多
霸刀☆藐视天下
3楼-- · 2020-01-26 05:23

HTML encoding is simply replacing &, ", ', < and > chars with their entity equivalents. Order matters, if you don't replace the & chars first, you'll double encode some of the entities:

if (!String.prototype.encodeHTML) {
  String.prototype.encodeHTML = function () {
    return this.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;')
               .replace(/"/g, '&quot;')
               .replace(/'/g, '&apos;');
  };
}

As @Johan B.W. de Vries pointed out, this will have issues with the tag names, I would like to clarify that I made the assumption that this was being used for the value only

Conversely if you want to decode HTML entities1, make sure you decode &amp; to & after everything else so that you don't double decode any entities:

if (!String.prototype.decodeHTML) {
  String.prototype.decodeHTML = function () {
    return this.replace(/&apos;/g, "'")
               .replace(/&quot;/g, '"')
               .replace(/&gt;/g, '>')
               .replace(/&lt;/g, '<')
               .replace(/&amp;/g, '&');
  };
}

1 just the basics, not including &copy; to © or other such things


As far as libraries are concerned. Underscore.js (or Lodash if you prefer) provides an _.escape method to perform this functionality.

查看更多
霸刀☆藐视天下
4楼-- · 2020-01-26 05:24

This might be a bit more efficient with the same outcome:

function escapeXml(unsafe) {
    return unsafe.replace(/[<>&'"]/g, function (c) {
        switch (c) {
            case '<': return '&lt;';
            case '>': return '&gt;';
            case '&': return '&amp;';
            case '\'': return '&apos;';
            case '"': return '&quot;';
        }
    });
}
查看更多
孤傲高冷的网名
5楼-- · 2020-01-26 05:26

This is simple:

sText = ("" + sText).split("<").join("&lt;").split(">").join("&gt;").split('"').join("&#34;").split("'").join("&#39;");
查看更多
我只想做你的唯一
6楼-- · 2020-01-26 05:31

If you have jQuery, here's a simple solution:

  String.prototype.htmlEscape = function() {
    return $('<div/>').text(this.toString()).html();
  };

Use it like this:

"<foo&bar>".htmlEscape(); -> "&lt;foo&amp;bar&gt"

查看更多
我欲成王,谁敢阻挡
7楼-- · 2020-01-26 05:34

Technically, &, < and > aren't valid XML entity name characters. If you can't trust the key variable, you should filter them out.

If you want them escaped as HTML entities, you could use something like http://www.strictly-software.com/htmlencode .

查看更多
登录 后发表回答