string escape into XML-Attribute

Posted 2019-04-06 01:23

Question:

I had a look at the question "string escape into XML" and found it very useful.

I would like to do a similar thing: Escape a string to be used in an XML-Attribute.

The string may contain \r\n. The XmlWriter class produces something like \r\n -> &#xD;&#xA;

The solution I'm currently using includes the XmlWriter and a StringBuilder and is rather ugly.

Any hints?

Edit1:
Sorry to disappoint LarsH, but my first approach was:

public static string XmlEscapeAttribute(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlAttribute attr = doc.CreateAttribute("attr");
    attr.InnerText = unescaped;
    return attr.InnerXml;
}

It does not work. XmlEscapeAttribute("Foo\r\nBar") will result in "Foo\r\nBar"

I used .NET Reflector to find out how XmlTextWriter escapes attributes. It uses the XmlTextEncoder class, which is internal...

The method I'm currently using looks like this:

public static string XmlEscapeAttribute(string unescaped)
{
    if (String.IsNullOrEmpty(unescaped)) return unescaped;

    XmlWriterSettings settings = new XmlWriterSettings();
    settings.OmitXmlDeclaration = true;
    StringBuilder sb = new StringBuilder();
    XmlWriter writer = XmlWriter.Create(sb, settings);

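    // Write <a a="..." /> so that the writer escapes the value as an attribute.
    writer.WriteStartElement("a");
    writer.WriteAttributeString("a", unescaped);
    writer.WriteEndElement();
    writer.Flush();
    // Strip the trailing '" />' and the leading '<a a="' to leave only the escaped value.
    sb.Length -= "\" />".Length;
    sb.Remove(0, "<a a=\"".Length);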

    return sb.ToString();
}

It's ugly and probably slow, but it does work: XmlEscapeAttribute("Foo\r\nBar") will result in "Foo&#xD;&#xA;Bar"

Edit2:

SecurityElement.Escape(unescaped);

does not work either.

Edit3 (final):

Using all the very useful comments from Lars, my final implementation looks like this:

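Note: the .Replace("\r", "&#xD;").Replace("\n", "&#xA;") call is not required for well-formed XML. Without it, however, a conforming parser normalizes a literal line break in an attribute value to a space, so keep it if the line breaks must survive a round trip.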

    public static string XmlEscapeAttribute(string unescaped)
    {

        XmlDocument doc = new XmlDocument();
        XmlAttribute attr = doc.CreateAttribute("attr");
        attr.InnerText = unescaped;
        // The Replace is *not* required for well-formed XML.
        return attr.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");
    }

As it turns out, this is well-formed XML and will be parsed by any standards-compliant XML parser:

<response message="Thank you,
LarsH!" />
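
For anyone who wants to check what a parser does with the two forms, here is a minimal sketch. It assumes a reader created with XmlReader.Create, which applies the attribute-value normalization described in the XML spec. The class and method names (AttributeRoundTrip, ReadMessage) are made up for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeRoundTrip
{
    // Parses a one-element document and returns the value of its
    // "message" attribute as the reader reports it.
    public static string ReadMessage(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                 // position on the root element
            return reader.GetAttribute("message");  // value after normalization
        }
    }
}
```

Calling ReadMessage("<response message=\"Thank you,\r\nLarsH!\" />") should return the text with a single space where the line break was, while the same call with &#xD;&#xA; in place of the literal line break should return a real CR LF pair. Both documents are well-formed. Only the second one preserves the line break.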

Answer 1:

Modifying the solution you referenced, how about

public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    var node = doc.CreateAttribute("foo");
    node.InnerText = unescaped;
    return node.InnerXml;
}

All I did was change CreateElement() to CreateAttribute(). The attribute node type does have InnerText and InnerXml properties.

I don't have the environment to test this in, but I'd be curious to know if it works.

Update: Or more simply, use SecurityElement.Escape() as suggested in another answer to the question you linked to. It also escapes quotation marks, so it's suitable for attribute text.

Update 2: Please note that carriage returns and line feeds do not need to be escaped in an attribute value for the XML to be well-formed. If you want them escaped for other reasons, you can do it using String.Replace(), e.g.

SecurityElement.Escape(unescaped).Replace("\r", "&#xD;").Replace("\n", "&#xA;");

or

return node.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");
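
If you would rather not depend on XmlDocument or SecurityElement at all, the same idea can be written as a single pass over the string. This is only a sketch, not what the framework does internally: the class name XmlAttributeEscaper is made up, and it assumes the input contains no characters that are illegal in XML (such as most control characters), which it does not check for. It also escapes tabs, since a parser would otherwise normalize those to spaces too.

```csharp
using System.Text;

public static class XmlAttributeEscaper
{
    // Escapes a string so it can be placed inside a quoted XML attribute value.
    public static string Escape(string unescaped)
    {
        if (string.IsNullOrEmpty(unescaped)) return unescaped;

        StringBuilder sb = new StringBuilder(unescaped.Length + 16);
        foreach (char c in unescaped)
        {
            switch (c)
            {
                // Markup characters and both quote styles.
                case '&':  sb.Append("&amp;");  break;
                case '<':  sb.Append("&lt;");   break;
                case '>':  sb.Append("&gt;");   break;
                case '"':  sb.Append("&quot;"); break;
                case '\'': sb.Append("&apos;"); break;
                // Whitespace that a parser would otherwise turn into a space.
                case '\r': sb.Append("&#xD;");  break;
                case '\n': sb.Append("&#xA;");  break;
                case '\t': sb.Append("&#x9;");  break;
                default:   sb.Append(c);        break;
            }
        }
        return sb.ToString();
    }
}
```

Because every character is handled exactly once, there is no ordering problem of the kind you get when chaining Replace calls (escaping "&" after inserting "&#xD;" would corrupt the output). XmlAttributeEscaper.Escape("Foo\r\nBar") gives Foo&#xD;&#xA;Bar.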


Answer 2:

If it helps: in several languages, one uses createCDATASection to avoid having to escape XML special characters.

It produces something like this:

<tag><![CDATA[ <somecontent/> ]]></tag>
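
In C# the corresponding call is XmlDocument.CreateCDataSection. A small sketch follows, with a made-up class name (CDataExample). Keep in mind that CDATA sections are only allowed in element content, so this does not help with attribute values.

```csharp
using System.Xml;

public static class CDataExample
{
    // Wraps the given text in a CDATA section inside a <tag> element
    // and returns the serialized element.
    public static string WrapInCData(string content)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement tag = doc.CreateElement("tag");
        tag.AppendChild(doc.CreateCDataSection(content));
        return tag.OuterXml;
    }
}
```

CDataExample.WrapInCData(" <somecontent/> ") returns the markup shown above.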