Formatting string in XML format and remove invalid

Posted 2019-09-19 15:42

Question:

I have a string, say "<Node a="<b>">". I need to escape only the data and write this string as a node with XmlWriter. How can I escape only the "<" inside the attribute value, and not the "<" that is part of the XML structure?

Answer 1:

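using System;
using System.Xml;

using (var writer = XmlWriter.Create(Console.Out))
{
    writer.WriteStartElement("Node");
    // XmlWriter escapes '<' and '>' in the attribute value for you.
    writer.WriteAttributeString("a", "<b>");
} // Disposing the writer closes the open element and flushes the output.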

Output (preceded by the XML declaration, unless OmitXmlDeclaration is set): <Node a="&lt;b&gt;" />


First, you need to parse the string. Since it is not valid XML, you can't use an XML parser. You can try HtmlAgilityPack instead, which tolerates malformed markup, and then write the extracted values with XmlWriter.

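using System;
using System.Xml;

string s = "<Node a=\"<b>\">";

// HtmlAgilityPack is lenient enough to load markup that is not valid XML.
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(s);

var node = doc.DocumentNode.FirstChild;
var attr = node.Attributes[0];

using (var writer = XmlWriter.Create(Console.Out))
{
    // Name is lowercased by HtmlAgilityPack by default; OriginalName keeps the source casing ("Node").
    writer.WriteStartElement(node.OriginalName);
    // XmlWriter escapes the raw attribute value ("<b>" becomes "&lt;b&gt;").
    writer.WriteAttributeString(attr.OriginalName, attr.Value);
}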