XDocument prevent invalid characters

Posted 2019-05-24 04:01

Question:

I am using XDocument to keep a sort of database. This database consists of registered chatterbots, and I simply have many "bot" nodes with attributes such as "username", "owner", and so on. However, occasionally some smart guy decides to make a bot with a very strange character in one of the properties. This makes the XDocument classes throw an exception whenever that node is written, which is a big problem: the database fails to save completely, because writing to the file stops as soon as it hits the invalid character.

My question is this: is there a simple method, something like XSomething.IsValidString(string s), so I can just omit the offending data? My database is not the official one, it is just for personal use, so it is not imperative that I include the bad data.

Some code that I am using (the variable file is the XDocument):
To save:
file.Save(Path.Combine(Environment.CurrentDirectory, "bots.xml"));

To load (after checking if File.Exists() etc etc):
file = XDocument.Load(Path.Combine(Environment.CurrentDirectory, "bots.xml"));

To add to the database (variables are all strings):

            file.Root.Add(new XElement("bot",
                new XAttribute("username", botusername),
                new XAttribute("type", type),
                new XAttribute("botversion", botversion),
                new XAttribute("bdsversion", bdsversion),
                new XAttribute("owner", owner),
                new XAttribute("trigger", trigger)));

Pardon my lack of proper XML techniques, I'm just starting. What I'm asking is if there is a XSomething.IsValidString(string s) method, not how terrible my XML is.

Ok, I just got the exception again, here is the exact message and stack trace.

System.ArgumentException: '', hexadecimal value 0x07, is an invalid character.
at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
at System.Xml.XmlUtf8RawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
at System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
at System.Xml.XmlWellFormedWriter.WriteString(String text)
at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
at System.Xml.Linq.ElementWriter.WriteStartElement(XElement e)
at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
at System.Xml.Linq.XContainer.WriteContentTo(XmlWriter writer)
at System.Xml.Linq.XDocument.WriteTo(XmlWriter writer)
at System.Xml.Linq.XDocument.Save(String fileName, SaveOptions options)
at System.Xml.Linq.XDocument.Save(String fileName)
at /* my code stack trace omitted */

Answer 1:

Try replacing the file.Save line with the following code:

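XmlWriterSettings settings = new XmlWriterSettings();
settings.CheckCharacters = false; // do not throw on characters outside the XML range
// Dispose the writer so the output is flushed and the file is closed.
using (XmlWriter writer = XmlWriter.Create(Path.Combine(Environment.CurrentDirectory, "bots.xml"), settings))
{
    file.Save(writer);
}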

source: http://sartorialsolutions.wordpress.com/page/2/
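One caveat with this approach: as far as I know, a writer with CheckCharacters = false emits the bad character as a character reference (for example &#x7;), which a default XmlReader rejects when the file is loaded again. The sketch below pairs the save with a load that also turns character checking off. The class name BotStore and its method names are made up for illustration, and Indent = true is only there to mimic the indented output of XDocument.Save(string).

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical wrapper; BotStore is not part of any library.
public static class BotStore
{
    // Saves without character checking, so invalid characters do not throw.
    public static void Save(XDocument doc, string path)
    {
        var settings = new XmlWriterSettings { CheckCharacters = false, Indent = true };
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }

    // Loads with character checking off, so character references such as
    // &#x7; written by Save above are accepted instead of rejected.
    public static XDocument Load(string path)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(path, settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Keep in mind that a file written this way is not well-formed XML 1.0, so other XML tools may refuse to open it. If the bad data can simply be dropped, filtering it out before it reaches the document avoids that problem.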



Answer 2:

First, can you check whether your XML file is saved with the proper encoding? I normally save XML files as UTF-8, and you can declare the encoding in the XML header:

<?xml version="1.0" encoding="UTF-8"?>

Of course, the body of your XML must conform to the XML standard. Here is a good article about it:

http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx



Answer 3:

From .NET 4, you can use XmlConvert.VerifyXmlChars(string content). It throws an XmlException if the string contains a character that is not allowed in XML; otherwise it returns the string unchanged.
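A minimal sketch of the IsValidString helper the question asks for, built on XmlConvert.VerifyXmlChars. The class name XmlText and the Sanitize method are invented for illustration and are not part of System.Xml. Sanitize relies on XmlConvert.IsXmlChar, which was also added in .NET 4, and treating null as invalid is just one possible choice.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
public static class XmlText
{
    // Returns true if every character in s is allowed in XML.
    public static bool IsValidString(string s)
    {
        if (s == null) return false; // assumption: null counts as invalid
        try
        {
            XmlConvert.VerifyXmlChars(s); // throws XmlException on an invalid character
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Returns a copy of s with characters that XML does not allow removed.
    public static string Sanitize(string s)
    {
        if (s == null) return string.Empty;
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // Keep well-formed surrogate pairs (characters above U+FFFF).
                sb.Append(c).Append(s[++i]);
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

With this in place, a value can be either skipped or cleaned before it is added, for example new XAttribute("username", XmlText.Sanitize(botusername)), so the document never contains a character that makes Save throw.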