I'm trying to send an XML document over the wire but receiving the following exception:
"MY LONG EMAIL STRING" was specified for the 'Body' element. ---> System.ArgumentException: '', hexadecimal value 0x02, is an invalid character.
at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
at System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
at System.Xml.XmlRawWriter.WriteValue(String value)
at System.Xml.XmlWellFormedWriter.WriteValue(String value)
at Microsoft.Exchange.WebServices.Data.EwsServiceXmlWriter.WriteValue(String value, String name)
--- End of inner exception stack trace ---
I don't have any control over what I attempt to send because the string is gathered from an email. How can I encode my string so it's valid XML while keeping the illegal characters?
I'd like to keep the original characters one way or another.
The following solution removes any invalid XML characters, but it does so I think about as performantly as it could be done, and in particular, it does not allocate a new StringBuilder as well as a new string, not unless it is already determined that the string has any invalid characters in it. So the hot spot ends up being just a single for loop on the characters, with the check ending up being often no more than two greater than / lesser than numeric comparisons on each char. If none are found, it simply returns the original string. This is particularly helpful when the vast majority of strings are just fine to start with, it's nice to have these as in and out (with no wasted allocs etc) as quick as possible.
-- update --
See below how one can also directly write an XElement that has these invalid characters, though it uses this code --
Some of this code was influenced by Mr. Tom Bogle's solution here. See also on that same thread the helpful information in the post by superlogical. All of these, however, always instantiate a new StringBuilder and string still.
USAGE:
TEST:
// --- CODE --- (I have these methods in a static utility class called XML)
======== ======== ========
Write XElement.ToString directly
======== ======== ========
First, the usage of this extension method:
-- Fuller test --
--- code ---
-- this uses the following XmlTextWritter --
Another way to remove incorrect XML chars in C# with using XmlConvert.IsXmlChar Method (Available since .NET Framework 4.0)
.Net Fiddle - https://dotnetfiddle.net/v1TNus
For example, the vertical tab symbol (\v) is not valid for XML, it is valid UTF-8, but not valid XML 1.0, and even many libraries (including libxml2) miss it and silently output invalid XML.