I'm trying to send an XML document over the wire, but I'm receiving the following exception:
"MY LONG EMAIL STRING" was specified for the 'Body' element. ---> System.ArgumentException: '', hexadecimal value 0x02, is an invalid character.
at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
at System.Xml.XmlUtf8RawTextWriterIndent.WriteString(String text)
at System.Xml.XmlRawWriter.WriteValue(String value)
at System.Xml.XmlWellFormedWriter.WriteValue(String value)
at Microsoft.Exchange.WebServices.Data.EwsServiceXmlWriter.WriteValue(String value, String name)
--- End of inner exception stack trace ---
I don't have any control over what I attempt to send because the string is gathered from an email. How can I encode my string so it's valid XML while keeping the illegal characters?
I'd like to keep the original characters one way or another.
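XML 1.0 cannot carry characters such as 0x02 at all, not even as character references. If the original characters must survive, one option is a reversible escape: replace each forbidden code point with a textual placeholder and have the receiver undo it. Below is a minimal sketch in Python for illustration. The \uXXXX placeholder format and the names encode_for_xml / decode_from_xml are hypothetical, and the scheme only works if sender and receiver agree on it. Backslashes are doubled so that decoding stays unambiguous.

```python
import re


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def encode_for_xml(text: str) -> str:
    """Replace XML-illegal characters with a \\uXXXX placeholder (hypothetical scheme)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")  # double the escape character so decoding is unambiguous
        elif is_xml_char(cp):
            out.append(ch)
        else:
            # every illegal code point is <= 0xFFFF, so four hex digits suffice
            out.append(f"\\u{cp:04X}")
    return "".join(out)


# matches either an escaped backslash or a \uXXXX placeholder
_PLACEHOLDER = re.compile(r"\\(\\|u[0-9A-F]{4})")


def decode_from_xml(text: str) -> str:
    """Reverse encode_for_xml on the receiving side."""
    def restore(m):
        token = m.group(1)
        return "\\" if token == "\\" else chr(int(token[1:], 16))
    return _PLACEHOLDER.sub(restore, text)


# usage: encode before handing the string to the XML writer, decode after parsing
body = "Hello\x02 C:\\temp\x00"
wire = encode_for_xml(body)
assert decode_from_xml(wire) == body
```

The encoded string contains only legal XML characters, so any writer will accept it. The trade-off is that every consumer must know about the placeholder convention.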
I'm on the receiving end of @parapurarajkumar's solution: the illegal characters are loaded properly into XmlDocument, but they break XmlWriter when I try to save the output.
My Context
I'm looking at exception/error logs from the website using Elmah. Elmah returns the state of the server at the time of the exception in the form of a large XML document. For our reporting engine I pretty-print the XML with XmlWriter.
During a website attack, I noticed that some XML documents weren't parsing, and I was receiving this exception:
'.', hexadecimal value 0x00, is an invalid character.
NON-RESOLUTION: I converted the document to a byte[] and sanitized it of 0x00, but it found none.
When I scanned the XML document, I found the following:
There was the nul byte, encoded as an HTML numeric character reference for code point 0!!!
RESOLUTION: To fix the encoding, I replaced that character reference before loading the text into my XmlDocument, because loading it creates the nul byte, which is then difficult to sanitize from the object. Here's my entire process:
LESSON LEARNED: if your incoming data is HTML-encoded on entry, sanitize for illegal bytes using the associated HTML entity.
is one way of doing this
Can't the string be cleaned with the following?
Works for me:
The following code removes XML invalid characters from a string and returns a new string without them:
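A minimal sketch of such a filter, written in Python for illustration (remove_invalid_xml_chars is a hypothetical name), keeps only the code points allowed by the Char production of the XML 1.0 specification: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def remove_invalid_xml_chars(text: str) -> str:
    """Return a new string containing only characters that XML 1.0 allows."""
    return "".join(ch for ch in text if is_xml_char(ord(ch)))


print(remove_invalid_xml_chars("MY LONG EMAIL STRING\x02"))
```

In .NET the per-character test is available as XmlConvert.IsXmlChar. Because .NET strings are UTF-16, characters above U+FFFF arrive as surrogate pairs, which need XmlConvert.IsXmlSurrogatePair instead of a per-char check.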
There is a generic solution that works nicely:
Once this is in place, you can then create your override of THIS as follows:
where XmlUtil.RemoveInvalidXmlChars is defined as follows:
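The general idea, sanitizing every string at the point where the writer emits it rather than at each call site, can be sketched in Python by subclassing xml.sax.saxutils.XMLGenerator. This is an analogue for illustration only, not the .NET override or the XmlUtil code the answer refers to. SanitizingXMLGenerator and remove_invalid_xml_chars are hypothetical names, and element names are not sanitized.

```python
import io
from xml.sax.saxutils import XMLGenerator


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def remove_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow."""
    return "".join(ch for ch in text if is_xml_char(ord(ch)))


class SanitizingXMLGenerator(XMLGenerator):
    """XMLGenerator that drops XML-illegal characters from text and attribute values."""

    def characters(self, content):
        super().characters(remove_invalid_xml_chars(content))

    def startElement(self, name, attrs):
        clean = {key: remove_invalid_xml_chars(value) for key, value in attrs.items()}
        super().startElement(name, clean)


buf = io.StringIO()
gen = SanitizingXMLGenerator(buf, encoding="utf-8")
gen.startDocument()
gen.startElement("Body", {"id": "m\x001"})
gen.characters("hello\x02 world")
gen.endElement("Body")
gen.endDocument()
print(buf.getvalue())
```

Because the filtering lives in the writer, callers can pass the raw email text straight through and the output stays well-formed.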