I'm parsing an XLIFF document using the XDocument class. Does XDocument perform some validation of the content which I read into it, and if so - is there any way to disable that validation?
I'm getting some weird errors if the XLIFF isn't valid XML (I don't care that it isn't, I just want to parse it).
E.g.
'.', hexadecimal value 0x00, is an invalid character.
I'm currently reading the file like this:
string FileLocation = @"C:\XLIFF\text.xlf";
XDocument doc = XDocument.Load(FileLocation);
Thanks.
You can't parse malformed XML: a conforming XML parser has to stop at the first well-formedness error, and a 0x00 character is one.
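It might be that the file is being decoded with the wrong encoding, for example as ASCII or UTF-8 when it is actually UTF-16. UTF-16 stores each ASCII character as two bytes, one of which is 0x00, so decoding it with a single-byte encoding produces exactly the error you encountered.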
Possible solution:
Read the file as UTF-8.
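To see how an encoding mismatch can produce the 0x00 error, here is a small sketch. The class and method names (`EncodingMismatchDemo`, `Redecode`, `CanParse`) are made up for illustration. It encodes a tiny XML string as UTF-16 and decodes the bytes again with whatever encoding you pass in.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class EncodingMismatchDemo
{
    // Encodes the text as UTF-16 (little-endian, no BOM) and decodes
    // the resulting bytes with the encoding supplied by the caller.
    public static string Redecode(string xml, Encoding decodeAs)
    {
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(xml);
        return decodeAs.GetString(utf16Bytes);
    }

    // Returns true if XDocument accepts the text, false if it throws XmlException.
    public static bool CanParse(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

`Redecode("<a/>", Encoding.Unicode)` round-trips cleanly and parses, whereas `Redecode("<a/>", Encoding.UTF8)` leaves a `\0` after every character and `XDocument.Parse` rejects it. If your file looks like that in a hex editor (every other byte is 00), the encoding is the problem rather than the content.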
An XLIFF document is an XML document. The character 0x00 is not a valid XML character, and a file that contains it is not XML, so you cannot read it with an XML parser.
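Note that well-formedness and validity are different things. A document that is well-formed but not valid (it does not match its schema or DTD) can still be read by any XML parser. A document with a forbidden character is not well-formed at all; a streaming (SAX-style) parser can give you the content up to the first error, but it cannot get past the bad character either.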
Valid characters according to the XML 1.0 specification:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
UPDATE
Suggested solution: pre-process the files to remove invalid characters. The character \0 can be replaced with a space, unless it carries meaning (the content is binary data), in which case that data needs to be Base64-encoded.
I had a similar problem, which was fixed by letting a StreamReader read the content.
If that does not help, try specifying the proper encoding explicitly.
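A minimal sketch of the StreamReader approach, assuming the file is UTF-8 unless you say otherwise. The `XliffLoader` class name and its optional `encoding` parameter are my own, not something from the framework.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical wrapper, for illustration only.
public static class XliffLoader
{
    // Decodes the file through a StreamReader and hands the decoded text to XDocument.
    // A byte order mark, if present, takes precedence over the requested encoding.
    public static XDocument Load(string path, Encoding encoding = null)
    {
        using (var reader = new StreamReader(
            path,
            encoding ?? Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true))
        {
            return XDocument.Load(reader);
        }
    }
}
```

With the path from the question this would be `XDocument doc = XliffLoader.Load(FileLocation);`, or `XliffLoader.Load(FileLocation, Encoding.Unicode)` for a UTF-16 file without a byte order mark. This only helps when the 0x00 characters come from a wrong encoding; a file that really contains NUL characters will still fail.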
If you want to strip characters that are invalid in XML from a string, you can use this method:
It removes any characters that fall outside of the set of valid character values, according to the XML standard.
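The method itself did not survive in this copy of the thread, so here is a sketch of what such a method could look like, not necessarily the original. The names `XmlText` and `StripInvalidXmlChars` are my own. It keeps only characters allowed by the XML 1.0 `Char` production and drops everything else, including unpaired surrogates.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class XmlText
{
    // Returns a copy of the text without characters that XML 1.0 forbids.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // A well-formed surrogate pair encodes U+10000..U+10FFFF, which is allowed.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                result.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (c == 0x9 || c == 0xA || c == 0xD ||
                     (c >= 0x20 && c <= 0xD7FF) ||
                     (c >= 0xE000 && c <= 0xFFFD))
            {
                result.Append(c);
            }
            // Anything else (0x00, other control characters, lone surrogates,
            // U+FFFE, U+FFFF) is dropped.
        }
        return result.ToString();
    }
}
```

For the file in the question, that would be something like `XDocument doc = XDocument.Parse(XmlText.StripInvalidXmlChars(File.ReadAllText(FileLocation)));`. Be aware that this only cleans raw characters. It does not touch numeric character references such as `&#0;`, and if the NULs come from reading the file with the wrong encoding, stripping them hides the real problem.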