I get this error. I searched and found that the cause is illegal characters in my XML, along with the usual fix for it. But I don't have access to edit any of these files. My job is only to read them and fetch tag values, attribute values and similar data, so I can't replace the binary characters such as '\x01' with escapes. I also tried setting CheckCharacters = false in the XmlReaderSettings, but it has no effect: the same error is still thrown.
Is it not possible to fix this with XmlReader? I read that XmlTextReader can skip the exception, but I have already coded all my features using XmlReader. It would be good to find a solution that keeps it; otherwise I would have to change all my code.
My code:
    private void button1_Click(object sender, EventArgs e)
    {
        int i = 0;
        var filenames = System.IO.Directory
            .EnumerateFiles(textBox1.Text, "*.xml", System.IO.SearchOption.AllDirectories)
            .Select(System.IO.Path.GetFullPath);
        foreach (var f in filenames)
        {
            var resolver = new XmlUrlOverrideResolver();
            resolver.DtdFileMap[@"X1.DTD"] = @"\\location\X1.DTD";
            resolver.DtdFileMap[@"R2.DTD"] = @"\\location\X2.DTD";
            resolver.DtdFileMap[@"R5.DTD"] = @"\\location\R5.DTD";
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.DtdProcessing = DtdProcessing.Parse;
            settings.XmlResolver = resolver;
            XmlReader doc = XmlReader.Create(f, settings);
            while (doc.Read())
            {
                if ((doc.NodeType == XmlNodeType.Element) && (doc.Name == "ap"))
                {
                    if (doc.HasAttributes)
                    {
                        String fin = doc.GetAttribute("ap");
                        if (fin == "no")
                        {
                            String[] array = new String[10000];
                            array[i] = (f);
                            File.AppendAllText(@"\\location\NAPP.txt", array[i] + Environment.NewLine);
                            i++;
                        }
                        else
                        {
                            String[] abs = new String[10000];
                            abs[i] = (f);
                            File.AppendAllText(@"\\location\APP.txt", abs[i] + Environment.NewLine);
                            i++;
                        }
                    }
                }
            }
        }
        MessageBox.Show("Done");
    }
This is a very simple example of a character "filter" that will replace the 0x06 character with a space:
You use it this way:
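The code that belonged here did not survive, so what follows is only a minimal sketch of the kind of filter being described, not the original listing. The names FilteringStreamReader, FilteredXml and Open are hypothetical; passing the file path as the base URI is an assumption made so that relative DTD references can still be resolved.

```csharp
using System.IO;
using System.Xml;

// Hypothetical name: a StreamReader that turns 0x06 into a space while reading.
public class FilteringStreamReader : StreamReader
{
    public FilteringStreamReader(string path) : base(path) { }
    public FilteringStreamReader(Stream stream) : base(stream) { }

    // Only the buffer-based overload is filtered in this sketch.
    public override int Read(char[] buffer, int index, int count)
    {
        int read = base.Read(buffer, index, count);
        for (int i = index; i < index + read; i++)
        {
            if (buffer[i] == '\u0006')
            {
                buffer[i] = ' '; // same length, so no bookkeeping is needed
            }
        }
        return read;
    }
}

public static class FilteredXml
{
    // Hypothetical helper: opens an XmlReader on top of the filtering reader.
    public static XmlReader Open(string path, XmlReaderSettings settings)
    {
        XmlReaderSettings s = settings.Clone();
        s.CloseInput = true; // disposing the XmlReader also closes the file
        // Assumption: the path doubles as base URI for resolving DTD references.
        return XmlReader.Create(new FilteringStreamReader(path), s, path);
    }
}
```

In the loop from the question, the line that creates the reader would then become something like `using (XmlReader doc = FilteredXml.Open(f, settings)) { ... }`, with the rest of the XmlReader code left unchanged.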
Note that it's very simple because it replaces one character (0x06) with another character of the same "length" (the space). If you wanted to replace a character with a "sequence" of characters (to escape it), it would get more complex (not impossible, about 30 minutes of work).
(I have checked, and it seems that XmlTextReader only uses the buffer-based Read(char[], int, int) overload and not the parameterless Read() method.)
As always, when a programmer tells you 30 minutes, it means 0 minutes or 2 hours :-)
This is the "more complex" ReplacingStreamReader:
Use it like:
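The original listing is missing here as well. The block below is a simplified reconstruction under assumptions, not the author's code: it reuses the names mentioned in the text (ReplacingStreamReader, ReplaceWith, RemainingChars, sb), but the control flow is my own, and only the two Read overloads are filtered (Peek, ReadLine and ReadToEnd are not). The Stream constructor is an addition for convenience.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: only Read() and Read(char[], int, int) apply the replacement.
public class ReplacingStreamReader : StreamReader
{
    public ReplacingStreamReader(string path) : base(path) { }
    public ReplacingStreamReader(Stream stream) : base(stream) { }

    // null = keep the character, string.Empty = drop it, anything else = replace it.
    public Func<char, string> ReplaceWith { get; set; }

    // Filtered chars that did not fit into the caller's buffer yet.
    protected char[] RemainingChars;
    protected int RemainingCharsIndex;

    public override int Read(char[] buffer, int index, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (index < 0 || count < 0 || index + count > buffer.Length)
            throw new ArgumentOutOfRangeException();
        if (count == 0) return 0;

        while (true)
        {
            if (RemainingChars != null)
            {
                // Hand out as much of the pending filtered text as fits.
                int n = Math.Min(count, RemainingChars.Length - RemainingCharsIndex);
                Array.Copy(RemainingChars, RemainingCharsIndex, buffer, index, n);
                RemainingCharsIndex += n;
                if (RemainingCharsIndex == RemainingChars.Length)
                {
                    RemainingChars = null;
                    RemainingCharsIndex = 0;
                }
                return n;
            }

            var raw = new char[count];
            int read = base.Read(raw, 0, count);
            if (read <= 0) return 0; // end of stream

            var sb = new List<char>(read);
            for (int i = 0; i < read; i++)
            {
                string replace = ReplaceWith?.Invoke(raw[i]);
                if (replace == null) sb.Add(raw[i]);
                else sb.AddRange(replace);
            }
            if (sb.Count == 0) continue; // everything was removed, read more

            RemainingChars = sb.ToArray(); // served by the next loop iteration
            RemainingCharsIndex = 0;
        }
    }

    public override int Read()
    {
        var one = new char[1];
        return Read(one, 0, 1) == 0 ? -1 : one[0];
    }
}
```

With the variables from the question, usage could look like `var sr = new ReplacingStreamReader(f); sr.ReplaceWith = x => x == 0x06 ? " " : x.ToString(); XmlReader doc = XmlReader.Create(sr, settings);`. Copying through a temporary array costs an allocation per call; that keeps the sketch short and is not a tuned design.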
Be aware that the ReplacingStreamReader doesn't "know" which part of the XML it is modifying, so a "blind" replace is rarely OK :-) Other than this limitation, you can replace any character with any string (null in ReplaceWith means "keep the current character", equivalent to x.ToString() in the example given; returning string.Empty is valid and means "remove the current character").
The class is quite interesting: it keeps a char[] RemainingChars with the chars that have been read (and filtered by ReplaceWith) but that haven't yet been returned by a Read() method because the passed buffer was too small (the ReplaceWith method could "enlarge" the read string, making it too big for the buffer!). Note that sb is a List<char> instead of a StringBuilder. Probably using one or the other would be nearly equivalent, code-wise.
You could first read the content into a string, replace (escape) the offending characters, and then load it into an XmlReader:
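The snippet for this answer is missing too, so here is a minimal sketch of the idea under stated assumptions. EscapedXml, Escape and Open are hypothetical names. The sketch escapes raw control characters as numeric character references and turns CheckCharacters off on a copy of the settings, on the understanding that this setting affects character references rather than raw characters. It is still a "blind" replace, so it may not be appropriate inside CDATA sections or comments.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class EscapedXml
{
    // Control characters that XML 1.0 does not allow as raw text
    // (0x00 is deliberately left alone in this sketch).
    private static readonly Regex Illegal =
        new Regex(@"[\x01-\x08\x0B\x0C\x0E-\x1F]");

    // Turns e.g. the raw char 0x06 into the text "&#x6;".
    public static string Escape(string xml)
    {
        return Illegal.Replace(
            xml, m => "&#x" + ((int)m.Value[0]).ToString("X") + ";");
    }

    // Hypothetical helper: loads the whole file, escapes it, then parses it.
    public static XmlReader Open(string path, XmlReaderSettings settings)
    {
        // Note: ReadAllText picks the encoding from the BOM (UTF-8 otherwise),
        // not from the XML declaration.
        string content = Escape(File.ReadAllText(path));

        XmlReaderSettings s = settings.Clone();
        s.CheckCharacters = false; // let the references produced above through

        // Assumption: the path doubles as base URI for resolving DTD references.
        return XmlReader.Create(new StringReader(content), s, path);
    }
}
```

This reads each file fully into memory, which is simple but may be unsuitable for very large documents; the stream-based filters avoid that.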