I need to parse XML in .NET, and I have the choice between at least XmlTextReader and XDocument. Are there any comparisons between those two (or any other XML parsers contained in the framework)?
Maybe this could help me to decide without trying both of them in depth.
The XML files are expected to be rather small; speed and memory usage are minor issues compared to ease of use. :-)
(I'm going to use them from C# and/or IronPython.)
Thanks!
If you're happy reading everything into memory, use XDocument. It'll make your life much easier. LINQ to XML is a lovely API.
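As a rough illustration of the in-memory approach, here is a minimal sketch that parses a small document with XDocument and pulls out values with LINQ. The sample XML, the element names (books, book, title) and the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XDocumentSketch
{
    // Hypothetical sample document; element and attribute names are illustrative only.
    public const string SampleXml =
        "<books>" +
        "<book id=\"1\"><title>First</title></book>" +
        "<book id=\"2\"><title>Second</title></book>" +
        "</books>";

    // Parses the whole document into memory and returns every book title in document order.
    public static List<string> ReadTitles(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
                  .Elements("book")
                  .Select(book => (string)book.Element("title"))
                  .ToList();
    }
}
```

Calling XDocumentSketch.ReadTitles(XDocumentSketch.SampleXml) returns the two titles as a list. For a file on disk, XDocument.Load(path) would take the place of XDocument.Parse.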
Use an XmlReader (such as XmlTextReader) if you need to handle huge XML files in a streaming fashion, basically. It's a much more painful API, but it allows streaming (i.e. only dealing with data as you need it, so you can go through a huge document and only have a small amount in memory at a time).
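For comparison, a sketch of the same kind of task done in a streaming fashion with XmlReader. The "title" element name and the class and method names are assumptions for illustration. Note the manual cursor handling: ReadElementContentAsString already advances the reader past the end tag, so the loop must not call Read() again in that branch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlReaderSketch
{
    // Streams through the input and collects the text of every <title> element.
    // Only the current node is held in memory at any time.
    public static List<string> ReadTitles(TextReader input)
    {
        var titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    // Reads the text content and moves the reader past </title>.
                    titles.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    // Not a node we care about: step to the next node.
                    reader.Read();
                }
            }
        }
        return titles;
    }
}
```

It can be fed any TextReader, for example new StringReader(xml) for a string or a StreamReader over a large file.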
There's a hybrid approach, however - if you have a huge document made up of small elements, you can create an XElement from an XmlReader positioned at the start of the element, deal with the element using LINQ to XML, then move the XmlReader onto the next element and start again.
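The hybrid approach could look roughly like the sketch below, which uses XNode.ReadFrom to materialise one element at a time from an XmlReader. The helper names (StreamElements, ReadTitles) and the element names are hypothetical. XNode.ReadFrom leaves the reader positioned just after the element it consumed, so the loop only calls Read() when it did not materialise an element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class HybridSketch
{
    // Lazily yields each element with the given name as an XElement.
    // Only one such element is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string name)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
            {
                // Builds the element and advances the reader past its end tag.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Example consumer: each <book> is handled with LINQ to XML, then discarded.
    public static List<string> ReadTitles(TextReader input)
    {
        var titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            foreach (XElement book in StreamElements(reader, "book"))
            {
                titles.Add((string)book.Element("title"));
            }
        }
        return titles;
    }
}
```

Because StreamElements is an iterator, the reader must stay open while the sequence is being enumerated, which is why the using block wraps the foreach loop.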
XmlTextReader is kind of deprecated; do not use it.
From the MSDN blog posts by the XmlTeam:
Effective Xml Part 1: Choose the right API
Avoid using XmlTextReader. It contains quite a few bugs that could not be fixed without breaking existing applications already using it.
The world has moved on, have you? Xml APIs you should avoid using.
Obsolete APIs are easy since the compiler helps identifying them but there are two more APIs you should avoid using – namely XmlTextReader and XmlTextWriter. We found a number of bugs in these classes which we could not fix without breaking existing applications. The easy route would be to deprecate these classes and ask people to use replacement APIs instead. Unfortunately these two classes cannot be marked as obsolete because they are part of ECMA-335 (Common Language Infrastructure) standard (http://www.ecma-international.org/publications/standards/Ecma-335.htm) – the companion CLILibrary.xml file which is a part of Partition IV).
The good news is that even though these classes are not deprecated there are replacement APIs for these in .NET Framework already and moving to them is relatively easy. First it is necessary to find the places where XmlTextReader or XmlTextWriter is being used (unfortunately it is a manual step). Now all the occurrences of XmlTextReader should be replaced with XmlReader and all the occurrences of XmlTextWriter should be replaced with XmlWriter (note that XmlTextReader derives from XmlReader and XmlTextWriter derives from XmlWriter so the app can already be using these e.g. as formal parameters). The last step is to change the way the XmlReader/XmlWriter objects are instantiated – instead of creating the reader/writer directly it is necessary to [use] the static factory method .Create() present on both XmlReader and XmlWriter APIs.
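A minimal sketch of what the migrated code might look like, creating both objects through the static Create factories rather than new XmlTextReader(...) or new XmlTextWriter(...). The class and method names and the particular settings chosen are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class FactorySketch
{
    // Copies a document from a reader to a writer, both obtained from the factories.
    public static string Copy(string xml)
    {
        // Settings are optional; these two are only examples.
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringBuilder();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            // With the reader in its initial state, WriteNode copies the whole document.
            writer.WriteNode(reader, true);
        }
        return output.ToString();
    }
}
```

Code that previously accepted an XmlTextReader or XmlTextWriter parameter can usually be changed to accept the XmlReader or XmlWriter base type without other modifications.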
Furthermore, IntelliSense in Visual Studio doesn't list XmlTextReader under the System.Xml namespace. The class is defined as:
[EditorBrowsable(EditorBrowsableState.Never)]
public class XmlTextReader : XmlReader, IXmlLineInfo, IXmlNamespaceResolver
The XmlReader.Create factory methods return other internal implementations of the abstract class XmlReader depending on the settings passed.
For a forward-only streaming API (i.e. one that doesn't load the entire thing into memory), use XmlReader via the XmlReader.Create method.
For an easier API to work with, go for XDocument, aka LINQ to XML. Find XDocument vs XmlDocument here and here.