Comparing XmlDocument for equality (content wise)

Posted 2020-03-09 07:49

Question:

If I want to compare the contents of an XmlDocument, is it just like this?

XmlDocument doc1 = GetDoc1();
XmlDocument doc2 = GetDoc2();

if(doc1 == doc2)
{

}

I am not checking whether they are both the same object reference, but whether the CONTENTS of the XML are the same.

Answer 1:

No. XmlDocument neither overloads the == operator nor overrides the Equals() method, so both perform reference equality. The comparison in your example is false unless the two variables refer to the same object instance.
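
A minimal sketch of that behavior, assuming two documents loaded from the same text. The class and method names here are hypothetical and exist only for illustration:

```csharp
using System.Xml;

public static class ReferenceEqualityDemo
{
    // Hypothetical helper: parses the given text into a fresh XmlDocument.
    private static XmlDocument Load(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Compares two separately loaded documents with ==.
    // Both hold identical content but are distinct instances.
    public static bool OperatorSaysEqual(string xml)
    {
        XmlDocument doc1 = Load(xml);
        XmlDocument doc2 = Load(xml);
        return doc1 == doc2;
    }

    // Same comparison through Equals(), which XmlDocument inherits from object.
    public static bool EqualsSaysEqual(string xml)
    {
        return Load(xml).Equals(Load(xml));
    }
}
```

Both methods return false for any input, because each call creates two different objects.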

If you want to compare the contents (attributes, elements, comments, PIs, etc.) of a document, you will have to implement that logic yourself. Be warned: it's not trivial.

Depending on your exact scenario, you may be able to remove all non-essential whitespace from the document (which itself can be tricky) and then compare the resulting XML text. This is not perfect: it fails for documents that are semantically identical but differ in things like how namespaces are used and declared, whether certain values are escaped, the order of elements, and so on. As I said before, XML comparison is not trivial.

You also need to clearly define what it means for two XML documents to be "identical". Does element or attribute ordering matter? Does case (in text nodes) matter? Should you ignore superfluous CDATA sections? Do processing instructions count? What about fully qualified vs. partially qualified namespaces?

In any general purpose implementation, you're likely going to want to transform both documents into some canonical form (be it XML or some other representation) and then compare the canonicalized content.
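
As a rough sketch of the canonicalize-then-compare idea, the code below picks one possible definition of "identical": attribute order, namespace prefixes, comments and processing instructions are ignored, while element order and text are significant. That definition, and the names XmlCanonicalSketch, Normalize and CanonicallyEqual, are assumptions made for illustration. This is not an implementation of the W3C Canonical XML specification.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlCanonicalSketch
{
    // Builds a normalized copy of an element: namespace declarations, comments
    // and processing instructions are dropped, attributes are sorted by name.
    // Element order and text content are kept as they are.
    public static XElement Normalize(XElement source)
    {
        return new XElement(
            source.Name,
            source.Attributes()
                  .Where(a => !a.IsNamespaceDeclaration)
                  .OrderBy(a => a.Name.NamespaceName, StringComparer.Ordinal)
                  .ThenBy(a => a.Name.LocalName, StringComparer.Ordinal),
            source.Nodes()
                  .Where(n => !(n is XComment) && !(n is XProcessingInstruction))
                  .Select(n => n is XElement child ? Normalize(child) : n));
    }

    // Parses both inputs, normalizes their root elements and compares the results.
    public static bool CanonicallyEqual(string xml1, string xml2)
    {
        XElement a = Normalize(XDocument.Parse(xml1).Root);
        XElement b = Normalize(XDocument.Parse(xml2).Root);
        return XNode.DeepEquals(a, b);
    }
}
```

If element order should not matter either, the child elements could be sorted in Normalize as well, but whether that is acceptable depends entirely on what the documents mean.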

Tools that perform XML differencing already exist, such as Microsoft XML Diff/Patch, and you may be able to use one to identify differences between two documents. To my knowledge that tool is not distributed in source form, so to use it in an embedded application you would need to script the process. If you plan to use it, first verify that the licensing terms allow its use and redistribution.

EDIT: Check out @Max Toro's answer if you're using .NET 3.5 SP1, as apparently there's an option in XLinq that may be helpful. Nice to know it exists.



Answer 2:

Try the DeepEquals method on the XLinq API.

XDocument doc1 = GetDoc1(); 
XDocument doc2 = GetDoc2(); 

if(XNode.DeepEquals(doc1, doc2)) 
{ 

} 

See also Equality Semantics of LINQ to XML Trees
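
Since the question starts from XmlDocument rather than XDocument, here is a possible bridge between the two, sketched under the assumption that loading through an XmlNodeReader is acceptable for your documents. The names DeepEqualsSketch, ToXDocument and ContentEquals are made up for this example:

```csharp
using System.Xml;
using System.Xml.Linq;

public static class DeepEqualsSketch
{
    // Converts a DOM document into a LINQ to XML document by reading its nodes.
    public static XDocument ToXDocument(XmlDocument doc)
    {
        using (var reader = new XmlNodeReader(doc))
        {
            return XDocument.Load(reader);
        }
    }

    // Compares two XmlDocument instances by content using XNode.DeepEquals.
    public static bool ContentEquals(XmlDocument doc1, XmlDocument doc2)
    {
        return XNode.DeepEquals(ToXDocument(doc1), ToXDocument(doc2));
    }
}
```

Check the linked equality semantics before relying on this, because DeepEquals has its own rules about what counts as equal.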



Answer 3:

A simple way could be to compare OuterXml.

var a = new XmlDocument();
var b = new XmlDocument();

a.LoadXml("<root  foo='bar'  />");
b.LoadXml("<root foo='bar'/>");

Debug.Assert(a.OuterXml == b.OuterXml);
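
One limit of this approach is worth showing: OuterXml writes attributes in document order, so two documents that differ only in attribute order produce different strings. A small sketch, with the hypothetical helper name OuterXmlEquals:

```csharp
using System.Xml;

public static class OuterXmlSketch
{
    // Loads both strings into XmlDocument instances and compares their OuterXml.
    public static bool OuterXmlEquals(string xml1, string xml2)
    {
        var a = new XmlDocument();
        var b = new XmlDocument();
        a.LoadXml(xml1);
        b.LoadXml(xml2);
        return a.OuterXml == b.OuterXml;
    }
}
```

OuterXmlEquals("<root  foo='bar'  />", "<root foo='bar'/>") is true, while OuterXmlEquals("<root a='1' b='2'/>", "<root b='2' a='1'/>") is false.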


Answer 4:

LBushkin is right, this is not trivial. Since XML is string data, you could technically hash the contents of each document and compare the hashes, but the result will be affected by things like whitespace.
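
A sketch of the hashing idea, assuming SHA-256 over the UTF-8 bytes of OuterXml (both choices are mine, not a requirement). Loading through XmlDocument first removes some formatting noise, but the hash remains sensitive to anything that changes the serialized text. The names XmlHashSketch and HashOf are hypothetical:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml;

public static class XmlHashSketch
{
    // Parses the XML, re-serializes it via OuterXml and returns a Base64 SHA-256 digest.
    public static string HashOf(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(doc.OuterXml));
            return Convert.ToBase64String(digest);
        }
    }
}
```

Equal hashes then indicate equal serialized text, which is a stricter condition than equal meaning.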

You could perform a structured diff (also called 'XML diffgram') between the two documents and compare the results. This is how .NET datasets keep track of changes, for example.

Other than that you'd have to iterate through the DOM and compare elements, attributes and values to each other. If there's a schema involved then you would also have to take into account positions and so on.
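
To make the DOM walk concrete, here is one way it might look. It is a sketch under these assumptions: attribute order is ignored, namespace declarations, comments and processing instructions are skipped, and child order and text values must match. DomComparer and its members are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class DomComparer
{
    // Namespace that the DOM assigns to xmlns and xmlns:prefix attributes.
    private const string XmlnsNamespace = "http://www.w3.org/2000/xmlns/";

    public static bool ContentEquals(XmlDocument a, XmlDocument b)
    {
        if (a.DocumentElement == null || b.DocumentElement == null)
            return a.DocumentElement == b.DocumentElement; // equal only if both are empty
        return ElementsEqual(a.DocumentElement, b.DocumentElement);
    }

    private static bool ElementsEqual(XmlElement a, XmlElement b)
    {
        // Compare qualified names, not prefixes.
        if (a.LocalName != b.LocalName || a.NamespaceURI != b.NamespaceURI) return false;

        // Attributes: same count, and each one has a match by name and value.
        List<XmlAttribute> attrsA = DataAttributes(a);
        if (attrsA.Count != DataAttributes(b).Count) return false;
        foreach (XmlAttribute attr in attrsA)
        {
            XmlAttribute other = b.GetAttributeNode(attr.LocalName, attr.NamespaceURI);
            if (other == null || other.Value != attr.Value) return false;
        }

        // Children: compared pairwise, in document order.
        List<XmlNode> kidsA = ComparableChildren(a);
        List<XmlNode> kidsB = ComparableChildren(b);
        if (kidsA.Count != kidsB.Count) return false;
        for (int i = 0; i < kidsA.Count; i++)
        {
            if (kidsA[i].NodeType != kidsB[i].NodeType) return false;
            if (kidsA[i] is XmlElement childA)
            {
                if (!ElementsEqual(childA, (XmlElement)kidsB[i])) return false;
            }
            else if (kidsA[i].Value != kidsB[i].Value) return false;
        }
        return true;
    }

    private static List<XmlAttribute> DataAttributes(XmlElement e)
    {
        return e.Attributes.Cast<XmlAttribute>()
                .Where(x => x.NamespaceURI != XmlnsNamespace).ToList();
    }

    private static List<XmlNode> ComparableChildren(XmlElement e)
    {
        return e.ChildNodes.Cast<XmlNode>()
                .Where(n => n.NodeType == XmlNodeType.Element
                         || n.NodeType == XmlNodeType.Text
                         || n.NodeType == XmlNodeType.CDATA)
                .ToList();
    }
}
```

Each of those assumptions is a policy decision, so adjust the filters to whatever "equal" needs to mean for your documents.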



Answer 5:

Often you want to compare XML strings whose elements are ordered differently. This can be done easily with this code:

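using System;
using System.Linq;
using System.Xml.Linq;
using NUnit.Framework; // assumed from the [Test] attribute and Assert.AreEqual

class Testing
{
    [Test]
    public void Test()
    {
        // Sibling order differs, but the sorted text is the same.
        Assert.AreEqual(
            "<root><a></a><b></b></root>".SortXml(),
            "<root><b></b><a></a></root>".SortXml());
    }
}

public static class XmlCompareExtension
{
    // Parses the string, sorts sibling elements by local name at every level,
    // and returns the re-serialized XML. Attributes are not reordered.
    public static string SortXml(this string @this)
    {
        var xdoc = XDocument.Parse(@this);

        SortXml(xdoc);

        return xdoc.ToString();
    }

    private static void SortXml(XContainer parent)
    {
        // Snapshot the child elements in sorted order before modifying the tree.
        var elements = parent.Elements()
            .OrderBy(e => e.Name.LocalName)
            .ToArray();

        // Detach them all, then re-append them in sorted order.
        Array.ForEach(elements, e => e.Remove());

        foreach (var element in elements)
        {
            parent.Add(element);
            SortXml(element); // recurse into each child
        }
    }
}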