Merging XML documents

Posted 2019-06-15 14:38

Question:

None of the solutions I have come across for merging XML documents do what I need. Let me explain. I want to merge these two documents:

XML Document 1:

<?xml version="1.0" encoding="utf-8" ?>
<a>
    <b title="Original Section">
        <b title="Original Child Section"></b>
        <b title="Original Child Section 2"></b>
    </b>
</a>

XML Document 2:

<?xml version="1.0" encoding="utf-8" ?>
<a>
    <b title="New Section">
        <b title="New Child Section"></b>
    </b>
    <b title="Original Section">
        <b title="Original Child Section">
            <b title="New Child For Old Section"></b>
        </b>
    </b>    
</a>

The result should be a final document like this:

<?xml version="1.0" encoding="utf-8" ?>
<a>
    <b title="Original Section">
        <b title="Original Child Section">
            <b title="New Child For Old Section"></b>
        </b>
        <b title="Original Child Section 2"></b>
    </b>    
    <b title="New Section">
        <b title="New Child Section"></b>
    </b>
</a>

The documents are similar in content, but can have an arbitrary number of child nodes. I would also like to eliminate duplicates. I consider two elements duplicates when they have the same attributes (same attribute names and values). Has anyone seen a working example of this? I can envision how I would write it using some loops and a bit of recursion, but to me that just doesn't seem like the best way to accomplish what I want :)

Cheers and thanks in advance!

* EDIT *

Since the consensus is that loops and recursion are a must, what would be the most elegant and efficient way to accomplish this? I suppose another fundamental question here is: what is the best way to compare the nodes as you iterate?

Answer 1:

Eventually any solution to this problem will boil down to loops and/or recursion. This is basic set theory: LINQ may be useful for distilling the process, but any approach will ultimately iterate over both sets and merge the results.
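
To make the set-theory view concrete, here is a minimal sketch using LINQ's `Intersect` and `Except` on the `title` values of one level of children. The names `SetViewSketch` and `Titles` are hypothetical, and the two inline documents are cut-down stand-ins for the ones in the question. It only classifies one level; a real merge would still recurse into the shared elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SetViewSketch
{
    // Titles of the direct children of an element; these are the set members.
    // (Assumption: every child carries a title attribute.)
    public static string[] Titles(XElement parent) =>
        parent.Elements().Select(e => (string)e.Attribute("title")).ToArray();

    public static void Main()
    {
        var a1 = XElement.Parse("<a><b title='Original Section'/></a>");
        var a2 = XElement.Parse(
            "<a><b title='New Section'/><b title='Original Section'/></a>");

        // Present in both documents: these are the nodes to recurse into.
        var shared = Titles(a1).Intersect(Titles(a2));
        // Present only in document 2: these can be copied over as-is.
        var onlyNew = Titles(a2).Except(Titles(a1));

        Console.WriteLine("merge: " + string.Join(", ", shared));
        Console.WriteLine("copy:  " + string.Join(", ", onlyNew));
    }
}
```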



Answer 2:

I'd write an IEqualityComparer<XElement> that specifies when two nodes are a 'match', i.e. one that encodes the title-matching rule.

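using System.Collections.Generic;
using System.Xml.Linq;

class XElementComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        // An element always matches itself, even when it has no title.
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;

        var xTitle = x.Attribute("title");
        var yTitle = y.Attribute("title");

        // Two distinct elements match only if both have a title and the titles are equal.
        if (xTitle == null || yTitle == null) return false;

        return xTitle.Value == yTitle.Value;
    }

    public int GetHashCode(XElement obj)
    {
        // Hash on the title so that matching elements share a hash code.
        // base.GetHashCode() would hash the comparer itself, giving every
        // element the same value and degrading set operations to linear scans.
        return obj.Attribute("title")?.Value.GetHashCode() ?? 0;
    }
}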

Then write a recursive method that walks your XML, merging nodes that match according to the comparer.

// requires: using System.Linq; using System.Xml.Linq;
private XElement Merge(XElement node1, XElement node2)
{
    // trivial cases: nothing to merge on one side
    if (node1 == null) return node2;
    if (node2 == null) return node1;

    var elements1 = node1.Elements();
    var elements2 = node2.Elements();

    // create a merged root (only the name and the title attribute are carried over)
    var result = new XElement(node1.Name, node1.Attribute("title"));

    var comparer = new XElementComparer();
    var mergedNodes = elements1.Union(elements2, comparer).ToList();

    // for each distinct child, merge its counterparts from both documents
    foreach (var element in mergedNodes)
    {
        // FirstOrDefault rather than SingleOrDefault, so that duplicate
        // titles under the same parent do not throw
        var child1 = elements1.FirstOrDefault(e => comparer.Equals(e, element));
        var child2 = elements2.FirstOrDefault(e => comparer.Equals(e, element));

        result.Add(Merge(child1, child2));
    }

    return result;
}
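
The question defines duplicates by all attributes (name and value), whereas the comparer above matches on `title` alone. As a rough sketch of that stricter rule, the following self-contained program matches elements on their name plus their full attribute set and merges the second document into the first in place. `XmlMergeSketch`, `Key`, and `MergeInto` are hypothetical names introduced for illustration, and the string key is an assumption that breaks down if attribute values themselves contain `|` or `=`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlMergeSketch
{
    // Hypothetical helper: builds a key from the element name and all of its
    // attributes (sorted by name), so two elements match only when their
    // attribute sets are identical. Assumes values contain no '|' or '='.
    public static string Key(XElement e) =>
        e.Name + "|" + string.Join("|", e.Attributes()
            .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal)
            .Select(a => $"{a.Name}={a.Value}"));

    // Merges the children of source into target, modifying target in place.
    public static void MergeInto(XElement target, XElement source)
    {
        foreach (var child in source.Elements())
        {
            var key = Key(child);
            var match = target.Elements().FirstOrDefault(e => Key(e) == key);

            if (match == null)
                target.Add(new XElement(child)); // not present yet: copy the whole subtree
            else
                MergeInto(match, child);         // same name and attributes: recurse
        }
    }

    public static void Main()
    {
        var doc1 = XDocument.Parse(@"<a>
            <b title='Original Section'>
                <b title='Original Child Section'></b>
                <b title='Original Child Section 2'></b>
            </b>
        </a>");
        var doc2 = XDocument.Parse(@"<a>
            <b title='New Section'>
                <b title='New Child Section'></b>
            </b>
            <b title='Original Section'>
                <b title='Original Child Section'>
                    <b title='New Child For Old Section'></b>
                </b>
            </b>
        </a>");

        MergeInto(doc1.Root, doc2.Root);
        Console.WriteLine(doc1);
    }
}
```

Merging in place keeps the first document's ordering and appends unmatched subtrees after the existing children, which is the layout asked for in the question. Duplicates that already exist inside the first document are left untouched by this sketch.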