Programmatically Diff/Merge Xml Documents

Posted 2019-08-29 01:42

First, let me begin by giving you the details of the problem I'm trying to solve.

We have a third-party application that uses XML documents to store all of its business logic, lookup tables and such. The application has a base set of XML files, and uses a kind of inheritance model to expose inherited XML files that we're meant to edit to customize the business logic. I say "kind of" because of the horrible implementation of inheritance it uses.

Currently there are over 3000 separate XML files, ranging from 1k to 5000k and totaling about 600MB in size. The only good thing so far is that they all use the same XSD.

Our problem is, we receive monthly updates to the core Xml files, and we're supposed to put them in place, and upgrade our custom documents to line up with the new version of the base documents. We're currently doing this manually, using DiffDog, and piecing together the documents to create new ones, but I'm trying to wrap my head around the possibility of doing this programmatically. Let me see if I can kind of visualize this for you:

We start off with a structure like the one below, with the base template in place and a custom template in which we define our custom rules (which we do a lot):

..\LineOfBusiness\BaseTemplates\BaseXml_1_0_0_0.xml
..\LineOfBusiness\CustomTemplates\Document_1_0_0_0.xml

We're then given an upgrade each month so now we have a structure like this:

..\LineOfBusiness\BaseTemplates\BaseXml_1_0_0_0.xml
..\LineOfBusiness\BaseTemplates\BaseXml_1_1_0_0.xml
..\LineOfBusiness\CustomTemplates\Document_1_0_0_0.xml

Our job essentially is to create the

..\LineOfBusiness\CustomTemplates\Document_1_1_0_0.xml

document ourselves every month, carrying the changes we made in the previous version over into the new version's logic.

I know this system is ridiculous, but I can't change that today. Any ideas on how to tackle this problem would be great. I can tell you what I've thought of so far...

  1. Deserialize the old Base and old Custom documents to get a list of specific differences, then apply those differences to a deserialized version of the new Base and reserialize it to XML.

  2. Apply some sort of annotation process to the Custom Templates, so that we can extract the differences programmatically at upgrade time.

  3. Outsource the upgrade process...
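
To make the first idea concrete, here is a minimal sketch in Python using only the standard library. It is not the vendor's inheritance model or any existing tool, just one way the "extract differences, then replay them" step could look. It assumes elements can be matched across versions by their tag plus an `id` attribute (a hypothetical identity rule; the real XSD would dictate the actual key), and it only handles changed text and attributes. Added or removed elements, and repeated siblings without a distinguishing key, would need extra handling.

```python
import xml.etree.ElementTree as ET


def index(root):
    """Map a path key to each element. The key uses the tag plus a
    hypothetical 'id' attribute; a real schema needs its own identity rule."""
    found = {}

    def walk(elem, prefix):
        key = f"{prefix}/{elem.tag}[{elem.get('id', '')}]"
        found[key] = elem
        for child in elem:
            walk(child, key)

    walk(root, "")
    return found


def state(elem):
    """The parts of an element compared here: trimmed text and attributes."""
    return ((elem.text or "").strip(), dict(elem.attrib))


def extract_changes(old_base, old_custom):
    """Return {key: state} for elements the custom document changed."""
    base, custom = index(old_base), index(old_custom)
    return {
        key: state(elem)
        for key, elem in custom.items()
        if key in base and state(base[key]) != state(elem)
    }


def apply_changes(old_base, new_base, changes):
    """Apply changes to new_base in place; return keys needing manual review."""
    old, new = index(old_base), index(new_base)
    conflicts = []
    for key, (text, attrib) in changes.items():
        target = new.get(key)
        # Conflict: the element is gone, or the vendor changed it as well.
        if target is None or state(target) != state(old[key]):
            conflicts.append(key)
            continue
        if (target.text or "").strip() != text:
            target.text = text
        target.attrib.clear()
        target.attrib.update(attrib)
    return conflicts


# Tiny made-up documents standing in for the three real files.
old_base = ET.fromstring(
    '<rules><rule id="a" rate="1">x</rule><rule id="b">y</rule></rules>')
old_custom = ET.fromstring(
    '<rules><rule id="a" rate="2">x</rule><rule id="b">custom</rule></rules>')
new_base = ET.fromstring(
    '<rules><rule id="a" rate="1">x</rule><rule id="b">vendor</rule>'
    '<rule id="c">z</rule></rules>')

changes = extract_changes(old_base, old_custom)
conflicts = apply_changes(old_base, new_base, changes)
new_custom_xml = ET.tostring(new_base, encoding="unicode")
```

In this example the customization on rule `a` is replayed onto the new base, the vendor's new rule `c` is kept, and rule `b` lands in `conflicts` because both sides changed it. Anything reported as a conflict would still go through a manual DiffDog pass, so the sketch reduces the manual work rather than removing it.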

1 Answer
做个烂人
#2 · 2019-08-29 02:26

If you're using a .NET language, you might be able to accomplish what you're trying to do with Microsoft's XML Diff and Patch tool/library.

I've used it to correctly identify whether there were changes between different XML fragments. This was important for our scenario because the XML we had on disk would differ after being stored in a SQL Server XML column: insignificant whitespace was removed and/or attributes were rearranged (Infoset). Just comparing the text blobs would always detect a difference, when actually the XML elements/values were the same.
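
The same "text differs, XML doesn't" situation can be illustrated without the Microsoft library. The sketch below is a Python stand-in, not XmlDiff itself: it canonicalizes both fragments with the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+), which emits attributes in a fixed order and, with `strip_text=True`, drops surrounding whitespace in text content. The helper name `same_xml` and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def same_xml(a: str, b: str) -> bool:
    """Compare two XML strings after canonicalization, so attribute order
    and insignificant whitespace do not count as differences."""
    return (ET.canonicalize(a, strip_text=True)
            == ET.canonicalize(b, strip_text=True))


# Hypothetical fragments: same content, different layout and attribute order.
on_disk = '<rule rate="1" id="a">\n  <name>x</name>\n</rule>'
from_db = '<rule id="a" rate="1"><name>x</name></rule>'
edited = '<rule id="a" rate="2"><name>x</name></rule>'

text_differs = on_disk != from_db          # raw text comparison
xml_matches = same_xml(on_disk, from_db)   # canonical comparison
real_change = same_xml(on_disk, edited)    # an actual value change
```

Note that `strip_text=True` also trims whitespace around real text values, which is only appropriate when that whitespace carries no meaning in the documents being compared.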

I've not used the patching ability of the tool, only XmlDiff.

There are several nice commercial XML diff tools on the market, but I don't know of any that provide a code or scripting API. That would be a nice value-add feature!
