I have an application which saves documents (think Word documents) in an XML-based format. Currently, C# classes generated from xsd files are used for reading and writing the document format, and all was well until recently, when I had to make a change to the format of the document. My concern is with backwards compatibility: future versions of my application need to be able to read documents saved by all previous versions, and ideally I also want older versions of my app to gracefully handle reading documents saved by future versions.
For example, suppose I change the schema of my document to add an (optional) extra element somewhere. Older versions of my application will simply ignore the extra element and there will be no problems:
<doc>
<!-- Existing document -->
<myElement>Hello World!</myElement>
</doc>
However, if a breaking change is made (an attribute is changed into an element, for example, or into a collection of elements), then past versions of my app should either ignore this element if it is optional, or otherwise inform the user that they are attempting to read a document saved with a newer version of my app. This is also currently causing me headaches, as all future versions of my app need entirely separate code for reading the two different document formats.
An example of such a change would be the following xml:
<doc>
<!-- Existing document -->
<someElement contents="12" />
</doc>
Changing to:
<doc>
<!-- Existing document -->
<someElement>
<contents>12</contents>
<contents>13</contents>
</someElement>
</doc>
In order to prevent support headaches in the future I wanted to come up with a decent strategy for handling changes I might make in the future, so that versions of my app that I release now are going to be able to cope with these changes in the future:
- Should the "version number" of the document be stored in the document itself, and if so, what versioning strategy should be used? Should the document version match the .exe assembly version, or should a more complex strategy be used (for example, major revision changes indicate breaking changes, whereas minor revision increments indicate non-breaking changes such as extra optional elements)?
- What method should I use to read the document itself and how do I avoid replicating massive amounts of code for different versions of documents?
- Although XPath is obviously most flexible, it is a lot more work to implement than simply generating classes with xsd.
- On the other hand if DOM parsing is used then a new copy of the document xsd would be needed in source control for each breaking change, causing problems if fixes ever need to be applied to older schemas (old versions of the app are still supported).
Also, I've worked all of this out very loosely on the assumption that all changes I make can be split into these two categories of "breaking changes" and "non-breaking changes", but I'm not entirely convinced that this is a safe assumption to make.
Note that I use the term "document" very loosely - the contents don't resemble a document at all!
Thanks for any advice you can offer me.
You definitely need a version number in the XML file, and I would suggest not tying it to the version of the application because it's really a separate entity. You may go through two or three versions of your app without ever changing the XML format, or you may wind up changing the format multiple times during development of a single release.
If you want older versions of the application to be able to read newer versions of the XML file then you can never, ever remove elements or change their names. You can always add elements and the older code will happily ignore them (one of the nice features of XML) but if you remove them then the old code won't be able to function.
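As a rough illustration of the "older code ignores added elements" behaviour, here is a minimal sketch using XmlSerializer with its default settings. The Doc class and the TolerantReader helper are hypothetical stand-ins for whatever your xsd-generated classes look like; generated classes that enforce element order may behave more strictly, so treat this as an assumption to verify against your own schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class standing in for one generated from the original xsd.
[XmlRoot("doc")]
public class Doc
{
    [XmlElement("myElement")]
    public string MyElement { get; set; }
}

public static class TolerantReader
{
    public static Doc Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Doc));

        // Optional: report what was skipped instead of ignoring it silently.
        serializer.UnknownElement += (sender, e) =>
            Console.WriteLine("Ignoring unknown element: " + e.Element.Name);

        using (var reader = new StringReader(xml))
        {
            return (Doc)serializer.Deserialize(reader);
        }
    }
}
```

Reading a document such as <doc><myElement>Hello World!</myElement><addedLater>42</addedLater></doc> with this "old" class should still populate MyElement, with addedLater only reported through the event. Removing or renaming myElement, by contrast, would leave the old code with nothing to read.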
Like Ishmael said, XSLT is a good way to convert the XML format from one version to another so that you don't wind up with a whole pile of parsing routines in your source code.
XSLT is an obvious choice here. Given that you can identify the version of your document, for each version of your schema, create an XSLT that transforms the previous version to your new version.
You can apply the transforms in sequence until you reach the current version. Thus you are only ever editing the latest document version. Of course, you will be unable to save to the old format and can break the document for older versions, but this is typical of many applications. If you absolutely need to save to the old version, just create a transform that goes the other way.
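The chain of transforms could be sketched roughly like this in C#. Everything named here is an assumption for illustration: an integer "version" attribute on the root element (treated as 1 when absent), the DocumentUpgrader class, and an inline stylesheet that performs the someElement change from the question. In a real application the stylesheets would more likely live in files or embedded resources.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class DocumentUpgrader
{
    // Hypothetical v1 -> v2 stylesheet: copies everything unchanged, stamps
    // version 2 on the root, and turns someElement/@contents into a child element.
    // Note: this sketch drops any other attributes on <doc> and <someElement>.
    const string V1ToV2 = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='/doc'>
    <doc version='2'><xsl:apply-templates select='node()'/></doc>
  </xsl:template>
  <xsl:template match='someElement[@contents]'>
    <someElement><contents><xsl:value-of select='@contents'/></contents></someElement>
  </xsl:template>
</xsl:stylesheet>";

    public const int CurrentVersion = 2;

    // Key = the version a stylesheet upgrades FROM; each step produces key + 1.
    static readonly Dictionary<int, string> Steps =
        new Dictionary<int, string> { { 1, V1ToV2 } };

    public static XDocument Upgrade(XDocument doc)
    {
        // Assumption: documents saved before versioning existed are version 1.
        int version = (int?)doc.Root.Attribute("version") ?? 1;
        if (version > CurrentVersion)
            throw new NotSupportedException(
                "Document was saved by a newer version of the application.");

        // Apply each upgrade step in sequence until the current version is reached.
        while (version < CurrentVersion)
        {
            doc = Apply(Steps[version], doc);
            version++;
        }
        return doc;
    }

    static XDocument Apply(string stylesheet, XDocument input)
    {
        var xslt = new XslCompiledTransform();
        using (var sheetReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(sheetReader);
        }

        var output = new XDocument();
        using (var inputReader = input.CreateReader())
        using (var writer = output.CreateWriter())
        {
            xslt.Transform(inputReader, writer);
        }
        return output;
    }
}
```

With this shape, the rest of the application only ever deals with the latest format, and supporting a new breaking change means adding one stylesheet and one dictionary entry rather than another parsing routine.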
Like @Andy says, use the major build number of your app.
Could you add an attribute to the root element specifying version?
That way older versions won't be broken, and newer versions of your software will see the attribute and switch to a different loading method as appropriate.
Version numbering itself would depend on your frequency of release. I would personally go with the major build number from your software, unless you foresee the format changing more often than that.
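A minimal sketch of reading such a root attribute and deciding how to load, assuming a hypothetical integer attribute named "version" and treating documents without it as version 1 (that is, saved before the attribute was introduced). The LoadStrategy enum and VersionSniffer class are illustrative names only.

```csharp
using System;
using System.Xml.Linq;

public enum LoadStrategy { Legacy, Current, TooNew }

public static class VersionSniffer
{
    // Hypothetical: the document format version this build of the app writes.
    public const int SupportedVersion = 2;

    public static LoadStrategy Decide(XDocument doc)
    {
        // A missing attribute means the document predates versioning.
        int version = (int?)doc.Root.Attribute("version") ?? 1;

        if (version > SupportedVersion) return LoadStrategy.TooNew;  // tell the user to upgrade
        if (version < SupportedVersion) return LoadStrategy.Legacy;  // use the old loading path
        return LoadStrategy.Current;
    }
}
```

For example, VersionSniffer.Decide(XDocument.Parse("<doc version=\"3\" />")) lets this build report that the file came from a newer release instead of failing halfway through parsing.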
Edit: just noticed the bit about code duplication:
For that I would use the Factory Pattern, something like this: