I'm looking for a way to import and export a list of changes to an XML data document (irregular structure; not naturally fitting a DataSet).
If I had a regular structure I would use a DataTable, and I could evaluate which records have been edited and then commit or cancel the changes, and I could also transmit a packet of the required changes.
How do I do this with XML data?
If a good answer isn't available, I think my best bet would be to use a DataTable with the schema [XPath, Value], despite the inefficient storage and navigation difficulties.
I expect to make changes to the document (with XPath or LINQ or data-bound controls or whatever), then remember the changes and send only the changes over TCP.
Then I want to receive back another change list and apply it to the XML document. I don't want to send the entire document both for size and because I need to know and evaluate the changes being sent.
(Just to clarify: my program needs to send and receive document changes. The other end of the pipe is not based in .NET, and is not part of this question.)
I haven't found any usable answers anywhere. It seems that back in 2003 MS was talking about creating an XPathDocument2 or something that implemented what I'm asking for (books talking about the coming release mention it), but it doesn't seem to have been carried out. So here's my attempt at a solution:
Use XPathDocument/XPathNavigator, and add event handlers for Change/Delete/Insert. For each of these events, put a record in a DataTable {XPath | OldValue | NewValue} indicating the change. When ready to Commit, send the table across then clear it. If instead cancelling, use the Table info to undo the changes in the XPathDocument.
I haven't implemented this yet, but it seems like it might serve.
When you have XML data with an irregular structure that doesn't naturally fit a DataSet, and you want an object model to work with the data easily, you can use the XML Schema Definition Tool (Xsd.exe) with the /classes option to generate C# or VB.NET classes from an XML file (xsd.exe first infers a schema from the XML file; /classes then generates the classes from that schema).
The XSD.exe lives in :
You run xsd.exe from the Visual Studio Command Prompt.
-Start
-All Programs
-Visual Studio
-Tools
-Command Prompt
This is the command to view all the XSD command line parameters:
To convert an irregular XML file (XmlResponseObject.xml) into Classes:
This will generate a C# file with classes that represent the XML. You may want to refactor it into separate class files, being careful about duplicate class names in the single file that are disambiguated by namespace. Either way, the classes won't be the nicest looking with all the XML attributes, but the good part is that you can bind XML directly to them. This is an example where I retrieve XML via a REST web service; xmlResponseObject is the object model of classes that fits the XML.
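A minimal sketch of that binding step, assuming a small hand-written XmlResponseObject class standing in for the generated one (its Name and Count members are hypothetical) and an inline XML string in place of the REST response:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for a class produced by xsd.exe /classes.
// The real generated class will have whatever members the schema dictates.
[XmlRoot("XmlResponseObject")]
public class XmlResponseObject
{
    [XmlElement("Name")]
    public string Name { get; set; }

    [XmlElement("Count")]
    public int Count { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Assumed sample payload; in practice this would be the web service response body.
        const string xml =
            "<XmlResponseObject><Name>demo</Name><Count>3</Count></XmlResponseObject>";

        var serializer = new XmlSerializer(typeof(XmlResponseObject));
        using (var reader = new StringReader(xml))
        {
            // Deserialize maps elements onto the attributed properties.
            var xmlResponseObject = (XmlResponseObject)serializer.Deserialize(reader);
            Console.WriteLine(xmlResponseObject.Name + " / " + xmlResponseObject.Count);
        }
    }
}
```

The same XmlSerializer instance can write the object back out with Serialize, so the classes work in both directions.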
Given that you only want to send and receive document changes, you could modify the classes with IsDirty flags. I'm sure, though, that once you have the classes to work with, it will be dead easy to detect diffs.
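One way the IsDirty idea might look, as a sketch only: TrackedItem, its Name property, and AcceptChanges are hypothetical names, and a generated class would need the same pattern added to each property you care about.

```csharp
using System;

// Hypothetical class illustrating a per-object dirty flag.
public class TrackedItem
{
    private string name;

    // True once any tracked property has been set to a different value.
    public bool IsDirty { get; private set; }

    public string Name
    {
        get { return name; }
        set
        {
            // Only flag a change when the value actually differs.
            if (!string.Equals(name, value, StringComparison.Ordinal))
            {
                name = value;
                IsDirty = true;
            }
        }
    }

    // Call after loading, or after the changes have been sent.
    public void AcceptChanges()
    {
        IsDirty = false;
    }
}

public static class Program
{
    public static void Main()
    {
        var item = new TrackedItem { Name = "loaded value" };
        item.AcceptChanges();               // the initial load is not a change
        Console.WriteLine(item.IsDirty);    // False

        item.Name = "edited value";
        Console.WriteLine(item.IsDirty);    // True
    }
}
```

A flag like this only says that something changed; to send the change itself you would still need to keep the old value or compare against a saved copy.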
To load any XML data into a DataSet, you have to provide a corresponding schema. See Deriving DataSet Relational Structure from XML Schema (XSD).
Besides, DataSet/DataTable don't work with XML documents. They can import data from, and export data to, XML.
The problem you have here is that XML is just a form of representing data; it's not necessarily the data itself. Is this some sort of XML editor you are using, or is XML just the transport?
If you are talking about XML as a transport, then when you talk about sending XML change descriptions, you probably want to generate those change descriptions at the point where you make the change itself, and there is every chance that the change descriptions won't be in the same schema as the original data.
In addition, the reason that DataSets can do this is that each row in a DataSet has a known unique key, so the change can be sent back for the row instead of the entire set. XML doesn't work like that; its elements don't carry a unique key. XPath can be used as the change locator, but with enough edits that could be less efficient than sending the entire document.
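To make the XPath-as-locator idea concrete, here is a rough sketch of applying (and undoing) a received list of value changes. The ValueChange type, the Apply helper, and the sample document are all assumptions made for illustration; the sketch covers value edits only, not inserted or deleted nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical change record: where the value lives, plus old and new values.
public sealed class ValueChange
{
    public string XPath;
    public string OldValue;
    public string NewValue;
}

public static class Program
{
    // Applies the changes in order, or walks them backwards to restore old values.
    static void Apply(XmlDocument doc, IList<ValueChange> changes, bool undo)
    {
        for (int i = 0; i < changes.Count; i++)
        {
            ValueChange c = changes[undo ? changes.Count - 1 - i : i];
            XmlNode node = doc.SelectSingleNode(c.XPath);
            if (node == null)
                throw new InvalidOperationException("No node at " + c.XPath);

            // InnerText works for both elements and attributes.
            node.InnerText = undo ? c.OldValue : c.NewValue;
        }
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<order><status>open</status><qty unit='kg'>5</qty></order>");

        var received = new List<ValueChange>
        {
            new ValueChange { XPath = "/order/status", OldValue = "open", NewValue = "shipped" },
            new ValueChange { XPath = "/order/qty/@unit", OldValue = "kg", NewValue = "lb" }
        };

        Apply(doc, received, false);   // commit the received changes
        Console.WriteLine(doc.OuterXml);

        Apply(doc, received, true);    // cancel: restore the old values
        Console.WriteLine(doc.OuterXml);
    }
}
```

Because the locator is positional, changes that add or remove siblings can invalidate later paths, which is one practical form of the missing-unique-key problem described above.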
Why not simply treat the XML as text and use any one of the standard patching algorithms? (Look at the source for Git or Hg.)
I have tried to find a free or open-source XML diff tool numerous times before, but never dug up anything that really fit the bill. Essentially, you're looking at tree diffing, which is a whole discipline on its own. The fact that you're using XML is subordinate to this, I guess, as it's nothing but a tree in another form. You "just" need to define what specifies a node.
Though the Decomposition Algorithm for Tree Edit Distance calculates the distance between two trees, I suspect you can transform it to give you all the changes, as that's the basis for the distance measurement. How you communicate the changes after detection is completely up to you. That could range from XML to JSON. Note that the authors of the algorithm mention they created a Python version in a few dozen lines, so maybe if you drop them a line, they can be of assistance.
It looks like you could be the first one to publish a practical proof of concept if you can get this done :)
If you use XmlDocument events such as NodeInserted, NodeRemoved, and NodeChanged, you could build a list of such changes and then execute them on another copy. If the total size of the changes is larger than the document itself, you could send the document instead. Zipping the XML data also helps.
Other than that, I do not see any other easy approach.
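A minimal sketch of recording changes through XmlDocument events, under a few assumptions: ChangeRecord and PathOf are hypothetical helpers, PathOf handles only elements, attributes, and text nodes without namespace prefixes, and removals are captured with NodeRemoving (the "before" event) rather than NodeRemoved so the path can be computed while the node is still attached.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical record of one change to the document.
public sealed class ChangeRecord
{
    public XmlNodeChangedAction Action;
    public string Path;
    public string OldValue;
    public string NewValue;
}

public static class Program
{
    // Hypothetical helper: builds a positional XPath such as /root[1]/item[2].
    static string PathOf(XmlNode node)
    {
        if (node == null || node.NodeType == XmlNodeType.Document) return "";
        if (node.NodeType == XmlNodeType.Attribute)
            return PathOf(((XmlAttribute)node).OwnerElement) + "/@" + node.Name;

        // Count preceding siblings of the same kind and name to get the index.
        int index = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
            if (s.NodeType == node.NodeType && s.Name == node.Name) index++;

        string step = node.NodeType == XmlNodeType.Element ? node.Name : "text()";
        return PathOf(node.ParentNode) + "/" + step + "[" + index + "]";
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><item>old</item></root>");

        var changes = new List<ChangeRecord>();
        XmlNodeChangedEventHandler record = (sender, e) =>
            changes.Add(new ChangeRecord
            {
                Action = e.Action,
                Path = PathOf(e.Node),
                OldValue = e.OldValue,
                NewValue = e.NewValue
            });

        // Subscribe after loading so the initial parse is not recorded.
        doc.NodeChanged += record;
        doc.NodeInserted += record;
        doc.NodeRemoving += record;

        XmlNode item = doc.SelectSingleNode("/root/item");
        item.FirstChild.Value = "new";                               // text edit
        XmlNode note = item.AppendChild(doc.CreateElement("note"));  // insert
        item.RemoveChild(note);                                      // removal

        foreach (ChangeRecord c in changes)
            Console.WriteLine("{0} {1} '{2}' -> '{3}'", c.Action, c.Path, c.OldValue, c.NewValue);
    }
}
```

The recorded list can then be serialized in whatever format the other end expects, and its size compared against the document before deciding which one to send.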