I am trying to write a simple algorithm that reads two XML files with exactly the same nodes and structure, but not necessarily the same data inside the child nodes, and not in the same order. How can I create a simple implementation that produces a third, temporary XML file containing the differential between the first two, using Microsoft's XML Diff DLL?
XML Diff on MSDN:
XML Diff and Patch Tool
XML Diff and Patch GUI Tool
Sample XML from the two files to compare:
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-01">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
</Player>
</Stats>
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-10">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>42</GP>
<G>35</G>
<A>34</A>
<PlusMinus>22</PlusMinus>
<PIM>30</PIM>
</Player>
</Stats>
Result wanted (difference between the two):
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-10">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>3</GP>
<G>3</G>
<A>1</A>
<PlusMinus>2</PlusMinus>
<PIM>1</PIM>
</Player>
</Stats>
In this case, I would probably use XSLT to convert the resulting XML "differential" file into a sorted HTML file, but I am not there yet. All I want for now is for the third XML file to show the difference of every numerical value in each node, starting from the "GP" child node.
C# code I have so far:
private void CompareXml(string file1, string file2)
{
XmlReader reader1 = XmlReader.Create(new StringReader(file1));
XmlReader reader2 = XmlReader.Create(new StringReader(file2));
string diffFile = StatsFile.XmlDiffFilename;
StringBuilder differenceStringBuilder = new StringBuilder();
FileStream fs = new FileStream(diffFile, FileMode.Create);
XmlWriter diffGramWriter = XmlWriter.Create(fs);
XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder |
XmlDiffOptions.IgnoreNamespaces |
XmlDiffOptions.IgnorePrefixes);
bool bIdentical = xmldiff.Compare(file1, file2, false, diffGramWriter);
diffGramWriter.Close();
// cleaning up after we are done with the xml diff file
File.Delete(diffFile);
}
That's what I have so far, but the result is garbage. Note that for each "Player" node, the first three child nodes must NOT be compared. How can I implement this?
There are two immediate solutions:
Solution 1.
You can first apply a simple transform to the two documents that deletes the elements that should not be compared. Then compare the two resulting documents, exactly as in your current code. Here is the transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Name|Team|Pos"/>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<Stats Date="2011-01-01">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>3</GW>
<Shots>0</Shots>
<ShotPctg>154</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
the wanted resulting document is produced:
<Stats Date="2011-01-01">
<Player Rank="1">
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>3</GW>
<Shots>0</Shots>
<ShotPctg>154</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
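For the C# side of Solution 1, here is a minimal sketch of how the stylesheet could be applied to each document before comparing. The class name `StripBeforeDiff` and the helper `Strip` are hypothetical names chosen for illustration; the only library calls used are `XslCompiledTransform.Load` and `XslCompiledTransform.Transform` from `System.Xml.Xsl`.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class StripBeforeDiff
{
    // The stylesheet from Solution 1: identity transform minus Name, Team and Pos.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output omit-xml-declaration='yes' indent='yes'/>
  <xsl:strip-space elements='*'/>
  <xsl:template match='node()|@*'>
    <xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>
  </xsl:template>
  <xsl:template match='Name|Team|Pos'/>
</xsl:stylesheet>";

    // Hypothetical helper: takes XML text and returns it with the
    // Name, Team and Pos elements removed.
    public static string Strip(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xsltReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            // No stylesheet parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```

The two stripped strings can then be saved to temporary files and passed to the existing comparison code unchanged.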
Solution 2.
This is a complete XSLT 1.0 solution (for convenience only, the second XML document is embedded in the transformation code):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vrtfDoc2">
<Stats Date="2011-01-01">
<Player Rank="2">
<Name>John Smith</Name>
<Team>NY</Team>
<Pos>D</Pos>
<GP>38</GP>
<G>32</G>
<A>33</A>
<PlusMinus>15</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>4</GW>
<Shots>0</Shots>
<ShotPctg>158</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
</xsl:variable>
<xsl:variable name="vDoc2" select=
"document('')/*/xsl:variable[@name='vrtfDoc2']/*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:param name="pDoc2"/>
<xsl:copy>
<xsl:apply-templates select="node()|@*">
<xsl:with-param name="pDoc2" select="$pDoc2"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="*">
<xsl:with-param name="pDoc2" select="$vDoc2"/>
</xsl:apply-templates>
-----------------------
<xsl:apply-templates select="$vDoc2">
<xsl:with-param name="pDoc2" select="/*"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="Player/*">
<xsl:param name="pDoc2"/>
<xsl:if test=
"not(. = $pDoc2/*/*[name()=name(current())])">
<xsl:call-template name="identity"/>
</xsl:if>
</xsl:template>
<xsl:template match="Name|Team|Pos" priority="20"/>
</xsl:stylesheet>
When this transformation is applied to the same first document as above, the correct diffgrams are produced:
<Stats Date="2011-01-01">
<Player Rank="1">
<GP>39</GP>
<PlusMinus>20</PlusMinus>
<GW>3</GW>
<ShotPctg>154</ShotPctg>
</Player>
</Stats>
-----------------------
<Stats xmlns:xsl="http://www.w3.org/1999/XSL/Transform" Date="2011-01-01">
<Player Rank="2">
<GP>38</GP>
<PlusMinus>15</PlusMinus>
<GW>4</GW>
<ShotPctg>158</ShotPctg>
</Player>
</Stats>
How this works:
1. The transformation is applied to the first document, passing the second document as a parameter.
2. This produces an XML document whose only leaf element nodes are the ones that have a different value than the corresponding leaf element nodes in the second document.
3. The same processing is performed as in 1. above, but this time on the second document, passing the first document as a parameter.
4. This produces a second diffgram: an XML document whose only leaf element nodes are the ones that have a different value than the corresponding leaf element nodes in the first document.
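If it helps to see the same rule outside XSLT, here is a rough C# sketch using LINQ to XML. It is not part of the stylesheet above; `LeafDiff` and `Diff` are hypothetical names. It mirrors the test `not(. = $pDoc2/*/*[name()=name(current())])`: a child of `Player` is kept only when no same-named element in the other document has an equal value, and `Name`, `Team` and `Pos` are always skipped.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LeafDiff
{
    // Elements that are never compared, as in the stylesheet.
    static readonly HashSet<string> Ignored =
        new HashSet<string> { "Name", "Team", "Pos" };

    // Returns a copy of 'doc' that keeps only the Player children whose
    // value has no equal, same-named counterpart in 'other'.
    public static XElement Diff(XElement doc, XElement other)
    {
        var result = new XElement(doc.Name, doc.Attributes());
        foreach (XElement player in doc.Elements("Player"))
        {
            IEnumerable<XElement> kept = player.Elements()
                .Where(e => !Ignored.Contains(e.Name.LocalName))
                .Where(e => !other.Elements().Elements(e.Name)
                                  .Any(o => o.Value == e.Value))
                .Select(e => new XElement(e));
            result.Add(new XElement(player.Name, player.Attributes(), kept));
        }
        return result;
    }
}
```

Calling `LeafDiff.Diff(doc1, doc2)` and then `LeafDiff.Diff(doc2, doc1)` gives the two diffgrams, one per direction, in the same way the stylesheet applies its templates twice.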
Okay... I finally opted for a pure C# solution to compare the two XML files, without using the XML Diff/Patch .dll and without even needing XSL transforms. I will need XSL transforms in the next step though, to convert the XML into HTML for viewing purposes, but I have worked out an algorithm using nothing but System.Xml and System.Xml.XPath.
Here is my algorithm:
private void CompareXml(string file1, string file2)
{
// Load the documents
XmlDocument docXml1 = new XmlDocument();
docXml1.Load(file1);
XmlDocument docXml2 = new XmlDocument();
docXml2.Load(file2);
// Get a list of all player nodes
XmlNodeList nodes1 = docXml1.SelectNodes("/Stats/Player");
XmlNodeList nodes2 = docXml2.SelectNodes("/Stats/Player");
// Define a single node
XmlNode node1;
XmlNode node2;
// Get the root Xml element
XmlElement root1 = docXml1.DocumentElement;
XmlElement root2 = docXml2.DocumentElement;
// Get a list of all player names
XmlNodeList nameList1 = root1.GetElementsByTagName("Name");
XmlNodeList nameList2 = root2.GetElementsByTagName("Name");
// Get a list of all teams
XmlNodeList teamList1 = root1.GetElementsByTagName("Team");
XmlNodeList teamList2 = root2.GetElementsByTagName("Team");
// Create an XmlWriterSettings object with the correct options.
XmlWriter writer = null;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = (" ");
settings.OmitXmlDeclaration = false;
// Create the XmlWriter object and write some content.
writer = XmlWriter.Create(StatsFile.XmlDiffFilename, settings);
writer.WriteStartElement("StatsDiff");
// The compare algorithm: for each player in the first file, look for the
// player with the same name and team in the second file.
// Assumes every Player has exactly one Name and one Team, so the list indexes line up.
try
{
for (int i = 0; i < nodes1.Count; i++)
{
for (int j = 0; j < nodes2.Count; j++)
{
// There is a match if the player name and team are the same in both lists
if (nameList1.Item(i).InnerText == nameList2.Item(j).InnerText &&
teamList1.Item(i).InnerText == teamList2.Item(j).InnerText)
{
node1 = nodes1.Item(i);
node2 = nodes2.Item(j);
// Call to the calculator and Xml writer
this.CalculateDifferential(node1, node2, writer);
// Stop scanning the second list once the player is found
break;
}
}
}
// end Xml document
writer.WriteEndElement();
writer.Flush();
}
finally
{
if (writer != null)
writer.Close();
}
}
XML Results:
<?xml version="1.0" encoding="utf-8"?>
<StatsDiff>
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>0</GP>
<G>0</G>
<A>0</A>
<Points>0</Points>
<PlusMinus>0</PlusMinus>
<PIM>0</PIM>
<PP>0</PP>
<SH>0</SH>
<GW>0</GW>
<OT>0</OT>
<Shots>0</Shots>
<ShotPctg>0</ShotPctg>
<ShiftsPerGame>0</ShiftsPerGame>
<FOWinPctg>0</FOWinPctg>
</Player>
<Player Rank="2">
<Name>Steven Stamkos</Name>
<Team>TBL</Team>
<Pos>C</Pos>
<GP>1</GP>
<G>0</G>
<A>0</A>
<Points>0</Points>
<PlusMinus>0</PlusMinus>
<PIM>2</PIM>
<PP>0</PP>
<SH>0</SH>
<GW>0</GW>
<OT>0</OT>
<Shots>4</Shots>
<ShotPctg>-0,6000004</ShotPctg>
<ShiftsPerGame>-0,09999847</ShiftsPerGame>
<FOWinPctg>0,09999847</FOWinPctg>
</Player>
[...]
</StatsDiff>
I have left out the implementation of the CalculateDifferential() method; it is rather cryptic, but it is fast and efficient. This way I could obtain the results I wanted using only the bare minimum of references, and without having to use XSL.
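For readers who want something to start from, here is a hypothetical sketch of what a per-player differential writer could look like. It is not the CalculateDifferential() method mentioned above, which was not posted; the names `DifferentialSketch` and `WriteDifferential` are made up for illustration. It assumes that numeric stats should be subtracted (new minus old) and that anything non-numeric, such as Name, Team and Pos, is copied as-is. It parses with `decimal` and `CultureInfo.InvariantCulture`, which should avoid both the comma decimal separators and the floating-point noise visible in values like `-0,6000004`.

```csharp
using System.Globalization;
using System.Xml;

static class DifferentialSketch
{
    // Hypothetical helper: writes one <Player> element whose numeric
    // children hold (new value - old value); other children are copied.
    public static void WriteDifferential(XmlNode oldPlayer, XmlNode newPlayer, XmlWriter writer)
    {
        writer.WriteStartElement("Player");
        foreach (XmlAttribute attr in newPlayer.Attributes)
        {
            writer.WriteAttributeString(attr.Name, attr.Value);
        }

        foreach (XmlNode stat in newPlayer.ChildNodes)
        {
            if (stat.NodeType != XmlNodeType.Element)
            {
                continue;
            }

            // Look up the child with the same name in the old player node.
            XmlNode oldStat = oldPlayer.SelectSingleNode(stat.Name);
            string text = stat.InnerText;
            decimal newValue = 0m, oldValue = 0m;

            if (oldStat != null
                && decimal.TryParse(stat.InnerText, NumberStyles.Number,
                                    CultureInfo.InvariantCulture, out newValue)
                && decimal.TryParse(oldStat.InnerText, NumberStyles.Number,
                                    CultureInfo.InvariantCulture, out oldValue))
            {
                text = (newValue - oldValue).ToString(CultureInfo.InvariantCulture);
            }

            writer.WriteElementString(stat.Name, text);
        }
        writer.WriteEndElement();
    }
}
```

Values that do not parse as numbers (for example a time written as `21:54`) are simply copied from the newer file in this sketch; handling those would need a format-specific rule.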
I wrote a Microsoft-compliant XSLT 1.0 solution that uses a tree comparison algorithm to find differences between any two XML files, and I have posted the stylesheet to my GitHub repository. It outputs any nodes that differ between the two files; if it does not find a match for a node, it searches the sibling nodes. The variable at the top of the stylesheet is where you set the input file to compare against.
It is efficient, with only a few limitations.
https://github.com/sflynn1812/xslt-diff