I know there are a few XML/XSLT merge-related questions here; however, none of them seems to solve the problem I have.
What I am looking for is an XSLT (as generic as possible, not tied to the structure of the input XML files) which can merge a.xml with b.xml and generate c.xml in such a way that
- c.xml will contain the common nodes between a.xml and b.xml (with the node values taken from a.xml)
- in addition, c.xml will contain the nodes (and values) which are present in b.xml and not in a.xml
For example: merging a.xml:
<root_node>
  <settings>
    <setting1>a1</setting1>
    <setting2>a2</setting2>
    <setting3>
      <setting31>a3</setting31>
    </setting3>
    <setting4>a4</setting4>
  </settings>
</root_node>
with b.xml:
<root_node>
  <settings>
    <setting1>b1</setting1>
    <setting2>b2</setting2>
    <setting3>
      <setting31>b3</setting31>
    </setting3>
    <setting5 id="77">b5</setting5>
  </settings>
</root_node>
will generate c.xml:
<root_node>
  <settings>
    <setting1>a1</setting1>
    <setting2>a2</setting2>
    <setting3>
      <setting31>a3</setting31>
    </setting3>
    <setting5 id="77">b5</setting5>
  </settings>
</root_node>
Additional Information
I will try to explain what I understand by a "common node". This might not be an accurate XML/XSLT definition, since I am not an expert in either.
a/root_node/settings/setting1 is a "common node" with b/root_node/settings/setting1 since the two nodes are reached using the same path. The same goes for setting2 and setting3.
The two "non-common nodes" are a/root_node/settings/setting4, which is found only in a.xml (it should not appear in the output), and b/root_node/settings/setting5, which is found only in b.xml (it should appear in the output).
By "generic solution" I don't mean something that will work whatever format the input XMLs have. What I mean is that the XSLT should not contain hard-coded XPaths, though you might add restrictions like "this will work only if the nodes in a.xml are unique" or whatever other restriction you think is suitable.
The following XSLT 1.0 program does what you want. Apply it to b.xml and pass in the path to a.xml as the aXmlPath parameter.
Here is how it works.
- It traverses B, as that contains the new nodes that you want to keep as well as the common elements between A and B.
- I define "common element" as any element that has the same simple path.
- I define "simple path" as the slash-delimited list of names of ancestor elements and the element itself, i.e. the ancestor-or-self axis. So in your sample B, <setting31> would have a simple path of root_node/settings/setting3/setting31/.
- Note that this path is ambiguous. The implication is that you cannot have any two elements with the same name that share the same parent in your input. Based on your samples I presume that will not be the case.
- For every leaf text node (any text node in an element with no further child elements):
  - The simple path is calculated with a template called calculatePath.
  - The recursive template nodeValueByPath is called, which tries to retrieve the text value at the corresponding simple path from the other document.
  - If a corresponding text node is found, its value is used. This satisfies your first bullet point.
  - If no corresponding node is found, it uses the value at hand, i.e. the value from B. This satisfies your second bullet point.
As a result, the new document matches B's structure and contains:
- all text node values from B that have no corresponding node in A.
- text node values from A when a corresponding node in B exists.
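For readers who want to check the logic outside an XSLT processor, here is a minimal sketch of the same idea in Python, using only the standard library's xml.etree.ElementTree. It is not a translation of the stylesheet, just an illustration under the same restriction (no two siblings with the same name). The names A_XML, B_XML, value_by_path and merge are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample inputs taken from the question (inlined here instead of read from files).
A_XML = """<root_node><settings>
<setting1>a1</setting1><setting2>a2</setting2>
<setting3><setting31>a3</setting31></setting3>
<setting4>a4</setting4>
</settings></root_node>"""

B_XML = """<root_node><settings>
<setting1>b1</setting1><setting2>b2</setting2>
<setting3><setting31>b3</setting31></setting3>
<setting5 id="77">b5</setting5>
</settings></root_node>"""


def value_by_path(root, names):
    """Follow a list of element names starting at root.

    Returns the matching leaf element, or None if the path does not
    exist or ends at an element that still has child elements.
    """
    if root.tag != names[0]:
        return None
    node = root
    for name in names[1:]:
        node = node.find(name)  # first child with that name, like [1] in the XSLT
        if node is None:
            return None
    return node if len(node) == 0 else None


def merge(a_root, b_root):
    """Return a copy of B whose leaf text comes from A where the simple path matches."""
    merged = ET.fromstring(ET.tostring(b_root))  # work on a deep copy of B

    def walk(elem, names):
        names = names + [elem.tag]
        # Only leaf elements that actually hold text are candidates.
        if len(elem) == 0 and elem.text:
            match = value_by_path(a_root, names)
            if match is not None:
                elem.text = match.text
        for child in elem:
            walk(child, names)

    walk(merged, [])
    return merged


merged = merge(ET.fromstring(A_XML), ET.fromstring(B_XML))
print(ET.tostring(merged, encoding="unicode"))
```

Because the walk starts from B, setting4 (present only in A) never shows up, while setting5 keeps its attribute and its value from B.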
Here's the XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:param name="aXmlPath" select="''" />
  <xsl:param name="aDoc" select="document($aXmlPath)" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- text nodes will be checked against doc A -->
  <xsl:template match="*[not(*)]/text()">
    <xsl:variable name="path">
      <xsl:call-template name="calculatePath" />
    </xsl:variable>
    <xsl:variable name="valueFromA">
      <xsl:call-template name="nodeValueByPath">
        <xsl:with-param name="path" select="$path" />
        <xsl:with-param name="context" select="$aDoc" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:choose>
      <!-- either there is something at that path in doc A -->
      <xsl:when test="starts-with($valueFromA, 'found:')">
        <!-- remove prefix added in nodeValueByPath, see there -->
        <xsl:value-of select="substring-after($valueFromA, 'found:')" />
      </xsl:when>
      <!-- or we take the value from doc B -->
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- this calculates a simple path for a node -->
  <xsl:template name="calculatePath">
    <xsl:for-each select="..">
      <xsl:call-template name="calculatePath" />
    </xsl:for-each>
    <xsl:if test="self::*">
      <xsl:value-of select="concat(name(), '/')" />
    </xsl:if>
  </xsl:template>

  <!-- this retrieves a node value by its simple path -->
  <xsl:template name="nodeValueByPath">
    <xsl:param name="path" select="''" />
    <xsl:param name="context" select="''" />
    <xsl:if test="contains($path, '/') and count($context)">
      <xsl:variable name="elemName" select="substring-before($path, '/')" />
      <xsl:variable name="nextPath" select="substring-after($path, '/')" />
      <xsl:variable name="currContext" select="$context/*[name() = $elemName][1]" />
      <xsl:if test="$currContext">
        <xsl:choose>
          <xsl:when test="contains($nextPath, '/')">
            <xsl:call-template name="nodeValueByPath">
              <xsl:with-param name="path" select="$nextPath" />
              <xsl:with-param name="context" select="$currContext" />
            </xsl:call-template>
          </xsl:when>
          <xsl:when test="not($currContext/*)">
            <!-- always add a prefix so we can detect
                 the case "exists in A, but is empty" -->
            <xsl:value-of select="concat('found:', $currContext/text())" />
          </xsl:when>
        </xsl:choose>
      </xsl:if>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
The basic technique for operating on multiple files is the document() function. A call to it looks like this:
<xsl:variable name="var1" select="document('http://example.com/file1.xml', /)"/>
<xsl:variable name="var2" select="document('http://example.com/file2.xml', /)"/>
Once you have the two documents, you can use their contents as if they were part of the same document.
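The same "load both, then query across them" idea can be tried out in Python as a rough analogue. The sketch below uses the standard library's xml.etree.ElementTree rather than XSLT's document(), and the helper leaf_paths plus the two shortened inline documents are assumptions made for illustration. It shows which leaf paths the two documents share and which exist only in the second one.

```python
import xml.etree.ElementTree as ET


def leaf_paths(root):
    """Return the set of slash-terminated name paths of all leaf elements."""
    paths = set()

    def walk(elem, prefix):
        path = prefix + elem.tag + "/"
        if len(elem) == 0:  # no child elements, so this is a leaf
            paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


# Shortened stand-ins for a.xml and b.xml; with real files one would
# call ET.parse(filename).getroot() instead of ET.fromstring(...).
doc_a = ET.fromstring(
    "<root_node><settings><setting1>a1</setting1>"
    "<setting4>a4</setting4></settings></root_node>"
)
doc_b = ET.fromstring(
    "<root_node><settings><setting1>b1</setting1>"
    "<setting5 id='77'>b5</setting5></settings></root_node>"
)

# With both trees in memory, they can be compared like any other data.
common = leaf_paths(doc_a) & leaf_paths(doc_b)
only_in_b = leaf_paths(doc_b) - leaf_paths(doc_a)
print(sorted(common), sorted(only_in_b))
```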