Given a list of xpath statements, I want to write a stylesheet that will run through an xml document and output the same document but with a comment inserted before the node identified in each xpath statement. Let's make up an example. Start with an xml instance holding the xpath statements:
<paths>
<xpath location="/root/a" annotate="1"/>
<xpath location="/root/a/b" annotate="2"/>
</paths>
Given the input:
<root>
<a>
<b>B</b>
</a>
<c>C</c>
</root>
It should produce:
<root>
<!-- 1 -->
<a>
<!-- 2 -->
<b>B</b>
</a>
<c>C</c>
</root>
My initial thought is to have an identity stylesheet which takes a file-list param and calls the document function on it to get the list of xpath nodes. It would then check each node of the input against that list and insert the comment node when it finds a match, but I expect that might be highly inefficient as the list of xpaths gets large (or maybe not, tell me; I'm using Saxon 9).
So my question: Is there an efficient way to do something like this?
Overview:
Write a meta XSLT transformation that takes the paths file as input and produces a new XSLT transformation as output. This new XSLT will transform your root input XML into the annotated copy output XML.
Notes:
- Works with XSLT 1.0, 2.0, or 3.0.
- Should be very efficient, especially if the generated transformation has to be run over a large input or has to be run repeatedly, because it effectively compiles into native XSLT rather than reimplementing matching with an XSLT-based interpreter.
- Is more robust than approaches that have to rebuild element ancestry manually in code. Since it maps the paths to template/@match attributes, the full sophistication of @match patterns is available efficiently. I've included an attribute value test as an example.
- Be sure to consider the elegant XSLT 2.0 and 3.0 solutions by @DanielHaley and @MartinHonnen, especially if an intermediate meta XSLT file won't work for you. By leveraging XSLT 3.0's XPath evaluation facilities, @MartinHonnen's answer appears to be able to provide even more robust matching than template/@match does here.
This input XML that specifies XPaths and annotations:
<paths>
<xpath location="/root/a" annotate="1"/>
<xpath location="/root/a/b" annotate="2"/>
<xpath location="/root/c[@x='123']" annotate="3"/>
</paths>
When input to this meta XSLT transformation:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/paths">
<xsl:element name="xsl:stylesheet">
<xsl:attribute name="version">1.0</xsl:attribute>
<xsl:element name="xsl:output">
<xsl:attribute name="method">xml</xsl:attribute>
<xsl:attribute name="indent">yes</xsl:attribute>
</xsl:element>
<xsl:call-template name="gen_identity_template"/>
<xsl:apply-templates select="xpath"/>
</xsl:element>
</xsl:template>
<xsl:template name="gen_identity_template">
<xsl:element name="xsl:template">
<xsl:attribute name="match">node()|@*</xsl:attribute>
<xsl:element name="xsl:copy">
<xsl:element name="xsl:apply-templates">
<xsl:attribute name="select">node()|@*</xsl:attribute>
</xsl:element>
</xsl:element>
</xsl:element>
</xsl:template>
<xsl:template match="xpath">
<xsl:element name="xsl:template">
<xsl:attribute name="match">
<xsl:value-of select="@location"/>
</xsl:attribute>
<xsl:element name="xsl:comment">
<xsl:value-of select="@annotate"/>
</xsl:element>
<xsl:element name="xsl:text">
<xsl:text disable-output-escaping="yes">&#xa;</xsl:text>
</xsl:element>
<xsl:element name="xsl:copy">
<xsl:element name="xsl:apply-templates">
<xsl:attribute name="select">node()|@*</xsl:attribute>
</xsl:element>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Will produce this XSLT transformation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/root/a">
<xsl:comment>1</xsl:comment>
<xsl:text>
</xsl:text>
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/root/a/b">
<xsl:comment>2</xsl:comment>
<xsl:text>
</xsl:text>
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/root/c[@x='123']">
<xsl:comment>3</xsl:comment>
<xsl:text>
</xsl:text>
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Which, when provided this input XML file:
<root>
<a>
<b>B</b>
</a>
<c x="123">C</c>
</root>
Will produce the desired output XML file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!--1-->
<a>
<!--2-->
<b>B</b>
</a>
<!--3-->
<c x="123">C</c>
</root>
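The meta transformation above builds every generated instruction with xsl:element and xsl:attribute. A more compact way to write the same generator is sketched below using xsl:namespace-alias, so the generated stylesheet can be written as literal result elements. The out prefix and its namespace URI are names made up for this sketch; the rest mirrors the meta XSLT above and has not been tuned for any particular processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="urn:example:xslt-alias">

  <!-- "out" and its URI are hypothetical; elements written with the out:
       prefix end up in the XSLT namespace in the generated stylesheet. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/paths">
    <out:stylesheet version="1.0">
      <out:output method="xml" indent="yes"/>
      <!-- Generated identity template: copies everything without an annotation. -->
      <out:template match="node()|@*">
        <out:copy>
          <out:apply-templates select="node()|@*"/>
        </out:copy>
      </out:template>
      <xsl:apply-templates select="xpath"/>
    </out:stylesheet>
  </xsl:template>

  <!-- One generated template per listed XPath; @location becomes the match pattern. -->
  <xsl:template match="xpath">
    <out:template match="{@location}">
      <out:comment><xsl:value-of select="@annotate"/></out:comment>
      <!-- The inner xsl:text keeps the newline from being stripped as
           whitespace-only text when this meta stylesheet is loaded. -->
      <out:text><xsl:text>&#10;</xsl:text></out:text>
      <out:copy>
        <out:apply-templates select="node()|@*"/>
      </out:copy>
    </out:template>
  </xsl:template>
</xsl:stylesheet>
```

Some XSLT 1.0 processors serialize the generated elements with the out prefix bound to the XSLT namespace instead of xsl; the result is still a valid stylesheet either way.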
Assuming Saxon 9 PE or EE, it should also be possible to make use of XSLT 3.0 and xsl:evaluate as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="xs math map mf"
version="3.0">
<xsl:output indent="yes"/>
<xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
<xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>
<xsl:variable name="main-root" select="/"/>
<xsl:variable
name="mapped-nodes">
<map>
<xsl:for-each select="$paths-doc/paths/xpath">
<xsl:variable name="node" as="node()?" select="mf:evaluate(@location, $main-root)"/>
<xsl:if test="$node">
<entry key="{generate-id($node)}">
<xsl:value-of select="@annotate"/>
</entry>
</xsl:if>
</xsl:for-each>
</map>
</xsl:variable>
<xsl:key name="node-by-id" match="map/entry" use="@key"/>
<xsl:function name="mf:evaluate" as="node()?">
<xsl:param name="path" as="xs:string"/>
<xsl:param name="context" as="node()"/>
<xsl:evaluate xpath="$path" context-item="$context"></xsl:evaluate>
</xsl:function>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="node()[key('node-by-id', generate-id(), $mapped-nodes)]">
<xsl:comment select="key('node-by-id', generate-id(), $mapped-nodes)"/>
<xsl:text> </xsl:text>
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Here is an edited version of the originally posted code that uses the XSLT 3.0 map feature instead of a temporary document to store the association between the generated id of a node found by dynamic XPath evaluation and the annotation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="xs math map mf"
version="3.0">
<xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
<xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>
<xsl:output indent="yes"/>
<xsl:variable
name="mapped-nodes"
as="map(xs:string, xs:string)"
select="map:new(for $path in $paths-doc/paths/xpath, $node in mf:evaluate($path/@location, /) return map:entry(generate-id($node), string($path/@annotate)))"/>
<xsl:function name="mf:evaluate" as="node()?">
<xsl:param name="path" as="xs:string"/>
<xsl:param name="context" as="node()"/>
<xsl:evaluate xpath="$path" context-item="$context"></xsl:evaluate>
</xsl:function>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="node()[map:contains($mapped-nodes, generate-id())]">
<xsl:comment select="$mapped-nodes(generate-id())"/>
<xsl:text> </xsl:text>
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Like the first stylesheet, it needs Saxon 9.5 PE or EE to run.
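One caveat for newer processors: map:new comes from the working draft that Saxon 9.5 implemented, and the final XSLT 3.0 / XPath 3.1 Recommendations name the function map:merge. Here is a sketch of the same map-based stylesheet adjusted for that, assuming a processor that supports the final specification together with xsl:evaluate. Only the function name in the mapped-nodes variable and the unused math namespace declaration have changed; the file name paths1.xml is kept from the code above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs map mf"
    version="3.0">

  <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
  <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

  <xsl:output indent="yes"/>

  <!-- Map from generate-id() of each located node to its annotation.
       map:merge is the final-spec name of the draft-era map:new. -->
  <xsl:variable name="mapped-nodes" as="map(xs:string, xs:string)"
      select="map:merge(
                for $path in $paths-doc/paths/xpath,
                    $node in mf:evaluate($path/@location, /)
                return map:entry(generate-id($node), string($path/@annotate)))"/>

  <!-- Evaluate an XPath given as a string against the supplied context node. -->
  <xsl:function name="mf:evaluate" as="node()?">
    <xsl:param name="path" as="xs:string"/>
    <xsl:param name="context" as="node()"/>
    <xsl:evaluate xpath="$path" context-item="$context"/>
  </xsl:function>

  <!-- Identity template for everything that is not annotated. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* , node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Annotated nodes: emit the comment, then copy as usual. -->
  <xsl:template match="node()[map:contains($mapped-nodes, generate-id())]">
    <xsl:comment select="$mapped-nodes(generate-id())"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:copy>
      <xsl:apply-templates select="@* , node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

As in the versions above, mf:evaluate is declared as="node()?", so each location is expected to select at most one node.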
I'm not sure whether kjhughes' suggestion of creating a second transform would be more efficient than your original idea or not. I do see the possibility of that second transform becoming huge if your paths XML gets large.
Here's how I'd do it...
XML Input
<root>
<a>
<b>B</b>
</a>
<c>C</c>
</root>
"paths" XML (paths.xml)
<paths>
<xpath location="/root/a" annotate="1"/>
<xpath location="/root/a/b" annotate="2"/>
</paths>
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="paths" select="document('paths.xml')"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*" priority="1">
<xsl:variable name="path">
<xsl:for-each select="ancestor-or-self::*">
<xsl:value-of select="concat('/',local-name())"/>
</xsl:for-each>
</xsl:variable>
<xsl:if test="$paths/*/xpath[@location=$path]">
<xsl:comment select="$paths/*/xpath[@location=$path]/@annotate"/>
</xsl:if>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
XML Output
<root>
<!--1-->
<a>
<!--2-->
<b>B</b>
</a>
<c>C</c>
</root>
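If the list of paths gets large, the repeated $paths/*/xpath[@location=$path] scan in the stylesheet above is the part most likely to cost time. A possible variation, sketched below, indexes the paths document with xsl:key and looks each element's path up with the three-argument form of key() from XSLT 2.0. The key name by-location and the variable name hit are made up for this sketch; whether it is measurably faster depends on the processor and the data.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="paths" select="document('paths.xml')"/>

  <!-- Index the xpath entries by @location ("by-location" is a hypothetical name). -->
  <xsl:key name="by-location" match="xpath" use="@location"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" priority="1">
    <!-- Rebuild the element's simple path, e.g. /root/a/b. -->
    <xsl:variable name="path"
        select="string-join(for $e in ancestor-or-self::*
                            return concat('/', local-name($e)), '')"/>
    <!-- The third argument makes key() search the paths document, not the input. -->
    <xsl:variable name="hit" select="key('by-location', $path, $paths)[1]"/>
    <xsl:if test="$hit">
      <xsl:comment select="$hit/@annotate"/>
    </xsl:if>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Like the original, this only matches plain element paths such as /root/a/b; locations with predicates would need one of the other approaches.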