Annotating an XML instance from a list of XPath statements

Posted 2019-02-26 02:52

Question:

Given a list of XPath statements, I want to write a stylesheet that will run through an XML document and output the same document, but with a comment inserted before each node identified by one of the XPath statements. Let's make up an example. Start with an XML instance holding the XPath statements:

<paths>
  <xpath location="/root/a" annotate="1"/>
  <xpath location="/root/a/b" annotate="2"/>
</paths>

Given the input:

<root>
  <a>
    <b>B</b>
  </a>
  <c>C</c>
</root>

It should produce:

<root>
  <!-- 1 -->
  <a>
    <!-- 2 -->
    <b>B</b>
  </a>
  <c>C</c>
</root>

My initial thought is to write an identity stylesheet that takes a file-list param and calls the document() function on it to get the list of xpath nodes. It would then check each node of the input against that list and insert the comment node when it finds a match. However, I expect that might be highly inefficient as the list of XPaths gets large (or maybe not; tell me. I'm using Saxon 9.)

So my question: Is there an efficient way to do something like this?

Answer 1:

Overview:

Write a meta XSLT transformation that takes the paths file as input and produces a new XSLT transformation as output. That generated XSLT then transforms your root input XML into the annotated copy.

Notes:

  1. Works with XSLT 1.0, 2.0, or 3.0.
  2. Should be very efficient, especially if the generated transformation has to be run over a large input or has to be run repeatedly, because it effectively compiles into native XSLT rather than reimplementing matching with an XSLT-based interpreter.
  3. Is more robust than approaches that have to rebuild element ancestry manually in code. Since it maps the paths to template/@match attributes, the full sophistication of XSLT match patterns is available efficiently. I've included an attribute value test as an example.
  4. Be sure to consider elegant XSLT 2.0 and 3.0 solutions by @DanielHaley and @MartinHonnen, especially if an intermediate meta XSLT file won't work for you. By leveraging XSLT 3.0's XPath evaluation facilities, @MartinHonnen's answer appears to be able to provide even more robust matching than template/@match does here.

This input XML that specifies XPaths and annotations:

<paths>
  <xpath location="/root/a" annotate="1"/>
  <xpath location="/root/a/b" annotate="2"/>
  <xpath location="/root/c[@x='123']" annotate="3"/>
</paths>

When input to this meta XSLT transformation:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/paths">
    <xsl:element name="xsl:stylesheet">
      <xsl:attribute name="version">1.0</xsl:attribute>
      <xsl:element name="xsl:output">
        <xsl:attribute name="method">xml</xsl:attribute>
        <xsl:attribute name="indent">yes</xsl:attribute>
      </xsl:element>
      <xsl:call-template name="gen_identity_template"/>
      <xsl:apply-templates select="xpath"/>
    </xsl:element>
  </xsl:template>

  <xsl:template name="gen_identity_template">
    <xsl:element name="xsl:template">
      <xsl:attribute name="match">node()|@*</xsl:attribute>
      <xsl:element name="xsl:copy">
        <xsl:element name="xsl:apply-templates">
          <xsl:attribute name="select">node()|@*</xsl:attribute>
        </xsl:element>
      </xsl:element>
    </xsl:element>
  </xsl:template>

  <xsl:template match="xpath">
    <xsl:element name="xsl:template">
      <xsl:attribute name="match">
        <xsl:value-of select="@location"/>
      </xsl:attribute>
      <xsl:element name="xsl:comment">
        <xsl:value-of select="@annotate"/>
      </xsl:element>
      <xsl:element name="xsl:text">
        <xsl:text disable-output-escaping="yes">&amp;#xa;</xsl:text>
      </xsl:element>
      <xsl:element name="xsl:copy">
        <xsl:element name="xsl:apply-templates">
          <xsl:attribute name="select">node()|@*</xsl:attribute>
        </xsl:element>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

Will produce this XSLT transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml" indent="yes"/>
   <xsl:template match="node()|@*">
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template match="/root/a">
      <xsl:comment>1</xsl:comment>
      <xsl:text>&#xa;</xsl:text>
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template match="/root/a/b">
      <xsl:comment>2</xsl:comment>
      <xsl:text>&#xa;</xsl:text>
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template match="/root/c[@x='123']">
      <xsl:comment>3</xsl:comment>
      <xsl:text>&#xa;</xsl:text>
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

Which, when provided this input XML file:

<root>
  <a>
    <b>B</b>
  </a>
  <c x="123">C</c>
</root>

Will produce the desired output XML file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!--1-->
   <a>
    <!--2-->
      <b>B</b>
  </a>
  <!--3-->
   <c x="123">C</c>
</root>
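The meta stylesheet above spells out every generated instruction with xsl:element and names such as "xsl:template", which works but is verbose. XSLT's standard xsl:namespace-alias instruction lets the generated instructions be written as literal result elements instead. The following is a sketch that takes the same paths file as input. The axsl prefix and the urn:example:alias-xsl URI are arbitrary placeholders. The sketch also assumes no location contains curly braces, because match="{@location}" is an attribute value template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="urn:example:alias-xsl">

  <!-- Elements written with the placeholder axsl prefix are output in the XSLT namespace -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/paths">
    <axsl:stylesheet version="1.0">
      <axsl:output method="xml" indent="yes"/>
      <!-- Generated identity template -->
      <axsl:template match="node()|@*">
        <axsl:copy>
          <axsl:apply-templates select="node()|@*"/>
        </axsl:copy>
      </axsl:template>
      <xsl:apply-templates select="xpath"/>
    </axsl:stylesheet>
  </xsl:template>

  <!-- One generated template per xpath entry; {@location} is an attribute value template -->
  <xsl:template match="xpath">
    <axsl:template match="{@location}">
      <axsl:comment><xsl:value-of select="@annotate"/></axsl:comment>
      <!-- xsl:text keeps the newline from being stripped as stylesheet whitespace -->
      <axsl:text><xsl:text>&#xa;</xsl:text></axsl:text>
      <axsl:copy>
        <axsl:apply-templates select="node()|@*"/>
      </axsl:copy>
    </axsl:template>
  </xsl:template>
</xsl:stylesheet>
```

Because the newline is produced by a nested xsl:text, this version does not need the disable-output-escaping trick. The generated stylesheet contains a literal newline inside xsl:text, which is equivalent to &#xa; once parsed.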


Answer 2:

Assuming Saxon 9 PE or EE, it should also be possible to use XSLT 3.0 and xsl:evaluate as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs math map mf"
    version="3.0">

    <xsl:output indent="yes"/>

    <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
    <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

    <xsl:variable name="main-root" select="/"/>

    <xsl:variable 
        name="mapped-nodes">
        <map>
            <xsl:for-each select="$paths-doc/paths/xpath">
                <xsl:variable name="node" as="node()?" select="mf:evaluate(@location, $main-root)"/>
                <xsl:if test="$node">
                    <entry key="{generate-id($node)}">
                        <xsl:value-of select="@annotate"/>
                    </entry>
                </xsl:if>
            </xsl:for-each>
        </map>
    </xsl:variable>

    <xsl:key name="node-by-id" match="map/entry" use="@key"/>

    <xsl:function name="mf:evaluate" as="node()?">
        <xsl:param name="path" as="xs:string"/>
        <xsl:param name="context" as="node()"/>
        <xsl:evaluate xpath="$path" context-item="$context"></xsl:evaluate>
    </xsl:function>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="node()[key('node-by-id', generate-id(), $mapped-nodes)]">
        <xsl:comment select="key('node-by-id', generate-id(), $mapped-nodes)"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>


</xsl:stylesheet>

Here is an edited version of the originally posted code that uses the XSLT 3.0 map feature instead of a temporary document to store the association between the generated id of a node found by dynamic XPath evaluation and the annotation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs math map mf"
    version="3.0">

    <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
    <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

    <xsl:output indent="yes"/>

    <xsl:variable 
        name="mapped-nodes"
        as="map(xs:string, xs:string)"
        select="map:new(for $path in $paths-doc/paths/xpath, $node in mf:evaluate($path/@location, /) return map:entry(generate-id($node), string($path/@annotate)))"/>

    <xsl:function name="mf:evaluate" as="node()?">
        <xsl:param name="path" as="xs:string"/>
        <xsl:param name="context" as="node()"/>
        <xsl:evaluate xpath="$path" context-item="$context"></xsl:evaluate>
    </xsl:function>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="node()[map:contains($mapped-nodes, generate-id())]">
        <xsl:comment select="$mapped-nodes(generate-id())"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>


</xsl:stylesheet>

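Like the first stylesheet, it needs Saxon 9.5 PE or EE to run. Note that map:new comes from an XSLT 3.0 draft. In the final XPath 3.1 specification it was replaced by map:merge, so processors implementing the final specification need map:merge(...) in its place.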
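For processors that implement the final XPath 3.1 map functions, the following is a sketch of the second stylesheet with map:merge in place of the draft map:new. It also widens mf:evaluate from node()? to node()* so that one location may select several nodes. That change is an assumption added here, not part of the original answer. It still requires a processor that supports xsl:evaluate, and paths1.xml remains the default paths file name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs map mf"
    version="3.0">

    <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
    <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

    <xsl:output indent="yes"/>

    <!-- map:merge replaces the draft map:new; if two paths select the same node,
         the default duplicates handling ('use-first') keeps the first annotation -->
    <xsl:variable name="mapped-nodes" as="map(xs:string, xs:string)"
        select="map:merge(
                  for $path in $paths-doc/paths/xpath,
                      $node in mf:evaluate($path/@location, /)
                  return map:entry(generate-id($node), string($path/@annotate)))"/>

    <!-- node()* instead of node()? so that one path may select several nodes -->
    <xsl:function name="mf:evaluate" as="node()*">
        <xsl:param name="path" as="xs:string"/>
        <xsl:param name="context" as="node()"/>
        <xsl:evaluate xpath="$path" context-item="$context"/>
    </xsl:function>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="node()[map:contains($mapped-nodes, generate-id())]">
        <xsl:comment select="$mapped-nodes(generate-id())"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```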



Answer 3:

I'm not sure if kjhughes' suggestion of creating a second transform would be more efficient than your original idea or not. I do see the possibility of that second transform becoming huge if your paths XML gets large.

Here's how I'd do it...

XML Input

<root>
    <a>
        <b>B</b>
    </a>
    <c>C</c>
</root>

"paths" XML (paths.xml)

<paths>
    <xpath location="/root/a" annotate="1"/>
    <xpath location="/root/a/b" annotate="2"/>
</paths>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="paths" select="document('paths.xml')"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" priority="1">
        <xsl:variable name="path">
            <xsl:for-each select="ancestor-or-self::*">
                <xsl:value-of select="concat('/',local-name())"/>
            </xsl:for-each>
        </xsl:variable>
        <xsl:if test="$paths/*/xpath[@location=$path]">
            <xsl:comment select="$paths/*/xpath[@location=$path]/@annotate"/>
        </xsl:if>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

XML Output

<root>
    <!--1-->
    <a>
        <!--2-->
        <b>B</b>
    </a>
    <c>C</c>
</root>
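The stylesheet above scans $paths/*/xpath once for every element, so its cost grows with (number of elements) × (number of paths), which is the concern raised in the question. One possible refinement is to declare an xsl:key over the path list, so the processor can index the locations once and each element needs only a lookup. The key name xpath-by-location is a hypothetical choice. This is a sketch, and like the original it only matches name-only paths such as /root/a/b. A location with a predicate, such as /root/c[@x='123'], will never equal the computed string.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="paths" select="document('paths.xml')"/>

    <!-- Index the xpath entries by their location string -->
    <xsl:key name="xpath-by-location" match="xpath" use="@location"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" priority="1">
        <!-- Name-only path such as /root/a/b, built from the ancestors in document order -->
        <xsl:variable name="path"
            select="string-join(ancestor-or-self::*/concat('/', local-name()), '')"/>
        <!-- Three-argument key() looks the path up in the paths document, not the input -->
        <xsl:variable name="hits" select="key('xpath-by-location', $path, $paths)"/>
        <xsl:if test="$hits">
            <xsl:comment select="$hits/@annotate"/>
        </xsl:if>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

For the sample input and paths.xml, this should produce the same output as the stylesheet above.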