XSLT filtering elements

Posted 2019-08-05 17:42

For example, assume the input XML has the following structure:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
      <cab>7</cab>
    </ca>
  </c>
</root>

Given a set of XPath expressions to filter elements by:

/root/a/ab,
/root/a/ac,
/root/c/ca/cab

The resulting XML should be:

<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

How could this be expressed in XSLT?

Thank you in advance

Tags: xml xslt
3 answers
(This account has been banned)
#2 · 2019-08-05 17:51

Here is an example using Saxon 9.5 PE or EE and XSLT 3.0 (working draft version currently implemented in those Saxon versions):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:param name="paths" as="xs:string">
/root/a/ab,
/root/a/ac,
/root/c/ca/cab
</xsl:param>

<xsl:variable name="nodes" as="node()*">
  <xsl:evaluate xpath="$paths" context-item="/"/>
</xsl:variable>

<xsl:output indent="yes"/>

<xsl:template match="*[(.//node(), .//@*) intersect $nodes]">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()[(., .//node(), .//@*) intersect $nodes]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="node()[. intersect $nodes]">
  <xsl:copy-of select="."/>
</xsl:template>

</xsl:stylesheet>

Here is a different version that uses the new XSLT 3.0 feature of allowing a variable reference as a match pattern; I assume the code is more efficient (and more readable) that way:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:param name="paths" as="xs:string">
/root/a/ab,
/root/a/ac,
/root/c/ca/cab
</xsl:param>

<xsl:variable name="nodes" as="node()*">
  <xsl:evaluate xpath="$paths" context-item="/"/>
</xsl:variable>

<xsl:variable name="ancestors" as="node()*" select="$nodes/ancestor::node()"/>

<xsl:output indent="yes"/>

<xsl:template match="$ancestors">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()[. intersect $ancestors or . intersect $nodes]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="$nodes">
  <xsl:copy-of select="."/>
</xsl:template>

</xsl:stylesheet>
甜甜的少女心
#3 · 2019-08-05 17:57

This is a more complex XSLT 1.0 answer (it also requires the EXSLT node-set() function) that solves the issue of duplicate branches by performing three passes of transformation:

In the first pass, the ids of the given elements are collected, using an identity transform template with a "pass-thru" parameter to identify them - similar to my previous answer;

In the second pass, each given element "collects" the ids of itself and of its ancestors;

In the third and final pass, an identity transform template is used again to go over the entire source tree and output only elements whose ids have been collected in step 2.

Note that the given paths do not need to be pre-processed in this version.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="paths">
    <path>/root/a/ab</path>
    <path>/root/a/ac</path>
    <path>/root/c/ca/cab</path>
</xsl:param>

<!-- first pass: get ids of given nodes -->
<xsl:variable name="ids">
    <xsl:apply-templates select="/" mode="getids"/>
</xsl:variable>

<xsl:template match="*" mode="getids">
    <xsl:param name="pathtrain" />
    <xsl:variable name="path" select="concat($pathtrain, '/', name())" />
    <xsl:if test="$path=exsl:node-set($paths)/path">
        <id><xsl:value-of select="generate-id()" /></id>
    </xsl:if>
    <xsl:apply-templates select="*" mode="getids">
        <xsl:with-param name="pathtrain" select="$path"/>
    </xsl:apply-templates>
</xsl:template>

<!-- second pass: extend the list of ids to given nodes and their ancestors-->
<xsl:variable name="extids">
    <xsl:for-each select="//*[generate-id(.)=exsl:node-set($ids)/id]">
        <xsl:for-each select="ancestor-or-self::*">
            <id><xsl:value-of select="generate-id()" /></id>
        </xsl:for-each>
    </xsl:for-each>
</xsl:variable>

<!-- third pass: output the nodes whose ids are in the extended list -->
<xsl:template match="@* | node()">
    <xsl:if test="generate-id(.)=exsl:node-set($extids)/id or not(self::*)">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

The above stylesheet, when applied to the following "duplicate branches" input:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
    </ca>
  </c>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

produces the following result:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>
Root(大扎)
#4 · 2019-08-05 18:10

To accomplish this in XSLT 1.0 (possibly with a little help from EXSLT) or 2.0, you could start by breaking each given path into itself and its ancestor paths, so that:

/root/c/ca/cab

for example, becomes:

<path>/root/c/ca/cab</path>
<path>/root/c/ca</path>
<path>/root/c</path>
<path>/root</path>

This shouldn't be too difficult to accomplish with a named recursive template.
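
A minimal sketch of such a template, assuming plain XSLT 1.0 and paths made only of simple child steps (no predicates containing slashes). The template name expand-path, its parameters rest and done, and the paths wrapper element are made up for illustration. It walks the path from the left, so the paths come out shortest first; for a membership test the order does not matter.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

<!-- demo entry point (hypothetical): expand one hard-coded path -->
<xsl:template match="/">
    <paths>
        <xsl:call-template name="expand-path">
            <!-- drop the leading slash before splitting -->
            <xsl:with-param name="rest" select="substring-after('/root/c/ca/cab', '/')"/>
        </xsl:call-template>
    </paths>
</xsl:template>

<!-- emit one <path> per step, carrying the prefix built so far in $done -->
<xsl:template name="expand-path">
    <xsl:param name="rest"/>
    <xsl:param name="done"/>
    <xsl:if test="$rest != ''">
        <!-- appending '/' lets substring-before handle the last step too -->
        <xsl:variable name="step" select="substring-before(concat($rest, '/'), '/')"/>
        <xsl:variable name="path" select="concat($done, '/', $step)"/>
        <path><xsl:value-of select="$path"/></path>
        <!-- recurse on whatever follows the first remaining slash -->
        <xsl:call-template name="expand-path">
            <xsl:with-param name="rest" select="substring-after($rest, '/')"/>
            <xsl:with-param name="done" select="$path"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

Applied to any input document, this should emit four path elements: /root, /root/c, /root/c/ca and /root/c/ca/cab. In XSLT 1.0 the output is a result tree fragment, so it would still need exsl:node-set() before it can be compared against as a node-set.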

Once you have that in place, you can use the identity transform modified by adding a "pass-thru" parameter so that each processed element can calculate the path to itself, compare it to the given list of paths and determine whether it should join the result tree or not.

In the following stylesheet, step 1 has been skipped and its result is supplied directly, as if given.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="paths">
    <path>/root/a/ab</path>
    <path>/root/a</path>
    <path>/root</path>

    <path>/root/a/ac</path>
    <path>/root/a</path>
    <path>/root</path>

    <path>/root/c/ca/cab</path>
    <path>/root/c/ca</path>
    <path>/root/c</path>
    <path>/root</path>
</xsl:param>

<xsl:template match="@* | node()">
    <xsl:param name="pathtrain" />
    <xsl:variable name="path" select="concat($pathtrain, '/', name())" />
    <xsl:if test="$path=exsl:node-set($paths)/path or not(self::*)">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()">
                <xsl:with-param name="pathtrain" select="$path"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

Applied to your (corrected) input of:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
      <cab>7</cab>
    </ca>
  </c>
</root>

the following result is obtained:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

EDIT:

Note that duplicate branches may produce false positives when using a string-based test as above. For example, when applied to the following input:

<root>
  <a>
    <aa>1</aa>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <b>
    <ba>4</ba>
    <bb>5</bb>
  </b>
  <c>
    <ca>
      <caa>6</caa>
    </ca>
  </c>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

the above stylesheet will produce:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <a>
    <ab>2</ab>
    <ac>3</ac>
  </a>
  <c>
    <ca/>
  </c>
  <c>
    <ca>
      <cab>7</cab>
    </ca>
  </c>
</root>

If this is a problem, I will post another (more complex) XSLT 1.0 answer that eliminates the issue by testing unique ids instead.
