Xpath: filter out childs

2020-07-13 10:33发布

问题:

I'm looking for a xpath expression that filters out certain childs. A child must contain a CCC node with B in it.

Source:

<AAA>
    <BBB1>
        <CCC>A</CCC>
    </BBB1>       
    <BBB2>
        <CCC>A</CCC>
    </BBB2>
    <BBB3>
        <CCC>B</CCC>
    </BBB3>
    <BBB4>
        <CCC>B</CCC>
    </BBB4>
</AAA>

This should be the result:


<AAA>
    <BBB3>
        <CCC>B</CCC>
    </BBB3>
    <BBB4>
        <CCC>B</CCC>
    </BBB4>
</AAA>

Hopefully someone can help me.

Jos

回答1:

XPath is a query language for XML documents. As such it can only select nodes from existing XML document(s) -- it cannot modify an XML document or create a new XML document.

Use XSLT in order to transform an XML document and create a new XML document from it.

In this particular case:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*/*[not(CCC = 'B')]"/>
</xsl:stylesheet>

when this transformation is applied on the provided XML document:

<AAA>
    <BBB1>
        <CCC>A</CCC>
    </BBB1>
    <BBB2>
        <CCC>A</CCC>
    </BBB2>
    <BBB3>
        <CCC>B</CCC>
    </BBB3>
    <BBB4>
        <CCC>B</CCC>
    </BBB4>
</AAA>

the wanted, correct result is produced:

<AAA>
   <BBB3>
      <CCC>B</CCC>
   </BBB3>
   <BBB4>
      <CCC>B</CCC>
   </BBB4>
</AAA>


回答2:

In order to select all of the desired element and text nodes, use this XPATH:

//node()[.//CCC[.='B']
      or self::CCC[.='B']
      or self::text()[parent::CCC[.='B']]]

This could be achieved with a more simply/easily using XPATH with a modified identity transform XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes" />

    <!--Empty template for the content we want to redact -->
    <xsl:template match="*[CCC[not(.='B')]]" />

    <!--By default, copy all content forward -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>


回答3:

try this ,

         "//CCC[text() = 'B']"

It shall give all CCC nodes where the innertext is B.



回答4:

If you want to get AAA, BBB3 and BBB4 you can use the following

//*[descendant::CCC[text()='B']]

If BBB3 and BBB4 only then

//*[CCC[text()='B']]