XPath: Match whole word (using matches function wi

Posted 2019-03-30 09:53

Question:

Using XPath, I would like to "Match whole word" (option for user, just like in VS search).

The contains and matches functions seem to behave similarly, except that matches accepts flags such as i for case insensitivity.

In other words, I get the same results from the two XPath queries below when I run them against this document:

<pets>
    <dog name="Rupert" color="grey"/>
    <dog name="Ralph" color="brown"/>
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>
    <cat name="Fluffy" color="black"/>
</pets>

Matches XPath: //cat[descendant-or-self::*[@*[matches(.,'Cat')]]]
    returns:
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>


Contains XPath: //cat[descendant-or-self::*[@*[contains(.,'Cat')]]]
    returns:
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>

But I would like to use matches to return results that match "Cat" whole word only:

<cat name="Cat" color="grey"/>

How can I adjust the matches query so it matches whole word?

EDIT: I forgot to mention that I need to still use the matches function because I need the case insensitivity flag.

Answer 1:

What about using ^ and $ characters as anchors?

//cat[descendant-or-self::*[@*[matches(.,'^Cat$')]]]

From RegEx Syntax in XQuery 1.0 and XPath 2.0:

Two meta-characters, ^ and $ are added. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string.
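
To try the effect of the anchors without an XPath 2.0 processor, here is a rough Python sketch. It is an emulation, not XPath: it uses Python's re module and xml.etree.ElementTree in place of matches(), and the helper name cats_matching is made up for illustration. Python's regex dialect differs from the XPath one in places, but for these single-line attribute values ^ and $ behave the same way.

```python
import re
import xml.etree.ElementTree as ET

# The sample document from the question.
PETS_XML = """<pets>
    <dog name="Rupert" color="grey"/>
    <dog name="Ralph" color="brown"/>
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>
    <cat name="Fluffy" color="black"/>
</pets>"""


def cats_matching(pattern, flags=0):
    """Return names of <cat> elements that have any attribute matching pattern.

    Hypothetical helper: roughly mirrors //cat[@*[matches(., pattern)]].
    """
    root = ET.fromstring(PETS_XML)
    regex = re.compile(pattern, flags)
    return [
        cat.get("name")
        for cat in root.iter("cat")
        if any(regex.search(value) for value in cat.attrib.values())
    ]


unanchored = cats_matching("Cat")                    # substring-style match
anchored = cats_matching("^Cat$")                    # whole attribute value
anchored_ci = cats_matching("^cat$", re.IGNORECASE)  # whole value, case-blind

print(unanchored)
print(anchored)
print(anchored_ci)
```

The unanchored pattern selects all three cats whose name contains "Cat", while both anchored patterns select only the element whose name is exactly "Cat". The case-blind variant corresponds to passing the 'i' flag as the third argument of matches().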



Answer 2:

Would this work for you?

//cat[@*='Cat']


Answer 3:

There are three functions/operators of relevance here.

matches() does a regular expression match; you can use it to match a substring or to match the entire string by use of anchors (^cat$), and you can set the 'i' flag to make it case-blind.

contains() does an exact match of a substring; you can use the third argument (collation) to request a case-blind match, but the way in which collations are specified depends on the processor you are using.

The eq operator does an exact match of the entire string; the "default collation" (which in the case of XPath will typically be set using the processor's API) can be used to request case-blind matching. This seems to be the one that is closest to your requirement. The only drawback is that specifying the collation is more system-dependent than using the "i" flag with matches().
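
As a rough illustration of how the three operations differ, the Python sketch below applies a case-blind version of each to the cat names from the question. It only approximates the XPath behaviour: collations are not modelled, lower() stands in for a case-blind collation, and the function names regex_whole, substring and equals are invented for this example.

```python
import re

NAMES = ["Marvin the Cat", "Garfield the Cat", "Cat", "Fluffy"]


def regex_whole(value, word):
    # Approximates matches(value, '^word$', 'i'): anchored, case-blind regex.
    pattern = "^" + re.escape(word) + "$"
    return re.search(pattern, value, re.IGNORECASE) is not None


def substring(value, word):
    # Approximates contains() with a case-blind collation (lower() as stand-in).
    return word.lower() in value.lower()


def equals(value, word):
    # Approximates eq with a case-blind default collation (lower() as stand-in).
    return value.lower() == word.lower()


# Map each name to (regex_whole, substring, equals) for the word "cat".
results = {
    name: (regex_whole(name, "cat"), substring(name, "cat"), equals(name, "cat"))
    for name in NAMES
}

for name, flags in results.items():
    print(name, flags)
```

Only the substring test accepts "Marvin the Cat" and "Garfield the Cat"; the anchored regex and the equality test agree with each other and accept "Cat" alone.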



Answer 4:

But I would like to use matches to return results that match "Cat" whole word only:

<cat name="Cat" color="grey"/>

Several XPath expressions select the wanted element.

Use:

/*/cat[matches(@name, '^cat$', 'i')]

Or use:

/*/cat[lower-case(@name) eq 'cat']

XSLT-based verification:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/cat[matches(@name, '^cat$', 'i')]"/>
======
  <xsl:copy-of select=
   "/*/cat[lower-case(@name) eq 'cat']"/>

 </xsl:template>
</xsl:stylesheet>

When applied to the provided XML document:

<pets>
    <dog name="Rupert" color="grey"/>
    <dog name="Ralph" color="brown"/>
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>
    <cat name="Fluffy" color="black"/>
</pets>

this transformation evaluates the two XPath expressions and copies the selected elements to the output:

  <cat name="Cat" color="grey"/>
======
  <cat name="Cat" color="grey"/>


Answer 5:

This:

//cat[@*='Cat']

results in:

<cat name="Cat" color="grey"/>

I verified this using Xacobeo.