EDIT: I also have access to EXSLT functions.
I have two node sets of string tokens. One set contains values like these:
/Geography/North America/California/San Francisco
/Geography/Asia/Japan/Tokyo/Shinjuku
The other set contains values like these:
/Geography/North America/
/Geography/Asia/Japan/
My goal is to find a "match" between the two. A match is made when any string in set 1 begins with a string in set 2. For example, a match would be made between /Geography/North America/California/San Francisco and /Geography/North America/ because a string from set 1 begins with a string from set 2.
I can compare strings using wildcards by using a third-party extension. I can also use a regular expression, all within an XPath expression.
My problem is: how do I structure the XPath to select by applying a function between all nodes of both sets? XSL is also a viable option.
This XPath:
count($set1[.=$set2])
would yield the count of the intersection between set1 and set2, but it's a 1-to-1 comparison. Is it possible to use some other means of comparing the nodes?
EDIT: I did get this working, but I am cheating by using some of the other third-party extensions to get the same result. I am still interested in other methods to get this done.
There is a simple and pure XSLT 1.0 solution (no extensions needed) for finding the count of matches:
When this transformation is applied on the following XML document:
the correct result is produced:
2
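The stylesheet itself is not reproduced above. As a rough sketch of the asterisk-counting idea only (not the answerer's actual code), and assuming a hypothetical input document in which /sets/set1/token and /sets/set2/token hold the two lists of strings from the question, it might look something like this:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical paths; adjust to wherever the two sets really live -->
    <xsl:variable name="vSet1" select="/sets/set1/token"/>
    <xsl:variable name="vSet2" select="/sets/set2/token"/>

    <!-- Emit one asterisk per node in set 1 that has a matching prefix in set 2 -->
    <xsl:variable name="vStars">
      <xsl:for-each select="$vSet1">
        <!-- Bind the outer node so the inner predicate can refer to it -->
        <xsl:variable name="vCur" select="."/>
        <xsl:if test="$vSet2[starts-with($vCur, .)]">
          <xsl:text>*</xsl:text>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>

    <!-- The number of asterisks is the number of matches -->
    <xsl:value-of select="string-length($vStars)"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that this variant counts each set 1 node at most once, even if several set 2 strings are prefixes of it. Whether that matches the original answer's counting rule is an assumption.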
Do note that one character (an asterisk) is produced for every match found, and all these asterisks form the content of the $vStars variable. We then simply output its string-length().
I guess I couldn't make the XPath above work. I started with the following XML doc to initialize the two node-sets:
I think this stylesheet ought to implement Robert's solution, but I only get a count of '1':
I did write a stylesheet that uses a recursive template and does produce the correct count of '2' with the given input doc, but it's far less elegant than Robert's answer. If only I could get the XPath to work--always wanting to learn.
Robert's last xsl:variable is good for getting a result tree fragment containing the matching text values, but unless (as he suggests) you use EXSLT or MS extensions to XSLT 1.0 to convert the RTF to a node-set, you can't get a count of the matching text nodes.
Here is the XSLT stylesheet I mentioned in my prior response. It recurses over the sample input document I gave and counts the text nodes in set 1 for which a node in set 2 matches part or all of it:
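The poster's stylesheet is not shown here. Purely as an illustration of what such a recursive count could look like, here is a minimal sketch. The template name count-matches, its parameter names, and the input paths /sets/set1/token and /sets/set2/token are all made up for this example:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="count-matches">
      <xsl:with-param name="nodes" select="/sets/set1/token"/>
      <xsl:with-param name="prefixes" select="/sets/set2/token"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Walks $nodes one at a time, carrying the running total in $count -->
  <xsl:template name="count-matches">
    <xsl:param name="nodes"/>
    <xsl:param name="prefixes"/>
    <xsl:param name="count" select="0"/>
    <xsl:choose>
      <!-- Nothing left to test: output the total -->
      <xsl:when test="not($nodes)">
        <xsl:value-of select="$count"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="head" select="$nodes[1]"/>
        <!-- true if any prefix in set 2 starts the current set 1 string -->
        <xsl:variable name="hit"
            select="boolean($prefixes[starts-with($head, .)])"/>
        <xsl:call-template name="count-matches">
          <xsl:with-param name="nodes" select="$nodes[position() &gt; 1]"/>
          <xsl:with-param name="prefixes" select="$prefixes"/>
          <!-- number(true()) is 1, number(false()) is 0 -->
          <xsl:with-param name="count" select="$count + number($hit)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```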
Not particularly concise, but because XSLT does not let programmers assign new values to already-defined variables, recursion is often necessary. I don't see a way in XSLT 1.0 to get a count of the sort requested by Zack using xsl:for-each.
This:
will set $matches to a node-set containing every node in $set1 whose text value starts with the text value of a node in $set2. That's what you're looking for, right?
Edit:
Well, I'm just wrong about this. Here's why.
starts-with expects its two arguments to both be strings. If they're not, it will convert them to strings before evaluating the function.
If you give it a node-set as one of its arguments, it uses the string value of the node-set, which is the text value of the first node in the set (in document order). So in the above, $set2 never gets searched; only the first node in the list ever gets examined, and so the predicate will only find nodes in $set1 that start with the value of the first node in $set2.
I was misled because this pattern (which I've been using a lot in the last few days) does work:
But that predicate is using a comparison between node-sets, not between text values.
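To make the difference concrete, here is a small sketch that evaluates both kinds of predicate side by side. It assumes a hypothetical input in which /sets/set1/token holds the two long strings from the question and /sets/set2/token holds the two prefixes, in the order given there:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical paths for the two sets -->
    <xsl:variable name="set1" select="/sets/set1/token"/>
    <xsl:variable name="set2" select="/sets/set2/token"/>

    <!-- Node-set comparison: true if ANY node in $set2 has an equal string value -->
    <xsl:value-of select="count($set1[. = $set2])"/>
    <xsl:text> </xsl:text>
    <!-- Function call: $set2 is first reduced to the string value of its first node -->
    <xsl:value-of select="count($set1[starts-with(., $set2)])"/>
  </xsl:template>
</xsl:stylesheet>
```

With that assumed input, this should print 0 1: no string in set 1 is exactly equal to a string in set 2, and only the North America string starts with the first prefix. That agrees with the count of '1' reported earlier in the thread.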
The ideal way to do this would be by nesting predicates. That is, "I want to find every node in $set1 for which there's a node in $set2 whose value starts with..." and here's where XPath breaks down. Starts with what? What you'd like to write is something like:
only there's no expression you can write for the ? that will return the node currently being tested by the outer predicate. (Unless I'm missing something blindingly obvious.)
To get what you want, you have to test each node individually:
That's not a very satisfying solution because it evaluates to a result tree fragment, not a node-set. You'll have to use an extension function (like msxsl:node-set) to convert the RTF to a node-set if you want to use the variable in an XPath expression.
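Since the question mentions that EXSLT is available, one way this could be put together is sketched below. It uses exsl:node-set rather than msxsl:node-set, relies on current() to refer to the node being tested by the xsl:for-each, and again assumes the hypothetical paths /sets/set1/token and /sets/set2/token:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="set1" select="/sets/set1/token"/>
    <xsl:variable name="set2" select="/sets/set2/token"/>

    <!-- Test each set 1 node individually; the result is a result tree fragment -->
    <xsl:variable name="matchesRtf">
      <xsl:for-each select="$set1">
        <!-- current() is the set 1 node; "." inside the predicate is a set 2 node -->
        <xsl:if test="$set2[starts-with(current(), .)]">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>

    <!-- Convert the RTF to a real node-set so it can be used in XPath -->
    <xsl:variable name="matches" select="exsl:node-set($matchesRtf)/token"/>
    <xsl:value-of select="count($matches)"/>
  </xsl:template>
</xsl:stylesheet>
```

The nodes in $matches are copies, not the original nodes from the source document, so they cannot be used to navigate back to their original parents or siblings.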