For example, let's assume the input XML has the following structure:
    <root>
        <a>
            <aa>1</aa>
            <ab>2</ab>
            <ac>3</ac>
        </a>
        <b>
            <ba>4</ba>
            <bb>5</bb>
        </b>
        <c>
            <ca>
                <caa>6</caa>
                <cab>7</cab>
            </ca>
        </c>
    </root>
Given the following set of XPath expressions to filter elements by:
    /root/a/ab,
    /root/a/ac,
    /root/c/ca/cab
the resulting XML should be:
    <root>
        <a>
            <ab>2</ab>
            <ac>3</ac>
        </a>
        <c>
            <ca>
                <cab>7</cab>
            </ca>
        </c>
    </root>
How could this be expressed in XSLT?
Thank you in advance
Here is an example using Saxon 9.5 PE or EE and XSLT 3.0 (working draft version currently implemented in those Saxon versions):
Here is a different version that uses the new XSLT 3.0 feature of allowing a variable reference as a match pattern; I assume the code is much more efficient (and more readable) that way:
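The stylesheet from the original answer is not reproduced here. As a rough sketch of the idea only, the following assumes the final XSLT 3.0 Recommendation (not the working draft mentioned above) and a processor with `xsl:evaluate` enabled. The names `paths`, `selected` and `keep` are hypothetical, and the comma-separated path list is assumed to be a valid XPath sequence expression.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed parameter: the given paths, joined by commas so that the
       whole string is itself an XPath expression returning a node sequence -->
  <xsl:param name="paths" as="xs:string"
      select="'/root/a/ab,/root/a/ac,/root/c/ca/cab'"/>

  <!-- as="node()*" matters: it makes the variable hold the selected
       source nodes themselves rather than copies in a temporary tree -->
  <xsl:variable name="selected" as="node()*">
    <xsl:evaluate xpath="$paths" context-item="/"/>
  </xsl:variable>

  <!-- The selected elements plus all of their ancestors -->
  <xsl:variable name="keep" select="$selected/ancestor-or-self::node()"/>

  <!-- Variable reference used directly as the match pattern -->
  <xsl:template match="$keep">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Any other element is dropped together with its subtree -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

Because the test is on node identity rather than on path strings, an element is kept only if it really is a selected node or one of its ancestors.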
This is a more complex XSLT 1.0 answer (also requiring the EXSLT node-set() function) that solves the issue of duplicate branches by performing three passes of transformation:
In the first pass, the ids of the given elements are collected, using an identity transform template with a "pass-thru" parameter to identify them - similar to my previous answer;
In the second pass, each given element "collects" the ids of itself and of its ancestors;
In the third and final pass, an identity transform template is used again to go over the entire source tree and output only elements whose ids have been collected in step 2.
Note that the given paths do not need to be pre-processed in this version.
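The three passes described above can be sketched roughly as follows. This is not the stylesheet from the original answer, only a minimal illustration under stated assumptions: the given paths arrive in a hypothetical `paths` parameter delimited by `|` on both sides, element names carry no namespaces, and the processor supports `exsl:node-set()`. The variable names and the `collect` mode are made up for the example.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed input: the given paths, unexpanded, each wrapped in '|' -->
  <xsl:param name="paths"
      select="'|/root/a/ab|/root/a/ac|/root/c/ca/cab|'"/>

  <!-- Pass 1: walk the tree, passing each element's path down, and
       record the generated id of every element whose path is listed -->
  <xsl:variable name="matched-rtf">
    <xsl:apply-templates select="/*" mode="collect"/>
  </xsl:variable>
  <xsl:variable name="matched" select="exsl:node-set($matched-rtf)/id"/>

  <xsl:template match="*" mode="collect">
    <xsl:param name="parent-path" select="''"/>
    <xsl:variable name="my-path" select="concat($parent-path, '/', name())"/>
    <xsl:if test="contains($paths, concat('|', $my-path, '|'))">
      <id><xsl:value-of select="generate-id()"/></id>
    </xsl:if>
    <xsl:apply-templates select="*" mode="collect">
      <xsl:with-param name="parent-path" select="$my-path"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Pass 2: each matched element contributes its own id and the ids
       of all its ancestors -->
  <xsl:variable name="keep-rtf">
    <xsl:for-each select="//*[generate-id() = $matched]">
      <xsl:for-each select="ancestor-or-self::*">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="keep" select="exsl:node-set($keep-rtf)/id"/>

  <!-- Pass 3: copy only elements whose id was collected in pass 2 -->
  <xsl:template match="*">
    <xsl:if test="generate-id() = $keep">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Since pass 3 compares generated ids instead of path strings, a sibling branch that merely shares a path prefix with a selected element is not kept.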
The above stylesheet, when applied to the following "duplicate branches" input:
produces the following result:
To accomplish this in XSLT 1.0 (possibly with some small assistance from EXSLT) or 2.0, you could start by breaking each given path into itself and its ancestor paths, so that:
for example, becomes:
This shouldn't be too difficult to accomplish with a named recursive template.
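A recursive template of that kind could look roughly like the sketch below. It is an XSLT 1.0 illustration rather than code from the original answer: the template name `expand-path` and its parameters `remaining` and `prefix` are hypothetical, and it expands only the single path held in an assumed `path` parameter. A full solution would call it once for each given path.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed parameter: one absolute path to expand -->
  <xsl:param name="path" select="'/root/c/ca/cab'"/>

  <xsl:template match="/">
    <xsl:call-template name="expand-path">
      <!-- Drop the leading '/' before splitting into steps -->
      <xsl:with-param name="remaining" select="substring-after($path, '/')"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emits the path built so far, then recurses on the remaining steps -->
  <xsl:template name="expand-path">
    <xsl:param name="remaining"/>
    <xsl:param name="prefix" select="''"/>

    <!-- The next step is everything up to the next '/', if there is one -->
    <xsl:variable name="step">
      <xsl:choose>
        <xsl:when test="contains($remaining, '/')">
          <xsl:value-of select="substring-before($remaining, '/')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$remaining"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="current" select="concat($prefix, '/', $step)"/>
    <xsl:value-of select="$current"/>
    <xsl:text>&#10;</xsl:text>

    <xsl:if test="contains($remaining, '/')">
      <xsl:call-template name="expand-path">
        <xsl:with-param name="remaining"
            select="substring-after($remaining, '/')"/>
        <xsl:with-param name="prefix" select="$current"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Applied to any input document, this prints `/root`, `/root/c`, `/root/c/ca` and `/root/c/ca/cab`, one per line.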
Once you have that in place, you can use the identity transform, modified by adding a "pass-thru" parameter, so that each processed element can calculate the path to itself, compare it to the given list of paths, and determine whether it should join the result tree.
In the following stylesheet, step 1 has been skipped and its result is used as if it had been given.
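The original stylesheet is not reproduced here. As a minimal sketch of the approach just described, assume the expanded path list is supplied in a hypothetical `paths` parameter, delimited by `|` on both sides so that one path cannot match as a fragment of another, and that element names carry no namespaces:

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed: the given paths plus all of their ancestor paths -->
  <xsl:param name="paths"
      select="'|/root|/root/a|/root/a/ab|/root/a/ac|/root/c|/root/c/ca|/root/c/ca/cab|'"/>

  <xsl:template match="*">
    <!-- "Pass-thru" parameter: the path of the parent element -->
    <xsl:param name="parent-path" select="''"/>
    <xsl:variable name="my-path" select="concat($parent-path, '/', name())"/>

    <!-- Copy the element only if its own path is on the list -->
    <xsl:if test="contains($paths, concat('|', $my-path, '|'))">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()">
          <xsl:with-param name="parent-path" select="$my-path"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes fall through to the built-in template, so the content of a kept leaf element is preserved. Child elements of a listed element are copied only if their own paths are listed too, which is fine for the leaf elements in the question but would need adjusting for subtrees.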
Applied to your (corrected) input of:
the following result is obtained:
EDIT:
Note that duplicate branches may produce false positives when using a string-based test as above. For example, when applied to the following input:
the above stylesheet will produce:
If this is a problem, I will post another (more complex) XSLT 1.0 answer that eliminates the issue by testing unique ids instead.
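To make the false positive concrete, here is a hypothetical input (not the one from the original answer) with two `c` branches, only the second of which contains a `cab` element:

```xml
<root>
    <a>
        <ab>2</ab>
    </a>
    <c>
        <ca>
            <caa>6</caa>
        </ca>
    </c>
    <c>
        <ca>
            <cab>7</cab>
        </ca>
    </c>
</root>
```

A purely string-based test sees `/root/c` and `/root/c/ca` for both branches, so with the paths from the question the first branch is kept as an empty shell even though it contains no selected element:

```xml
<root>
    <a>
        <ab>2</ab>
    </a>
    <c>
        <ca/>
    </c>
    <c>
        <ca>
            <cab>7</cab>
        </ca>
    </c>
</root>
```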