I have an XML document I generate on the fly, and I need a function to eliminate any duplicate nodes from it.
My function looks like:
declare function local:start2() {
  let $data := local:scan_books()
  return <books>{$data}</books>
};
Sample output is:
<books>
  <book>
    <title>XML in 24 hours</title>
    <author>Some Guy</author>
  </book>
  <book>
    <title>XML in 24 hours</title>
    <author>Some Guy</author>
  </book>
</books>
I want just one entry for each book inside my books root tag. Other tags in there, such as pamphlet, also need their duplicates removed. Any ideas?
Updated following comments: by removing duplicates, I mean keeping only one occurrence of nodes that have exactly the same content and structure.
To remove duplicates I usually use a helper function. In your case it'll look like this:
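A minimal sketch of what such a helper might look like, assuming that "duplicate" means deep-equal in the sense of fn:deep-equal. The name local:distinct-deep and the stand-in body of local:scan_books are illustrative, not the original code:

```xquery
xquery version "1.0";

(: Hypothetical helper: keeps the first node of every group of deep-equal nodes. :)
declare function local:distinct-deep($nodes as node()*) as node()* {
  for $i in 1 to count($nodes)
  let $node := $nodes[$i]
  (: keep $node only if no earlier node is deep-equal to it :)
  where not(some $earlier in subsequence($nodes, 1, $i - 1)
            satisfies deep-equal($earlier, $node))
  return $node
};

(: Stand-in for the real scan; returns the sample data from the question. :)
declare function local:scan_books() as element()* {
  (
    <book><title>XML in 24 hours</title><author>Some Guy</author></book>,
    <book><title>XML in 24 hours</title><author>Some Guy</author></book>
  )
};

declare function local:start2() {
  let $data := local:scan_books()
  return <books>{ local:distinct-deep($data) }</books>
};

local:start2()
```

Because the comparison does not look at element names specifically, the same helper also covers pamphlet and any other element type. Note that fn:deep-equal compares text nodes too, so two books that differ only in whitespace between tags may not be treated as duplicates.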
A simpler and more direct one-liner XPath solution:
Just use the following XPath expression:
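An expression consistent with the explanation given in this answer can be sketched as follows. It is wrapped in a small XQuery so it can be run on its own; the sample document is assumed, and the expression relies on all book elements being children of a single root element, each with exactly one title:

```xquery
xquery version "1.0";

(: Assumed sample input, modelled on the question's output. :)
let $doc := document {
  <books>
    <book><title>XML in 24 hours</title><author>Some Guy</author></book>
    <book><title>XML in 24 hours</title><author>Some Guy</author></book>
    <book><title>XQuery basics</title><author>Another Guy</author></book>
  </books>
}
(: Keep a book only if its position equals the first position
   at which its title occurs in the sequence of all titles. :)
return $doc/*/book[index-of($doc/*/book/title, title)[1]]
```

On this input the first and third book are selected and the second, a repeat of the first title, is skipped. The comparison looks at the title only, so two books with the same title but different authors would also be collapsed into one.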
When applied, for example, to the following XML document:
the above XPath expression correctly selects the following nodes:
The explanation is simple: for every book, select only one of its occurrences, such that its index in all-books is the same as the first index of its title in all-titles.

You can use this functx function: functx:distinct-deep
No need to reinvent the wheel.
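A sketch of how the call might look, assuming a copy of the FunctX library is available to your processor. The module location "functx.xq" is a placeholder and will differ per installation, and the inline $data stands in for the result of local:scan_books():

```xquery
xquery version "1.0";

(: The location hint "functx.xq" is an assumption; point it at your copy of FunctX. :)
import module namespace functx = "http://www.functx.com" at "functx.xq";

(: Stand-in for the generated data from the question. :)
let $data := (
  <book><title>XML in 24 hours</title><author>Some Guy</author></book>,
  <book><title>XML in 24 hours</title><author>Some Guy</author></book>
)
return <books>{ functx:distinct-deep($data) }</books>
```

functx:distinct-deep takes a sequence of nodes and returns those that are distinct by deep equality, which matches the "same content and structure" requirement in the question.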
What about fn:distinct-values?
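One thing to keep in mind is that fn:distinct-values atomizes its argument and returns atomic values, not nodes, so the nodes have to be looked up again afterwards. A rough sketch, assuming the title alone identifies a book (the sample data is made up for illustration):

```xquery
xquery version "1.0";

(: Assumed sample input. :)
let $books :=
  <books>
    <book><title>XML in 24 hours</title><author>Some Guy</author></book>
    <book><title>XML in 24 hours</title><author>Some Guy</author></book>
    <book><title>XQuery basics</title><author>Another Guy</author></book>
  </books>
(: distinct-values yields one string per distinct title... :)
for $title in distinct-values($books/book/title)
(: ...and the first book carrying that title is kept. :)
return ($books/book[title = $title])[1]
```

This compares only one key per element, so it does not check that the whole content and structure are the same.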
A solution inspired by functional programming. This solution is extensible in that you can replace the "=" comparison with your own custom boolean local:compare($element1, $element2) function. This function has worst-case quadratic complexity in the length of the list. You could get O(n log n) complexity by sorting the list beforehand and only comparing each item with its immediate successor.

To the best of my knowledge, the fn:distinct-values (or fn:distinct-elements) functions do not allow the use of a custom comparison function.

I solved my problem by implementing a recursive uniqueness search function, based solely on the text content of my document for uniqueness matching.
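A minimal sketch of a recursive function of this kind, assuming two elements count as the same when they have the same name and the same text content. The names local:compare and local:unique are illustrative and the sample data is made up; this is not the original code:

```xquery
xquery version "1.0";

(: Hypothetical equivalence test: same element name and same text content.
   Swap the body for any other boolean comparison you need. :)
declare function local:compare($a as element(), $b as element()) as xs:boolean {
  node-name($a) eq node-name($b) and string($a) eq string($b)
};

(: Walks $todo one item at a time; an item is appended to $seen
   only if nothing equivalent has been collected already. :)
declare function local:unique($todo as element()*, $seen as element()*) as element()* {
  if (empty($todo)) then $seen
  else
    let $head := $todo[1]
    let $dup := some $s in $seen satisfies local:compare($s, $head)
    return local:unique(subsequence($todo, 2),
                        if ($dup) then $seen else ($seen, $head))
};

(: Assumed sample data: a duplicated book plus a pamphlet. :)
let $data := (
  <book><title>XML in 24 hours</title><author>Some Guy</author></book>,
  <book><title>XML in 24 hours</title><author>Some Guy</author></book>,
  <pamphlet><title>XML in 24 hours</title></pamphlet>
)
return <books>{ local:unique($data, ()) }</books>
```

With this data the second book is dropped and the pamphlet is kept. Each item is compared against everything collected so far, so the cost is quadratic in the worst case, as noted in the answer above.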
Called as follows:
output:
I guess if you need slightly different equivalence matching, you can alter the matching in the algorithm accordingly. Should get you started at any rate.