I am wondering if it is possible to create an XSLT stylesheet that would extract XPath expressions for all leaf elements in a given XML file.
E.g. for
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item1>value1</item1>
<subitem>
<item2>value2</item2>
</subitem>
</root>
The output would be
/root/item1
/root/subitem/item2
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no" />
<xsl:template match="*[not(*)]">
<xsl:for-each select="ancestor-or-self::*">
<xsl:value-of select="concat('/', name())"/>
<xsl:if test="count(preceding-sibling::*[name() = name(current())]) != 0">
<xsl:value-of select="concat('[', count(preceding-sibling::*[name() = name(current())]) + 1, ']')"/>
</xsl:if>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="*">
<xsl:apply-templates select="*"/>
</xsl:template>
</xsl:stylesheet>
outputs:
/root/item1
/root/subitem/item2
This transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vApos">'</xsl:variable>
<xsl:template match="*[@* or not(*)]">
<xsl:if test="not(*)">
<xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
<xsl:text>
</xsl:text>
</xsl:if>
<xsl:apply-templates select="@*|*"/>
</xsl:template>
<xsl:template match="*" mode="path">
<xsl:value-of select="concat('/',name())"/>
<xsl:variable name="vnumSiblings" select=
"count(../*[name()=name(current())])"/>
<xsl:if test="$vnumSiblings > 1">
<xsl:value-of select=
"concat('[',
count(preceding-sibling::*
[name()=name(current())]) +1,
']')"/>
</xsl:if>
</xsl:template>
<xsl:template match="@*">
<xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
<xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<root>
<item1>value1</item1>
<subitem>
<item2>value2</item2>
</subitem>
</root>
produces the wanted, correct result:
/root/item1
/root/subitem/item2
With this XML document:
<root>
<item1>value1</item1>
<subitem>
<item>value2</item>
<item>value3</item>
</subitem>
</root>
it correctly produces:
/root/item1
/root/subitem/item[1]
/root/subitem/item[2]
See also this related answer: https://stackoverflow.com/a/4747858/36305
I think the following correction only matters in unusual cases where sibling elements use different prefixes for the same namespace, or the same prefix for different namespaces. However, there is nothing theoretically wrong with such input, and it could be common in certain kinds of generated XML.
Anyway, the following answer fixes that case (copied and modified from @Kirill's answer):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no" />
<xsl:template match="*[not(*)]">
<xsl:for-each select="ancestor-or-self::*">
<xsl:value-of select="concat('/', name())"/>
<!-- Suggestions on how to refactor the repetition of long XPath
expression parts are welcome. -->
<xsl:if test="count(../*[local-name() = local-name(current())
and namespace-uri(.) = namespace-uri(current())]) > 1">
<xsl:value-of select="concat('[', count(
preceding-sibling::*[local-name() = local-name(current())
and namespace-uri(.) = namespace-uri(current())]) + 1, ']')"/>
</xsl:if>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="*">
<xsl:apply-templates select="*"/>
</xsl:template>
</xsl:stylesheet>
It also addresses the problem in other answers where elements that are first in a series of siblings lack a position predicate.
E.g. for the input
<root>
<item1>value1</item1>
<subitem>
<a:item xmlns:a="uri">value2</a:item>
<b:item xmlns:b="uri">value3</b:item>
</subitem>
</root>
this answer produces
/root/item1
/root/subitem/a:item[1]
/root/subitem/b:item[2]
which is correct.
However, like all XPath expressions, these will only work if the environment using them specifies correct bindings for the namespace prefixes used. In theory there can be more pathological documents for which the above answer generates XPath expressions that can never work (in XPath 1.0 at least) regardless of the prefix bindings. E.g. this input:
<root>
<item1>value1</item1>
<a:subitem xmlns:a="differentURI">
<a:item xmlns:a="uri">value2</a:item>
<b:item xmlns:b="uri">value3</b:item>
</a:subitem>
</root>
produces the output
/root/item1
/root/a:subitem/a:item[1]
/root/a:subitem/b:item[2]
But the second XPath expression here can never work, since the prefix a refers to two different namespaces in the same expression.
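One way around the prefix problem, offered only as a sketch, is to generate expressions that use no prefixes at all and instead test local-name() and namespace-uri() in a predicate at every step. The stylesheet below is an untested-in-production variation on the one above. The variable name vApos is borrowed from another answer in this thread, and the sketch assumes that no namespace URI contains an apostrophe (such a URI would break the generated string literal). It always emits a position predicate, for simplicity.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- A literal apostrophe, needed to quote strings in the generated XPath.
       Assumption: namespace URIs never contain an apostrophe themselves. -->
  <xsl:variable name="vApos">'</xsl:variable>

  <!-- Leaf elements: write one prefix-free path per line -->
  <xsl:template match="*[not(*)]">
    <xsl:for-each select="ancestor-or-self::*">
      <!-- Step that matches by local name and namespace URI, not by prefix -->
      <xsl:value-of select="concat('/*[local-name()=', $vApos, local-name(), $vApos,
                                   ' and namespace-uri()=', $vApos, namespace-uri(), $vApos, ']')"/>
      <!-- Position among siblings with the same expanded name -->
      <xsl:value-of select="concat('[', count(preceding-sibling::*
                                   [local-name() = local-name(current())
                                    and namespace-uri() = namespace-uri(current())]) + 1, ']')"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Non-leaf elements: just descend -->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

For the pathological input above, the path for the b:item element would come out as

/*[local-name()='root' and namespace-uri()=''][1]/*[local-name()='subitem' and namespace-uri()='differentURI'][1]/*[local-name()='item' and namespace-uri()='uri'][2]

which is verbose, but does not depend on any prefix bindings in the environment that evaluates it.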
Well, you can find leaf elements with //*[not(*)] and of course you can for-each over the ancestor-or-self axis to output the path. But once you have namespaces involved, generating XPath expressions becomes complicated.
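As a minimal sketch of that idea, assuming a document without namespaces and without repeated sibling names (so no position predicates are needed), the two nested for-each loops could look like this:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every element that has no element children is a leaf -->
    <xsl:for-each select="//*[not(*)]">
      <!-- Ancestors are visited in document order, outermost first -->
      <xsl:for-each select="ancestor-or-self::*">
        <xsl:value-of select="concat('/', name())"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the XML in the question, this should print /root/item1 and /root/subitem/item2, one per line.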