I have a string in XML, <italic>a</italic>, and I am using xsl:analyze-string to extract all italic words with this pattern: "<italic>a</italic>". I know I can use a template match on italic, but the requirement here is to match it using a regex. I am trying to write the expression like this: (<italic>)[a-z]+</italic>, but the XSLT processor throws an error on the opening < of the tag.
Any idea how to handle opening and closing tags in a regex?
You haven't said what your XML source looks like, but if <italic>a</italic> is an ordinary XML element, then you can't match the lexical form of the element using regular expressions. That's because the input to XSLT is a tree of nodes, not a string of lexical XML markup. That concept is absolutely crucial to understanding how XSLT works.
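To see why, here is a minimal XSLT 2.0 sketch run against a hypothetical input, <p>Some <italic>a</italic> text</p>. The string value of p contains only the text, so a regex looking for the <italic> tag finds nothing. A regex applied to the text content of each italic element (one way to combine the tree model with regular expressions) still works.

```xml
<!-- Hypothetical input: <p>Some <italic>a</italic> text</p> -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- The string value of p concatenates its text nodes: "Some a text" -->
    <xsl:value-of select="string(p)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- No "<italic>" markup survives in that string, so this prints false -->
    <xsl:value-of select="matches(string(p), '&lt;italic>')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A regex can still test the text content of each italic element -->
    <xsl:for-each select="p/italic[matches(., '^[a-z]+$')]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For that input it should print "Some a text", "false" and "a" on separate lines.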
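As long as <italic>a</italic> is an actual string (for example, text inside a CDATA section), you can use &lt; for the < character. The regex attribute is itself an XML attribute, and a raw < is not allowed in an attribute value, so the stylesheet is rejected by the XML parser before the regex engine ever sees it. The greater-than (>) does not need to be escaped. Applied to the pattern from the question, that gives regex="(&lt;italic>)[a-z]+&lt;/italic>".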
Example:
Sample XML Input
<test><![CDATA[<italic>a</italic>]]></test>
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/">
    <!-- Every "<" in the regex must be written as &lt; inside the attribute; ">" can stay as-is -->
    <xsl:analyze-string select="test" regex="&lt;italic>([^&lt;]+)&lt;/italic>">
      <xsl:matching-substring>
        <results>
          <xsl:value-of select="regex-group(1)"/>
        </results>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
XML Output:
<results>a</results>
If <italic>a</italic> is an ordinary XML element and you are using the Saxon XSLT processor, use Saxon's serialize extension function (saxon:serialize()) to serialize the XML to a string, and then apply the regular expression to that string. It works great.
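On an XSLT 3.0 processor, the standard serialize() function from XPath 3.0/3.1 can play the same role without a vendor extension; this swaps in that standard function in place of the Saxon extension mentioned above. The sketch below assumes the same hypothetical input, <p>Some <italic>a</italic> text</p>, and should yield a results element containing "a". It only matches italic elements that have no attributes and contain plain text, and the exact serialized form (namespace declarations, attribute quoting) may differ between processors.

```xml
<!-- Hypothetical input: <p>Some <italic>a</italic> text</p> -->
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <!-- serialize() turns the element tree back into a markup string,
         e.g. "<p>Some <italic>a</italic> text</p>" -->
    <xsl:variable name="markup" select="serialize(p)"/>
    <!-- The regex now runs over real "<" characters, still escaped as &lt; in the attribute -->
    <xsl:analyze-string select="$markup" regex="&lt;italic>([^&lt;]+)&lt;/italic>">
      <xsl:matching-substring>
        <results>
          <xsl:value-of select="regex-group(1)"/>
        </results>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```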