how to handle “<” and “>” in regex in xslt

Published 2019-07-22 17:47

Question:

I have a string in XML, <italic>a</italic>, and I am using xsl:analyze-string to extract all italic words that follow this pattern: "<italic>a</italic>". I know I can use a template match on italic, but the requirement here is to match it using a regex. I am trying to write the expression like this: (<italic>)[a-z]+</italic>, but the XSLT processor throws an error on the opening < character.

Any idea how to handle opening and closing tags in regex?

Answer 1:

You haven't said what your XML source looks like, but if <italic>a</italic> is an ordinary XML element, then you can't match the lexical form of the element using regular expressions. That's because the input to XSLT is a tree of nodes, not a string of lexical XML markup. That concept is absolutely crucial to understanding how XSLT works.
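
To illustrate the point about the node tree, here is a minimal sketch. It assumes a hypothetical input such as <test>some <italic>a</italic> text</test>, where italic is a real element. By the time the stylesheet runs, the angle brackets no longer exist as characters, so the element is selected by name, and a regex (if one is required) can only be applied to its text content:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Assumed input (hypothetical): <test>some <italic>a</italic> text</test> -->
  <xsl:template match="/">
    <results>
      <!-- italic is an element node, so it is selected by name.
           matches() applies the regex to the element's string value only;
           there is no markup left in the tree to match against. -->
      <xsl:for-each select="//italic[matches(., '^[a-z]+$')]">
        <result>
          <xsl:value-of select="."/>
        </result>
      </xsl:for-each>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

The element names results and result are chosen for illustration only.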



Answer 2:

As long as <italic>a</italic> is an actual string, you can use &lt; for the < character. The greater-than (>) does not need to be escaped.

Example:

Sample XML Input

<test><![CDATA[<italic>a</italic>]]></test>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:analyze-string select="test" regex="&lt;italic>([^&lt;]+)&lt;/italic>">
      <xsl:matching-substring>
        <results>
          <xsl:value-of select="regex-group(1)"/>
        </results>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

XML Output:

<results>a</results>


Answer 3:

If <italic>a</italic> is an ordinary XML element and you are using the Saxon XSLT processor, you can use the extension function net.sf.saxon.serialize to serialize the XML to a string and then apply the regular expression to that string. It works great.
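
As a rough sketch of the serialize-then-match idea, the stylesheet below uses the standard serialize() function from XPath 3.0 rather than the Saxon extension named in the answer, so it assumes an XSLT 3.0 processor. The input <test>some <italic>a</italic> text</test> and the output element names are hypothetical:

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Assumed input (hypothetical): <test>some <italic>a</italic> text</test> -->
  <xsl:template match="/">
    <results>
      <!-- serialize() turns the node tree back into a string of lexical XML,
           which the regex can then match. The < character must be written
           as &lt; inside the attribute value. -->
      <xsl:analyze-string select="serialize(test)"
                          regex="&lt;italic>([a-z]+)&lt;/italic>">
        <xsl:matching-substring>
          <result>
            <xsl:value-of select="regex-group(1)"/>
          </result>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

Note that the serialized string is regenerated from the tree and need not be identical to the original source text: whitespace, attribute quoting, and namespace declarations may differ, so a regex written against the source file may need adjusting.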



Tags: xslt xslt-2.0