How to implement XSLT tokenize function?

Posted 2019-06-25 09:45

Question:

It seems that the EXSLT tokenize function is not available with PHP's XSLTProcessor (XSLT 1.0).

I tried to implement it myself in XSL, but I can't make it work:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:func="http://exslt.org/functions"
    xmlns:exsl="http://exslt.org/common"
    xmlns:my="http://mydomain.com/">

    <func:function name="my:tokenize">
        <xsl:param name="string"/>
        <xsl:param name="separator" select="'|'"/>
        <xsl:variable name="item" select="substring-before(concat($string,$separator),$separator)"/>
        <xsl:variable name="remainder" select="substring-after($string,$separator)"/>
        <xsl:variable name="tokens">
            <token><xsl:value-of select="$item"/></token>
            <xsl:if test="$remainder!=''">
                <xsl:copy-of select="my:tokenize($remainder,$separator)"/>
            </xsl:if>
        </xsl:variable>
        <func:result select="exsl:node-set($tokens)"/>
    </func:function>

    <xsl:template match="/">
        <xsl:copy-of select="my:tokenize('a|b|c')"/>
    </xsl:template>

</xsl:stylesheet>

Expected result:

    <token>a</token><token>b</token><token>c</token>

Actual result:

    abc

I know this question has been posted many times, but I can't find a simple solution.

Thank you for your help.

Answer 1:

Quoting http://www.exslt.org/str/functions/tokenize/index.html:

The following XSLT processors support str:tokenize:

  • 4XSLT, from 4Suite. (version 0.12.0a3)
  • libxslt from Daniel Veillard et al. (version 1.0.19)

Since PHP uses libxslt, str:tokenize is available, but you have to declare the right extension namespace (which you don't):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str"
    …

Then you can call str:tokenize as a function, for example to build a select box with the numbers 1 to 12:

<select name="months">
    <xsl:for-each select="str:tokenize('1,2,3,4,5,6,7,8,9,10,11,12', ',')">
        <!-- The attribute value template {.} inserts the current token's text -->
        <option value="{.}">
            <xsl:value-of select="."/>
        </option>
    </xsl:for-each>
</select>
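
Applied to the input from the question, a minimal sketch could look like the stylesheet below. It assumes a processor that implements the EXSLT strings module (libxslt is listed above). The wrapping tokens element is hypothetical and only there to give the output a single root. Per the EXSLT documentation, each character of the second argument is treated as a separate delimiter, and the function returns a node set of token elements.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <xsl:template match="/">
        <!-- Hypothetical wrapper element so the result has one root -->
        <tokens>
            <!-- str:tokenize returns one <token> element per piece -->
            <xsl:copy-of select="str:tokenize('a|b|c', '|')"/>
        </tokens>
    </xsl:template>

</xsl:stylesheet>
```

With libxslt this should yield the three token elements from the question inside the wrapper. Because the delimiter argument is a set of characters, a multi-character separator such as '||' would not be matched as a unit.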


Answer 2:

I may be a bit old-fashioned since I don't use functions, but I have the following tokenize template, which does what you want without any special extensions:

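<xsl:template name="tokenize">
  <xsl:param name="string"/>
  <xsl:param name="separator" select="'|'"/>

  <xsl:choose>
    <xsl:when test="contains($string,$separator)">
      <!-- Emit the text before the first separator as one token -->
      <token>
        <xsl:value-of select="substring-before($string,$separator)"/>
      </token>
      <!-- Recurse on the rest; pass the separator along so a
           non-default separator is not reset to '|' -->
      <xsl:call-template name="tokenize">
        <xsl:with-param name="string" select="substring-after($string,$separator)"/>
        <xsl:with-param name="separator" select="$separator"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- No separator left: the remaining string is the last token -->
      <token><xsl:value-of select="$string"/></token>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>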

It gets called as follows and should give you the desired output:

<xsl:call-template name="tokenize">
  <xsl:with-param name="string" select="'a|b|c'"/>
</xsl:call-template>
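
In XSLT 1.0, the output of a named template captured in a variable is a result tree fragment, so it cannot be iterated directly. The sketch below shows one way to process the tokens further, assuming the processor supports exsl:node-set (the question and another answer in this thread already rely on it). The item element and its pos attribute are hypothetical names used only for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Recursive tokenizer: same approach as the template above -->
  <xsl:template name="tokenize">
    <xsl:param name="string"/>
    <xsl:param name="separator" select="'|'"/>
    <xsl:choose>
      <xsl:when test="contains($string,$separator)">
        <token>
          <xsl:value-of select="substring-before($string,$separator)"/>
        </token>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="string" select="substring-after($string,$separator)"/>
          <xsl:with-param name="separator" select="$separator"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <token><xsl:value-of select="$string"/></token>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- Capture the tokens; in XSLT 1.0 this is a result tree fragment -->
    <xsl:variable name="tokens">
      <xsl:call-template name="tokenize">
        <xsl:with-param name="string" select="'a|b|c'"/>
      </xsl:call-template>
    </xsl:variable>

    <!-- Hypothetical output structure: one <item> per token -->
    <items>
      <!-- exsl:node-set converts the fragment into a navigable node set -->
      <xsl:for-each select="exsl:node-set($tokens)/token">
        <item pos="{position()}">
          <xsl:value-of select="."/>
        </item>
      </xsl:for-each>
    </items>
  </xsl:template>

</xsl:stylesheet>
```

If the tokens only need to be written to the output as they are, the plain call-template shown above is enough and no extension function is required.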


Answer 3:

You don't have to write your own implementation -- just use the existing FXSL str-split-to-words template, which provides even more powerful functionality.

Here is a short demo of using str-split-to-words:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common">

   <xsl:import href="strSplit-to-Words.xsl"/>
   <xsl:output indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="/">
      <xsl:variable name="vwordNodes">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="/"/>
          <xsl:with-param name="pDelimiters" 
                          select="', &#9;&#10;&#13;'"/>
        </xsl:call-template>
      </xsl:variable>

      <xsl:apply-templates select="ext:node-set($vwordNodes)/*"/>
    </xsl:template>

    <xsl:template match="word">
      <xsl:value-of select="concat(position(), ' ', ., '&#10;')"/>
    </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<t>out, of
 luck</t>

the wanted result is produced, a sequence of all words with their positions:

1 out
2 of
3 luck

Note that any maximum-length sequence of adjacent delimiter characters from the pDelimiters parameter is treated as a single delimiter.