
Question:

I am trying to use XSLT 2.0 (Saxon-PE 9.6) on an HTML document to create tags that surround every contiguous run of characters from a specified non-Latin Unicode block (spaces allowed). I need to apply this process to every text() node in the document. I have made some progress with two approaches, one using <xsl:analyze-string> and one using fn:replace(), but I have not been able to arrive at a satisfactory and complete solution.

For example, here is some text containing Hindi:

Input: चाय का कप means ‘cup of tea’ in हिन्दि.

Desired Output: <span xml:lang="hi-Deva">चाय का कप</span> means ‘cup of tea’ in <span xml:lang="hi-Deva">हिन्दि</span>.

How can this process be implemented in XSLT 2.0?

Here's my attempt with <xsl:analyze-string>:

(Note: Hindi is written in the Devanagari script, whose Unicode block is U+0900 to U+097F.)

<xsl:template match="text()">
  <xsl:variable name="textValue" select="."/>

  <xsl:analyze-string select="$textValue" regex="(\s*.*?)([&#x0900;-&#x097f;]+)((\s+[&#x0900;-&#x097f;]+)*)(\s*.*)">

    <xsl:matching-substring>
      <xsl:value-of select="regex-group(1)"/>
      <span xml:lang="hi-Deva"><xsl:value-of select="regex-group(2)"/><xsl:value-of select="regex-group(3)"/></span>
      <xsl:value-of select="regex-group(5)"/>
    </xsl:matching-substring>

    <xsl:non-matching-substring>
      <xsl:value-of select="$textValue"/>
    </xsl:non-matching-substring>

  </xsl:analyze-string>
</xsl:template>

On the test input, this produces: <span xml:lang="hi-Deva">चाय का कप</span> means ‘cup of tea’ in हिन्दि. This approach misses the second region of Hindi text (हिन्दि). I need an approach that will find and tag every occurrence matched by the regex.

My second approach used fn:replace():

<xsl:template match="text()">
  <xsl:value-of select='fn:replace(., "[&#x0900;-&#x097f;]+(\s+[&#x0900;-&#x097f;]+)*", "xxx$0xxx")'/>
</xsl:template>

On the test input this produces: xxxचाय का कपxxx means ‘cup of tea’ in xxxहिन्दिxxx. This is clearly incorrect, since the Hindi is wrapped in xxx markers rather than span tags, but on the positive side, every region of Hindi is in fact found and processed. I cannot simply put span tags in place of the xxx markers, because markup inside the replacement string is invalid XSLT.

Answer 1:

I came up with http://xsltransform.net/jyH9rMo, which just does the following:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

    <xsl:template match="/">
      <html>
        <head>
          <title>New Version!</title>
        </head>
        <xsl:apply-templates/>
      </html>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

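    <!-- priority="1": text() and node() both have default priority -0.5,
         so without it the two rules conflict on text nodes. -->
    <xsl:template match="text()" priority="1">
        <!-- Each contiguous (space-separated) run of Devanagari becomes one match. -->
        <xsl:analyze-string select="." regex="([&#x0900;-&#x097f;]+)((\s+[&#x0900;-&#x097f;]+)*)">
            <xsl:matching-substring>
                <span xml:lang="hi-Deva"><xsl:value-of select="."/></span>
            </xsl:matching-substring>
            <!-- Text between the Hindi runs is copied unchanged. -->
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>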
</xsl:transform>
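The template above can be generalized to the "specified Unicode block" from the question. The sketch below is an untested illustration, not part of either answer, and the parameter names block-class and lang-tag are hypothetical. It uses the XML Schema block escape \p{IsDevanagari} (the same range as [&#x0900;-&#x097f;]) and passes the regex through a variable, because the regex attribute of xsl:analyze-string is an attribute value template: written inline, the braces would have to be doubled (\p{{IsDevanagari}}). Like the stylesheet above, it assumes well-formed input whose elements are in no namespace.

```xml
<xsl:transform version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameters: the block to tag and the language tag to emit. -->
  <xsl:param name="block-class" select="'\p{IsDevanagari}'"/>
  <xsl:param name="lang-tag" select="'hi-Deva'"/>

  <!-- One or more block characters, optionally followed by further
       whitespace-separated runs from the same block. -->
  <xsl:variable name="run-regex"
      select="concat($block-class, '+(\s+', $block-class, '+)*')"/>

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority so this rule beats the identity rule on text nodes. -->
  <xsl:template match="text()" priority="1">
    <!-- {$run-regex} is evaluated as an AVT; its braces are not re-parsed. -->
    <xsl:analyze-string select="." regex="{$run-regex}">
      <xsl:matching-substring>
        <span xml:lang="{$lang-tag}"><xsl:value-of select="."/></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:transform>
```

To target another block, one would override block-class (for example with \p{IsBengali}) together with a matching lang-tag; with Saxon, stylesheet parameters can be supplied on the command line as name=value.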

Answer 2:

This should work (some comments after the code):

XSLT 2.0

<xsl:analyze-string select="$textValue" regex="([&#x0900;-&#x097f;]+)((\s+[&#x0900;-&#x097f;]+)*)">
    <xsl:matching-substring>
        <span xml:lang="hi-Deva"><xsl:value-of select="regex-group(1)"/><xsl:value-of select="regex-group(2)"/></span>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
        <xsl:value-of select="."/>
    </xsl:non-matching-substring>
</xsl:analyze-string>

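The regex is the one from your second try (it was already matching only the Hindi text fragments!), just with capturing parentheses around the first word and around the repeated remainder, so they become regex-group(1) and regex-group(2).
The matching-substring branch puts the span around the Hindi text.
The non-matching-substring branch just returns the unmodified "normal" text substring (you were returning the whole text!). Your first regex also started with (\s*.*?) and ended with (\s*.*), so the first match consumed everything up to the end of the string and the second Hindi region was never reached.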
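The fn:replace() idea from the question can also be carried through, as an alternative to xsl:analyze-string. In the hedged sketch below (not from either answer, and untested), replace() wraps each Hindi run in a private-use delimiter, U+E000, which is assumed never to occur in the input; tokenize() then splits on that delimiter, so the even-numbered tokens are exactly the Hindi runs and can be turned into span elements.

```xml
<!-- Assumes it sits next to an identity template, hence priority="1". -->
<xsl:template match="text()" priority="1">
  <!-- Assumption: U+E000 (private use) never appears in the input text. -->
  <xsl:variable name="marker" select="'&#xE000;'"/>
  <!-- Same regex as the fn:replace() attempt; $0 is the whole match. -->
  <xsl:variable name="marked"
      select="replace(., '[&#x0900;-&#x097f;]+(\s+[&#x0900;-&#x097f;]+)*',
                      concat($marker, '$0', $marker))"/>
  <!-- Tokens alternate: odd positions are ordinary text (possibly empty),
       even positions are the Hindi runs. -->
  <xsl:for-each select="tokenize($marked, $marker)">
    <xsl:choose>
      <xsl:when test="position() mod 2 = 0">
        <span xml:lang="hi-Deva"><xsl:value-of select="."/></span>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>
```

The alternation holds because every match is non-empty and is bracketed by exactly two markers, so any empty tokens fall at odd positions and produce no output. xsl:analyze-string, as used in both answers, remains the more direct tool.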