Convert a character if its codepoint is within a given range

Published 2019-07-07 05:46

Question:

I have a couple of XML files that contain Unicode characters with codepoint values between 57600 and 58607 (U+E100 to U+E4EF, in the Unicode Private Use Area). Currently these are shown in my content as square blocks, and I'd like to convert them to elements.

What I'd like to achieve is something like this:

<!-- current input -->
<p> Follow the on-screen instructions.</p>  
<!-- desired output-->
<p><unichar value="58208"/> Follow the on-screen instructions.</p>
<!-- Where 58208 is the actual codepoint of the unicode character in question -->

I've experimented a bit with tokenize(), but since it needs a pattern to split on, this turned out to be overcomplicated.

Any advice on how best to tackle this? I've been trying something like the following but got stuck (don't mind the syntax; I know it isn't valid XSLT):

<xsl:template match="text()">
 -> for every character in my string
    -> if string-to-codepoints(current character) greater than 57600 return <unichar value="codepoint value"/>
       else return character
</xsl:template>

Answer 1:

This sounds like a job for xsl:analyze-string, e.g.:

<xsl:template match="text()">
  <xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
    <xsl:matching-substring>
       <unichar value="{string-to-codepoints(.)}"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

Untested.
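The template above only handles text nodes. With nothing but the built-in template rules around it, the p element itself would not be copied to the output, only its text. A minimal complete XSLT 2.0 stylesheet around it might look like the sketch below (untested against the asker's files; it needs an XSLT 2.0 processor such as Saxon). It assumes an identity template is wanted so that elements and attributes pass through unchanged, and it gives the text() template priority="1" because text() and node() both have the same default priority (-0.5), which would otherwise make the two rules conflict.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Identity template (assumed): copy elements, attributes, comments
       and processing instructions unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" so this wins over the node() pattern above -->
  <xsl:template match="text()" priority="1">
    <!-- the character class covers 57600 (U+E100) to 58607 (U+E4EF) inclusive -->
    <xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
      <xsl:matching-substring>
        <!-- each match is a single character, so this yields one integer -->
        <unichar value="{string-to-codepoints(.)}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- ordinary text is passed through as-is -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Given <p>&#58208; Follow the on-screen instructions.</p> as input, this should produce <p><unichar value="58208"/> Follow the on-screen instructions.</p>.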



Answer 2:

This transformation:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="/*">
     <p>
      <xsl:for-each select="string-to-codepoints(.)">
        <xsl:choose>
            <xsl:when test=". ge 57600 and . le 58607">
              <unichar value="{.}"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="codepoints-to-string(.)"/>
            </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
     </p>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<p> Follow the on-screen instructions.</p>

produces the wanted, correct result:

<p><unichar value="58498"/> Follow the on-screen instructions.</p>

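Explanation: string-to-codepoints() turns the text into a sequence of integer codepoints; every codepoint that passes the xsl:when test becomes a unichar element, and every other codepoint is turned back into a character with codepoints-to-string(). Both are standard XPath 2.0 functions.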
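The stylesheet in this answer matches only the document element (/*), writes a fixed <p>, and works on that element's whole string value, so it fits the one-line sample but would drop any other markup in a larger document. A sketch that applies the same codepoint-by-codepoint approach to every text node is shown here. The identity template is an assumption about what the rest of the document needs, the range test uses both ends of the question's range inclusively, and the text() template gets priority="1" because text() and node() share the same default priority.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Identity template (assumed): keep the rest of the tree unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Walk each text node one codepoint at a time -->
  <xsl:template match="text()" priority="1">
    <xsl:for-each select="string-to-codepoints(.)">
      <xsl:choose>
        <!-- inclusive range from the question: 57600 (U+E100) to 58607 (U+E4EF) -->
        <xsl:when test=". ge 57600 and . le 58607">
          <unichar value="{.}"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- turn the integer back into a one-character string -->
          <xsl:value-of select="codepoints-to-string(.)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The one-character text nodes produced by xsl:value-of are merged back into ordinary text when they are added to the copied parent element, so the output text is not fragmented.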



Tags: xslt xslt-2.0