Keep &nbsp; and other special characters in XSLT output

Posted 2019-04-10 08:11

I'm using XSLT to extract some HTML content with special characters (like &nbsp;) from an XML file. The content is stored in <content> nodes. I have defined most special characters like this: <!ENTITY nbsp "&#160;">, so this expression works perfectly fine:

<xsl:copy-of select="content" disable-output-escaping="yes"/>

Now, I want to add target="_blank" to every link found within that content. This is the solution I came up with:

<xsl:template match="a" mode="html">
    <a>
        <xsl:attribute name="href"><xsl:value-of select="@*"/></xsl:attribute>
        <xsl:attribute name="target">_blank</xsl:attribute>
        <xsl:apply-templates select="text()|* "/>
    </a>
</xsl:template>

And instead of the "copy-of" element I use this:

<xsl:apply-templates select="content" mode="html"/>

Now all those special characters (including &nbsp;) have disappeared from the output. How do I keep them? It seems that disable-output-escaping="yes" doesn't help here.

Ok, I'm using the XSLTProcessor class in PHP. The disable-output-escaping attribute didn't give an error actually, but when I removed it, the output was the same, with all the nbsp's, so it didn't matter.


UPD. With the XSL template I have shown before, my input sample:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page SYSTEM "html-entities.xsl">
<content>There is a&nbsp;non-breaking <a href="http://localhost">space</a> inside.</content>

html-entities.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY nbsp "&#160;">

PHP code:

$xp = new XSLTProcessor();
$xsl = new DOMDocument();
$xsl->load($xsl_filename);
$xp->importStylesheet($xsl);
$xml_doc = new DOMDocument();
$xml_doc->resolveExternals = true;
$xml_doc->load($xml_filename);
$html = $xp->transformToXML($xml_doc);

My current output:

There is anon-breaking <a href="http://localhost" target="_blank">space</a> inside.

My desired output:

There is a&nbsp;non-breaking <a href="http://localhost" target="_blank">space</a> inside.

1 answer
老娘就宠你
#2 · 2019-04-10 08:28

Whether the input XML contains a character reference like &#160;, an entity reference like &nbsp;, or the literal character makes no difference to XSLT: it affects neither how the input is processed nor how the output looks. XSLT operates on a tree in which text nodes store Unicode characters. At least that is the theory; your PHP code seems to work with a DOM tree model, which might store entity reference nodes, but even then that shouldn't matter to XSLT. The input tree should contain text nodes with Unicode characters (one of which could be the non-breaking space, U+00A0, decimal 160), and if you copy such a text node to the output, the result tree has a text node with the same Unicode characters.
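
To illustrate the copying described above, here is a minimal XSLT 1.0 sketch, not a verified fix for the dropped character. It assumes, as in the posted sample, that content is the document element. The identity rule in the html mode and the way the a template copies its attributes are my own choices for illustration, not something taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point (assumes <content> is the document element, as in the sample):
       process its children in the "html" mode. -->
  <xsl:template match="/">
    <xsl:apply-templates select="content/node()" mode="html"/>
  </xsl:template>

  <!-- Identity rule for the "html" mode: copy every node and attribute.
       Text nodes, including any U+00A0 they contain, are copied unchanged. -->
  <xsl:template match="@*|node()" mode="html">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="html"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for links: copy the existing attributes, add target="_blank",
       then process the link's children in the same mode. -->
  <xsl:template match="a" mode="html">
    <a>
      <xsl:apply-templates select="@*" mode="html"/>
      <xsl:attribute name="target">_blank</xsl:attribute>
      <xsl:apply-templates select="node()" mode="html"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Nothing in this sketch treats the non-breaking space specially: it is an ordinary character inside a text node, and the result tree receives it as such. How it is then written out is a serialization matter.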

With the html output method, some XSLT processors (Saxon 6.5.5, for instance) might do you the favour of serializing characters that HTML defines as entities with the corresponding entity reference. Even if they don't, the serialized result tree should be a file with the proper Unicode characters, encoded as directed by the encoding attribute of the xsl:output element.
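
A minimal sketch of such a serialization setup, again assuming content is the document element as in the sample. The choice of UTF-8 is an assumption on my part; the question does not show an xsl:output element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize the result tree as HTML in UTF-8 (assumed encoding).
       Whether U+00A0 is written as an entity reference or as the encoded
       character itself is up to the processor's serializer. -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Copy the children of <content> to the result tree unchanged. -->
  <xsl:template match="/">
    <xsl:copy-of select="content/node()"/>
  </xsl:template>

</xsl:stylesheet>
```

Either serialized form stands for the same character, so when checking the result it may help to look at the raw bytes rather than at how a browser or terminal renders them.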

Your current result, which completely drops the character (e.g. There is anon-breaking), does not make sense to me.
