I'm using XSLT to extract some HTML content with special characters (like &nbsp;) from an XML file. The content is stored in <content> nodes. I have defined most special characters like this: <!ENTITY nbsp "&#160;">, so this expression works perfectly fine:
<xsl:copy-of select="content" disable-output-escaping="yes"/>
Now I want to add target="_blank" to every link found within that content. This is the solution I came up with:
<xsl:template match="a" mode="html">
<a>
<xsl:attribute name="href"><xsl:value-of select="@*"/></xsl:attribute>
<xsl:attribute name="target">_blank</xsl:attribute>
<xsl:apply-templates select="text()|* "/>
</a>
</xsl:template>
And instead of the "copy-of" element I use this:
<xsl:apply-templates select="content" mode="html"/>
Now all those special characters (nbsp included) have disappeared from the output. How do I keep them? It seems that disable-output-escaping="yes" doesn't help here.
OK, I'm using the XSLTProcessor class in PHP. The disable-output-escaping attribute didn't actually give an error, but when I removed it, the output was the same, with all the nbsp's, so it didn't matter.
Update: with the XSL template shown above, my input sample is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page SYSTEM "html-entities.xsl">
<content>There is a&nbsp;non-breaking <a href="http://localhost">space</a> inside.</content>
html-entities.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY nbsp "&#160;">
PHP code:
$xp = new XSLTProcessor();
$xsl = new DOMDocument();
$xsl->load($xsl_filename);
$xp->importStylesheet($xsl);
$xml_doc = new DOMDocument();
$xml_doc->resolveExternals = true;
$xml_doc->load($xml_filename);
$html = $xp->transformToXML($xml_doc);
My current output:
There is anon-breaking <a href="http://localhost" target="_blank">space</a> inside.
My desired output:
There is a non-breaking <a href="http://localhost" target="_blank">space</a> inside.
Basically, whether the source of the input XML document contains a character reference like &#160;, an entity reference like &nbsp;, or the character itself literally does not matter to XSLT. It makes no difference to how the input is processed or how the output looks, because XSLT operates on a tree with Unicode characters stored in text nodes. At least that is the theory. Your PHP code seems to work with a DOM tree model which might store entity reference nodes, but even then that shouldn't matter for XSLT. In the input tree there should be text nodes containing Unicode characters (one of which could be the non-breaking space character, Unicode code point 160), and if you copy such a text node to the output, the result tree has a text node with the same Unicode characters.
For the output method html, some XSLT processors (Saxon 6.5.5, for instance) might do you the favour of serializing characters that are defined as entities in HTML with the corresponding entity reference. Even if they don't, the serialization of the result tree should be a file with the proper Unicode characters, encoded as directed by the encoding attribute of the xsl:output element.
Your current result, which completely drops the character (e.g. There is anon-breaking), does not make sense to me.
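To make the tree model described above concrete, here is a minimal sketch of a stylesheet that copies the content in the html mode and adds target="_blank" to links. It is not the asker's stylesheet: the root template and the identity rule in the html mode are additions for illustration. It also assumes the parser has already expanded &nbsp; into a text node. With PHP's DOMDocument that may require enabling entity substitution (for example through the substituteEntities property) before loading, which is an assumption and not something confirmed in this thread.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize as HTML; characters are written in the encoding named here. -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Entry point (illustrative): process the children of <content> in "html" mode. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//content/node()" mode="html"/>
  </xsl:template>

  <!-- Identity rule for "html" mode: copy elements, attributes and text unchanged.
       Text nodes, including one holding U+00A0, pass through as they are. -->
  <xsl:template match="@*|node()" mode="html">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="html"/>
    </xsl:copy>
  </xsl:template>

  <!-- Links: copy the existing attributes, then add target="_blank". -->
  <xsl:template match="a" mode="html">
    <a>
      <xsl:apply-templates select="@*" mode="html"/>
      <xsl:attribute name="target">_blank</xsl:attribute>
      <xsl:apply-templates select="node()" mode="html"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Copying all attributes with select="@*" keeps href and any other attributes on the link, rather than relying on href being the first attribute. Whether the non-breaking space is written as &nbsp; or as the raw character depends on the processor's HTML serializer.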