Why is my XSLT stripping HTML tags?

Posted 2019-06-24 15:12

Question:

I am using XSLT 1.0 in order to convert some XML into JSON output. Unfortunately some of the XML I'm working with has HTML markup in it. Here's an example of some XML input:

<text>
    Kevin Love and Steph Curry can talk about their first-
    time starting gigs in the All-Star game Friday night when the Minnesota
    Timberwolves visit Oracle Arena to face the Golden State Warriors.
</text>
<continue>
    <P>
    Love and Curry were two of four first-time All-Star starters when the league
    made the announcement on Thursday.
    </P>
    <P>
    Love got a late push to overtake Houston Rockets center Dwight Howard in the
    final week of voting.
    </P>
    <P>
    "I think it's a little sweeter this way because I really didn't expect it,"
    Love said on a conference call. "I was already humbled by the response the
    fans gave me to being very close to the top (frontcourt players). The outreach
    by the Minnesota fans and beyond was truly amazing."
    </P>
</continue>

The markup is not ideal, but I need to retain the <P> tags in my JSON output. To deal with double quotes, I escape them. Here are the variables and the template I use for this:

<xsl:variable name="escaped-continue">
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="continue"/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
</xsl:variable>
<xsl:variable name="escaped-text">
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="text"/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
</xsl:variable>

<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text"
                    select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

I then simply use something like the following to output JSON:

{
    "text": "<xsl:value-of select="normalize-space($escaped-text)"/>", 
    "continue": "<xsl:value-of select="normalize-space($escaped-continue)"/>"
}

The issue I have here is that the output looks like this:

{
    "text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors.",
    "continue": "Love and Curry were two of four first-time All-Star starters when the league made the announcement on Thursday. Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting. \"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\""
}

As you can see, the double quotes are properly escaped; however, the <P> tags have been stripped, apparently parsed by the XSLT processor and then suppressed by normalize-space(). What's the best way to keep the <P> tags in my output?

Answer 1:

Try it this way:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

<xsl:template match="/root">
    <xsl:text>{&#10;"text": "</xsl:text>
    <xsl:apply-templates select="text/text()"/>
    <xsl:text>",&#10;"continue": "</xsl:text>
    <xsl:apply-templates select="continue/*"/>
    <xsl:text>"&#10;}</xsl:text>
</xsl:template>

<xsl:template match="*">
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
    <xsl:variable name="escaped-text">
        <xsl:call-template name="replace-string">
            <xsl:with-param name="text" select="."/>
            <xsl:with-param name="replace" select="'&quot;'" />
            <xsl:with-param name="with" select="'\&quot;'"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="normalize-space($escaped-text)"/>
</xsl:template>

<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text"
                    select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Applied to a modified version of your input (I added a root element and some extra inline markup for testing):

<root>
    <text>
    Kevin Love and Steph Curry can talk about their first-
    time starting gigs in the All-Star game Friday night when the Minnesota
    Timberwolves visit Oracle Arena to face the Golden State Warriors.
    </text>
    <continue>
        <P>
        Love and Curry were <i>two of <b>four</b> first-time All-Star</i> starters when the league
        made the announcement on Thursday.
        </P>
        <P>
        Love got a late push to overtake Houston Rockets center Dwight Howard in the
        final week of voting.
        </P>
        <P>
        "I think it's a little sweeter this way because I really didn't expect it,"
        Love said on a conference call. "I was already humbled by the response the
        fans gave me to being very close to the top (frontcourt players). The outreach
        by the Minnesota fans and beyond was truly amazing."
        </P>
    </continue>
</root>

produces the following result:

{
"text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors.",
"continue": "<P>Love and Curry were<i>two of<b>four</b>first-time All-Star</i>starters when the league made the announcement on Thursday.</P><P>Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting.</P><P>\"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\"</P>"
}
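The result above also shows two side effects worth knowing about. Calling normalize-space() on each text node separately removes the spaces next to inline elements ("were<i>two of<b>four</b>first-time"), and only double quotes are escaped, so a backslash in the source text would still break the JSON. Below is a possible drop-in replacement for the text() template above. It is a sketch that assumes the replace-string template stays as it is and that a single space is an acceptable stand-in for any run of whitespace at an element boundary.

```xml
<!-- Drop-in replacement for the text() template above.
     Assumes the replace-string template from the same stylesheet. -->
<xsl:template match="text()">
    <xsl:variable name="norm" select="normalize-space(.)"/>
    <!-- true if the raw text node begins / ends with whitespace -->
    <xsl:variable name="lead-ws" select="normalize-space(substring(., 1, 1)) = ''"/>
    <xsl:variable name="trail-ws" select="normalize-space(substring(., string-length(.), 1)) = ''"/>
    <xsl:choose>
        <!-- Whitespace-only node: keep one space only when it separates
             two siblings, e.g. the space in "</b> <i>" -->
        <xsl:when test="$norm = ''">
            <xsl:if test="preceding-sibling::node() and following-sibling::node()">
                <xsl:text> </xsl:text>
            </xsl:if>
        </xsl:when>
        <xsl:otherwise>
            <!-- Restore a leading space next to a preceding inline element -->
            <xsl:if test="$lead-ws and preceding-sibling::node()">
                <xsl:text> </xsl:text>
            </xsl:if>
            <!-- Escape backslashes first, so the backslashes added
                 for the quotes below are not doubled -->
            <xsl:variable name="escaped-backslashes">
                <xsl:call-template name="replace-string">
                    <xsl:with-param name="text" select="$norm"/>
                    <xsl:with-param name="replace" select="'\'"/>
                    <xsl:with-param name="with" select="'\\'"/>
                </xsl:call-template>
            </xsl:variable>
            <!-- Then escape double quotes -->
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text" select="string($escaped-backslashes)"/>
                <xsl:with-param name="replace" select="'&quot;'"/>
                <xsl:with-param name="with" select="'\&quot;'"/>
            </xsl:call-template>
            <!-- Restore a trailing space next to a following inline element -->
            <xsl:if test="$trail-ws and following-sibling::node()">
                <xsl:text> </xsl:text>
            </xsl:if>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
```

With this template the first <P> of the test input should come out as <P>Love and Curry were <i>two of <b>four</b> first-time All-Star</i> starters when the league made the announcement on Thursday.</P>. Backslashes are escaped before quotes on purpose: the other order would double the backslashes that the quote escaping has just added.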


Answer 2:

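That's what xsl:value-of is defined to do: it outputs the string value of the selected node, which is the concatenation of all its descendant text nodes, so the markup is lost. If you want to retain the tags, use xsl:copy-of.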
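For illustration, a minimal stylesheet that follows this advice, assuming the <root> wrapper element used in Answer 1. Keep in mind that xsl:copy-of copies the text unchanged: double quotes are not escaped and line breaks are written as raw newlines, and JSON allows neither inside a string. That is why Answer 1 walks the tree and escapes each text node instead.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- method="xml" so the copied <P> elements are serialized as tags -->
    <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>

    <!-- Assumes the <root> wrapper element from Answer 1 -->
    <xsl:template match="/root">
        <xsl:text>{&#10;"continue": "</xsl:text>
        <!-- copy-of copies the child nodes of <continue>, markup included;
             value-of would flatten them to their string value -->
        <xsl:copy-of select="continue/node()"/>
        <xsl:text>"&#10;}</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```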



Answer 3:

When you pass continue as the text parameter while building escaped-continue, you lose the <P> tags at that step: replace-string works on the string value of the node it receives, not on its child nodes. You can either use EXSLT node-sets (exsl:node-set()) with XSLT 1.0 and handle the nodes inside the replace-string template, or rewrite escaped-continue to walk the element and text nodes and call replace-string only for the text nodes.
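A sketch of the second option, assuming the <root> wrapper from Answer 1 and xsl:output method="xml" (with method="text" the copied elements would be reduced to their text again). The mode name "escape" is made up for this example. The variable is built by walking the nodes, and it is written out with xsl:copy-of rather than value-of/normalize-space(), which would flatten it back to a string. No exsl:node-set() call is needed here, because XSLT 1.0 lets xsl:copy-of output a result tree fragment directly.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- method="xml": with method="text" the copied <P> tags would be dropped again -->
    <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>

    <!-- Assumes the <root> wrapper element from Answer 1 -->
    <xsl:template match="/root">
        <!-- Walk the nodes of <continue> instead of handing the whole
             element to replace-string, which only sees its string value -->
        <xsl:variable name="escaped-continue">
            <xsl:apply-templates select="continue/node()" mode="escape"/>
        </xsl:variable>
        <xsl:text>{&#10;"continue": "</xsl:text>
        <!-- copy-of, not value-of: value-of would flatten the fragment to text -->
        <xsl:copy-of select="$escaped-continue"/>
        <xsl:text>"&#10;}</xsl:text>
    </xsl:template>

    <!-- Elements: copy the tag and keep walking the children -->
    <xsl:template match="*" mode="escape">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="escape"/>
        </xsl:copy>
    </xsl:template>

    <!-- Text nodes: the only place where replace-string is called -->
    <xsl:template match="text()" mode="escape">
        <xsl:call-template name="replace-string">
            <xsl:with-param name="text" select="normalize-space(.)"/>
            <xsl:with-param name="replace" select="'&quot;'"/>
            <xsl:with-param name="with" select="'\&quot;'"/>
        </xsl:call-template>
    </xsl:template>

    <!-- replace-string template from the question, unchanged -->
    <xsl:template name="replace-string">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="with"/>
        <xsl:choose>
            <xsl:when test="contains($text,$replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$with"/>
                <xsl:call-template name="replace-string">
                    <xsl:with-param name="text"
                        select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="with" select="$with"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Like Answer 1, this normalizes each text node on its own, so spaces directly next to inline elements such as <i> or <b> are not kept.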



Tags: html xml json xslt