I am using XSLT 1.0 to convert some XML into JSON. Unfortunately, some of the XML I'm working with has HTML markup in it. Here's an example of the XML input:
<text>
Kevin Love and Steph Curry can talk about their first-
time starting gigs in the All-Star game Friday night when the Minnesota
Timberwolves visit Oracle Arena to face the Golden State Warriors.
</text>
<continue>
<P>
Love and Curry were two of four first-time All-Star starters when the league
made the announcement on Thursday.
</P>
<P>
Love got a late push to overtake Houston Rockets center Dwight Howard in the
final week of voting.
</P>
<P>
"I think it's a little sweeter this way because I really didn't expect it,"
Love said on a conference call. "I was already humbled by the response the
fans gave me to being very close to the top (frontcourt players). The outreach
by the Minnesota fans and beyond was truly amazing."
</P>
</continue>
The markup is not ideal, and I need to retain the <P> tags in my JSON output. To deal with double quotes, I escape them. Here's my template for handling this:
<xsl:variable name="escaped-continue">
  <xsl:call-template name="replace-string">
    <xsl:with-param name="text" select="continue"/>
    <xsl:with-param name="replace" select="'&quot;'"/>
    <xsl:with-param name="with" select="'\&quot;'"/>
  </xsl:call-template>
</xsl:variable>
<xsl:variable name="escaped-text">
  <xsl:call-template name="replace-string">
    <xsl:with-param name="text" select="text"/>
    <xsl:with-param name="replace" select="'&quot;'"/>
    <xsl:with-param name="with" select="'\&quot;'"/>
  </xsl:call-template>
</xsl:variable>
<xsl:template name="replace-string">
  <xsl:param name="text"/>
  <xsl:param name="replace"/>
  <xsl:param name="with"/>
  <xsl:choose>
    <xsl:when test="contains($text,$replace)">
      <xsl:value-of select="substring-before($text,$replace)"/>
      <xsl:value-of select="$with"/>
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="substring-after($text,$replace)"/>
        <xsl:with-param name="replace" select="$replace"/>
        <xsl:with-param name="with" select="$with"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
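The replace-string template escapes only double quotes. A JSON string also needs literal backslashes escaped, and they have to be doubled before the quotes are escaped; otherwise the backslashes added in front of the quotes get doubled as well. (Newlines and tabs are already collapsed to spaces by normalize-space().) A minimal sketch of a wrapper, assuming the replace-string template above lives in the same stylesheet; the name json-escape is hypothetical:

```xml
<!-- Hypothetical helper: escapes backslashes first, then double quotes.
     Relies on the replace-string template shown above. -->
<xsl:template name="json-escape">
  <xsl:param name="text"/>
  <!-- Step 1: every \ becomes \\ -->
  <xsl:variable name="escaped-backslashes">
    <xsl:call-template name="replace-string">
      <xsl:with-param name="text" select="$text"/>
      <xsl:with-param name="replace" select="'\'"/>
      <xsl:with-param name="with" select="'\\'"/>
    </xsl:call-template>
  </xsl:variable>
  <!-- Step 2: every " becomes \" (must run after step 1) -->
  <xsl:call-template name="replace-string">
    <xsl:with-param name="text" select="$escaped-backslashes"/>
    <xsl:with-param name="replace" select="'&quot;'"/>
    <xsl:with-param name="with" select="'\&quot;'"/>
  </xsl:call-template>
</xsl:template>
```

It can be called wherever replace-string is called directly for JSON output, for example inside escaped-text with `<xsl:with-param name="text" select="text"/>`.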
I then simply use something like the following to output JSON:
{
"text": "<xsl:value-of select="normalize-space($escaped-text)"/>",
"continue": "<xsl:value-of select="normalize-space($escaped-continue)"/>"
}
The issue I have here is that the output looks like this:
{
"text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors.",
"continue": "Love and Curry were two of four first-time All-Star starters when the league made the announcement on Thursday. Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting. \"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\""
}
As you can see, the double quotes are properly escaped; however, the <P> tags have been stripped, apparently parsed directly by the XSLT processor and then suppressed by normalize-space(). What's the best way to re-add the <P> tags to my output?
Try it this way:
Applied to a modified version of your input (added root element and some more markup for testing):
produces the following result:
When you pass continue as the text parameter for escaped-continue, you are removing the <P> tags at that step. You can either use EXSLT node-sets (exsl:node-set()) with XSLT 1.0 and handle the nodes inside the replace-string template, or rewrite your escaped-continue to process element and text nodes separately and only call replace-string for the text nodes.

That's what xsl:value-of is defined to do: it outputs the string value of the selected node, which is its concatenated text with the tags removed. If you want to retain the tags, use xsl:copy-of.
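A minimal sketch of the approach that walks the nodes of continue and calls replace-string only on text nodes, while elements are written out as literal tag text. It is not taken from any of the answers; the single root element wrapping <text> and <continue>, the mode name json, and dropping attributes are all assumptions made for this sketch.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- JSON is plain text, so serialize with the text output method -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Assumes <text> and <continue> are children of one root element -->
  <xsl:template match="/*">
    <!-- Plain text: take the string value and escape quotes, as before -->
    <xsl:variable name="escaped-text">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="text"/>
        <xsl:with-param name="replace" select="'&quot;'"/>
        <xsl:with-param name="with" select="'\&quot;'"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- Mixed content: process child nodes instead of the string value,
         so element boundaries survive -->
    <xsl:variable name="escaped-continue">
      <xsl:apply-templates select="continue/node()" mode="json"/>
    </xsl:variable>
    <xsl:text>{&#10;"text": "</xsl:text>
    <xsl:value-of select="normalize-space($escaped-text)"/>
    <xsl:text>",&#10;"continue": "</xsl:text>
    <xsl:value-of select="normalize-space($escaped-continue)"/>
    <xsl:text>"&#10;}&#10;</xsl:text>
  </xsl:template>

  <!-- Elements: emit start and end tags as literal text.
       Attributes are ignored in this sketch. -->
  <xsl:template match="*" mode="json">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="json"/>
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

  <!-- Text nodes: escape double quotes only -->
  <xsl:template match="text()" mode="json">
    <xsl:call-template name="replace-string">
      <xsl:with-param name="text" select="."/>
      <xsl:with-param name="replace" select="'&quot;'"/>
      <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- replace-string template from the question, unchanged -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
      <xsl:when test="contains($text,$replace)">
        <xsl:value-of select="substring-before($text,$replace)"/>
        <xsl:value-of select="$with"/>
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text,$replace)"/>
          <xsl:with-param name="replace" select="$replace"/>
          <xsl:with-param name="with" select="$with"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input wrapped in a root element, the "continue" value should come out in the form `<P> Love and Curry ... Thursday. </P> <P> Love got ... </P> ...`; the spaces just inside each tag are the original line breaks collapsed by normalize-space(), which HTML ignores. The tags are written with xsl:text rather than xsl:copy-of because, under method="text", only text nodes of the result tree are serialized, so copied elements would lose their tags. Text containing `<` or `&` is also written unescaped by the text output method, so this sketch assumes the markup is limited to simple elements like <P>.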