This is my problem: the code snippet below (inside the <xsl:choose>) does not reliably strip <p>, <div> or <br> tags out of a string using a combination of the substring-before() and substring() functions.
The string I'm trying to format is an attribute of a SharePoint SPS 2003 list item: text entered via a rich text editor. What I ideally need is a catch-all <xsl:when> test that will always just grab the text before the first line break (effectively the first paragraph). I thought that
<xsl:when test="contains(Story, '&#10;')='True'">
would do that, but it doesn't always work: although the rich text editor inserts <br> and <p> tags, these are apparently not always represented by the &#10; value.
Please help - this is driving me nuts. Code:
<xsl:choose>
<!-- Test the more specific condition first; otherwise the plain line-break branch always wins and this one is never reached -->
<xsl:when test="contains(Story, '&#10;') and contains(Story, 'div>')">
<div>DTAG<xsl:value-of select="substring-before(substring-after(substring-before(Story, '/div>'), 'div>'),'&#10;')" disable-output-escaping="yes"/></div>
</xsl:when>
<xsl:when test="contains(Story, '&#10;')">
<div>PTAG_OPEN_OR_BR<xsl:value-of select="substring-before(Story,'&#10;')" disable-output-escaping="yes"/></div>
</xsl:when>
<!-- No line break, but a <br>: keep everything before the '<' of the first 'br>' -->
<xsl:when test="not(contains(Story, '&#10;')) and contains(Story, 'br>')">
<div>BRTAG<xsl:value-of select="substring(Story, 1, string-length(substring-before(Story, 'br>')) - 1)" disable-output-escaping="yes"/></div>
</xsl:when>
<xsl:otherwise>
<div>NO_TAG<xsl:value-of select="substring(Story, 1, 150)" disable-output-escaping="yes"/></div>
</xsl:otherwise>
</xsl:choose>
EDIT:
Will try out your suggestion, Tomalak. Thank you.
EDIT: 12/11/09
Only just had the chance to try this out. Thanks for your help, Tomalak. I have one question about rendering this as HTML rather than XML: when I call the template removeMarkup, I get the following error message:
Exception: System.Xml.XmlException Message: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 120, position 58.
I'm not sure, but I believe this is because you can't put XSLT tags inside attributes? Is there any way around this?
Thanks, Tim
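A common way around this, sketched here rather than taken from the thread: the message is an XML well-formedness error raised while the stylesheet itself is parsed, because a literal '<' may not appear inside an attribute value. If the call was written as something like <span title="<xsl:call-template .../>">, build the attribute with xsl:attribute instead, whose content may contain instructions. In the sketch below, the parameter name text and the pass-through body of removeMarkup are hypothetical stand-ins; use the real template and whatever parameter it actually declares.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//Story">
      <!-- Not well-formed, and the source of the 0x3C error:
           <span title="<xsl:call-template name='removeMarkup'/>">...</span> -->
      <span>
        <!-- xsl:attribute may contain instructions; its output becomes the attribute value -->
        <xsl:attribute name="title">
          <xsl:call-template name="removeMarkup">
            <xsl:with-param name="text" select="."/>
          </xsl:call-template>
        </xsl:attribute>
        <xsl:text>Read more</xsl:text>
      </span>
    </xsl:for-each>
  </xsl:template>

  <!-- Hypothetical stand-in: replace with the real removeMarkup template -->
  <xsl:template name="removeMarkup">
    <xsl:param name="text"/>
    <xsl:value-of select="$text"/>
  </xsl:template>
</xsl:stylesheet>
```

An equivalent alternative is to capture the template's output in an xsl:variable and reference it through an attribute value template, e.g. title="{$plain}".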
A <p> or <br> is very probably represented by a <p> or <br> by the editor, not by &#10;. ;-) Line-break characters are not required anywhere in HTML, so if the editor decides not to include any, the markup is still perfectly valid. Relying on line breaks is an error on your part, IMHO.
Apart from that, without sample XML it is anybody's guess what XPath might do the trick for you.
EDIT:
I suggest a template that removes any HTML markup from a string (by recursive string processing). Then you can take the first meaningful bit of text from the result and print it out.
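A sketch of such a template in XSLT 1.0 follows. The name removeMarkup is borrowed from the question's follow-up edit, but the body is an illustrative reconstruction, not the answer's original code. It assumes, as the question's XPath suggests, that each item exposes its rich text as a Story element whose string value contains the HTML as literal text (escaped in the XML source); the parameter name text and the 150-character cut-off are arbitrary choices. Each complete <...> run is replaced by a space so adjacent paragraphs do not run together, and normalize-space() then collapses the leftover whitespace.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//Story">
      <!-- Strip the markup first, then keep only the leading part of the plain text -->
      <xsl:variable name="plain">
        <xsl:call-template name="removeMarkup">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:variable>
      <div><xsl:value-of select="substring(normalize-space($plain), 1, 150)"/></div>
    </xsl:for-each>
  </xsl:template>

  <!-- Recursively replaces every '<...>' run with a single space -->
  <xsl:template name="removeMarkup">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;') and contains(substring-after($text, '&lt;'), '&gt;')">
        <!-- text before the tag, a space in place of the tag, then recurse on the rest -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <xsl:text> </xsl:text>
        <xsl:call-template name="removeMarkup">
          <xsl:with-param name="text" select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- no complete tag left (or only a stray '<'): output the rest unchanged -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

For a Story whose value is <p>First paragraph</p><p>Second</p>, this produces <div>First paragraph Second</div>. No disable-output-escaping is needed because no markup survives; HTML entities such as &nbsp; in the text are not decoded.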
Disclaimer: as with all string processing over HTML input, this is not 100% foolproof, and certain malformed input can break it.
contains() returns a boolean, so contains(Story, '&#10;')='True' forces a type conversion. XPath 1.0 (section 3.4) specifies that when a boolean is compared with a string, the string is converted to a boolean, and any non-empty string converts to true(). The test therefore only works by accident: ='False' would behave exactly like ='True'. A processor that instead converted the boolean to a string would fail outright, because string(true()) returns 'true', not 'True'.
Anyway, the comparison is redundant; just use the boolean value returned by contains():
<xsl:when test="contains(Story, '&#10;')">
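A small stylesheet makes the conversion rule visible; the literal 'abc' is an arbitrary stand-in for Story. Under a conforming XSLT 1.0 processor it prints false and then true: although 'abc' does not contain 'x', the ='False' comparison yields false, because 'False' is itself converted to true().

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- 'False' is a non-empty string, so it converts to true(); false() = true() is false -->
    <xsl:value-of select="contains('abc', 'x') = 'False'"/>
    <xsl:text>&#10;</xsl:text>
    <!-- the intended "does not contain" test -->
    <xsl:value-of select="not(contains('abc', 'x'))"/>
  </xsl:template>
</xsl:stylesheet>
```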