How to use XSLT 1.0 or XPath to manipulate an HTML

This is my problem: The code snippet below (inside the <xsl:choose>) does not reliably strip <p>, <div> or <br> tags out of a string using a combination of the substring-before() and substring() functions.

The string I'm trying to format is an attribute of a SharePoint SPS 2003 list item - text inputted via a rich text editor. What I ideally need is a catch-all <xsl:when> test that will always just grab the text within the string before a line break (effectively the first paragraph). I thought that:

<xsl:when test="contains(Story, '&#x0a;')='True'">

Would do that, but it doesn't always work: although the rich text editor inserts <p> and <br> tags, it appears that these are not always represented by the &#x0a; value.

Please help - this is driving me nuts. Code:

<xsl:choose>
  <xsl:when test="contains(Story, '&#x0a;')">
    <div>PTAG_OPEN_OR_BR<xsl:value-of select="substring-before(Story,'&#x0a;')" disable-output-escaping="yes"/></div>
  </xsl:when>
  <xsl:when test="contains(Story, '&#x0a;') and contains(Story, 'div>')">
    <div>DTAG<xsl:value-of select="substring-before(substring-after(substring-before(Story, '/div>'), 'div>'),'&#x0a;')" disable-output-escaping="yes"/></div>
  </xsl:when>
  <xsl:when test="contains(Story, '&#x0a;')!='True' and contains(Story, 'br>')">
    <div>BRTAG<xsl:value-of select="substring(Story, 1, string-length(substring-before(Story, 'br>')-1))" disable-output-escaping="yes"/></div>
  </xsl:when>            
  <xsl:otherwise>
    <div>NO_TAG<xsl:value-of select="substring(Story, 1, 150)" disable-output-escaping="yes"/></div>
  </xsl:otherwise>
</xsl:choose>

EDIT:

Will try out your suggestion Tomalak. Thank you.

EDIT: 12/11/09

Only just had a chance to try this out. Thanks for your help Tomalak - I have one question in regard to rendering this as HTML rather than XML. When I call the template removeMarkup, I get the following error message:

Exception: System.Xml.XmlException Message: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 120, position 58.

I'm not sure but I believe that this is because you can't have xslt tags inside other attributes? Is there any way around this?

Thanks Tim

Tags: sharepoint xslt xpath

2 Answers

Summer. ? 凉城

#2 · 2019-08-19 01:52

A <p> or <br> is very probably represented by a &lt;p&gt; or &lt;br&gt; by the editor, not by &#x0a;. ;-)

Line break characters are not required anywhere in HTML, so if the editor decides not to include any line breaks, it's still fine. Relying on line breaks is an error on your part, IMHO.

Apart from that, without sample XML it is anybody's guess what XPath might do the trick for you.

EDIT:

I suggest a template that removes any HTML markup from a string (by recursive string processing). Then you can take the first meaningful bit of text from the result and print it out.

With this input:

<test>
  <Story>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</Story>
  <Story>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</Story>
  <Story>The quick brown fox jumped over the lazy dog.&lt;br&gt;The quick brown fox jumped over the lazy dog.</Story>
  <Story>The quick brown fox jumped over the lazy dog.</Story>
</test>

and this stylesheet:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" />

  <xsl:template match="Story">
    <xsl:copy>
      <original>
        <xsl:value-of select="." />
      </original>
      <processed>
        <xsl:variable name="result">
          <xsl:call-template name="removeMarkup">
            <xsl:with-param name="html" select="." />
          </xsl:call-template>
        </xsl:variable>
        <!-- select the bit of text before the '<>' delimiter -->
        <xsl:value-of select="substring-before($result, '&lt;&gt;')" />
      </processed>
    </xsl:copy>
  </xsl:template>

  <!-- this template removes all HTML markup (tags) from a string -->
  <xsl:template name="removeMarkup">
    <xsl:param name="html"  select="''" />
    <xsl:param name="inTag" select="false()" />

    <!-- if we are in a tag, we look for the next '>', otherwise for '<' -->    
    <xsl:variable name="lookFor">
      <xsl:choose>
        <xsl:when test="$inTag">&gt;</xsl:when>
        <xsl:otherwise>&lt;</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- split the input at the current delimiter char -->
    <xsl:variable name="head" select="substring-before(concat($html, '&lt;'), $lookFor)" />
    <xsl:variable name="tail" select="substring-after($html, $lookFor)" />

    <xsl:if test="not($inTag)">
      <xsl:value-of select="$head" />
      <!-- now add a unique delimiter after the first actual text -->
      <xsl:if test="translate(normalize-space($head), ' ', '') != ''">
        <xsl:value-of select="'&lt;&gt;'" /> <!-- '<>' as a delimiter -->
      </xsl:if>
    </xsl:if>

    <!-- remove markup for the rest of the string -->
    <xsl:if test="$tail != ''">
      <xsl:call-template name="removeMarkup">
        <xsl:with-param name="html"  select="$tail" />
        <xsl:with-param name="inTag" select="not($inTag)" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

the following result is produced:

<Story>
  <original>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</original>
  <processed>The quick brown fox jumped over the lazy dog</processed>
</Story>
<Story>
  <original>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</original>
  <processed>The quick brown fox jumped over the lazy dog</processed>
</Story>
<Story>
  <original>The quick brown fox jumped over the lazy dog.&lt;br&gt;The quick brown fox jumped over the lazy dog.</original>
  <processed>The quick brown fox jumped over the lazy dog.</processed>
</Story>
<Story>
  <original>The quick brown fox jumped over the lazy dog.</original>
  <processed>The quick brown fox jumped over the lazy dog.</processed>
</Story>

Disclaimer: As with all string processing over HTML input, this is not 100% foolproof, and certain malformed input can break it.
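
On the "'<', hexadecimal value 0x3C, is an invalid attribute character" error from the question's follow-up: an XML parser reports this when element markup such as <xsl:call-template> is written directly inside an attribute value. Below is a minimal sketch of one way around it, assuming the goal is to put the stripped text into an attribute. The `title` attribute and the `summary` variable name are hypothetical and not part of the original stylesheet. The fragment is meant to replace the match="Story" template in the stylesheet above and relies on its removeMarkup template.

```xml
<xsl:template match="Story">
  <!-- capture the output of removeMarkup in a variable first -->
  <xsl:variable name="result">
    <xsl:call-template name="removeMarkup">
      <xsl:with-param name="html" select="." />
    </xsl:call-template>
  </xsl:variable>
  <!-- keep only the text before the first '<>' delimiter -->
  <xsl:variable name="summary" select="substring-before($result, '&lt;&gt;')" />

  <!-- option 1: attribute value template; the {...} part is evaluated as XPath -->
  <div title="{$summary}">
    <xsl:value-of select="$summary" />
  </div>

  <!-- option 2: xsl:attribute, which may contain further XSLT instructions -->
  <div>
    <xsl:attribute name="title">
      <xsl:value-of select="$summary" />
    </xsl:attribute>
    <xsl:value-of select="$summary" />
  </div>
</xsl:template>
```

Either option keeps XSLT elements out of the attribute value itself, which is what the parser objects to. Only one of the two is needed in practice.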


叼着烟拽天下

#3 · 2019-08-19 01:55

contains() returns a boolean value, so contains(Story, '&#x0a;')='True' compares a boolean with a string. XPath 1.0 (section 3.4) defines this case: when neither operand is a node-set and at least one is a boolean, both are converted to boolean. The non-empty string 'True' therefore becomes true(), and the whole comparison yields exactly what contains() already returned. Note also that a string comparison would never match here, because string(true()) returns 'true' and not 'True'.

Anyway, your test is redundant, just use the boolean value returned by contains():

<xsl:when test="contains(Story, '&#x0a;')">
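
As a rough sketch, the branches from the question could be written with plain boolean tests like this. The branch bodies are simplified placeholders, and the '&lt;br' test assumes the editor stores its markup in escaped form, which the question does not confirm. Negation uses not() instead of a comparison against a string.

```xml
<xsl:choose>
  <!-- contains() already yields a boolean; no comparison is needed -->
  <xsl:when test="contains(Story, '&#x0a;')">
    <xsl:value-of select="substring-before(Story, '&#x0a;')" />
  </xsl:when>
  <!-- negate with not() rather than != 'True'; the not() is shown only for
       illustration, since xsl:choose stops at the first matching branch -->
  <xsl:when test="not(contains(Story, '&#x0a;')) and contains(Story, '&lt;br')">
    <xsl:value-of select="substring-before(Story, '&lt;br')" />
  </xsl:when>
  <!-- fallback: the first 150 characters, as in the question -->
  <xsl:otherwise>
    <xsl:value-of select="substring(Story, 1, 150)" />
  </xsl:otherwise>
</xsl:choose>
```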
