Parsing HTML with XSLT

Posted 2019-08-18 07:52

Can someone help me take the following:

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>This is a test</title>
    <link>http://somelink.html</link>
    <description>RSS Feed</description>
    <item>
      <title>This is a title</title>
      <link>http://somelink.html</link>
      <description>&lt;div style='font-size: 9px;'&gt;&lt;div class="rendering rendering_researchoutput  rendering_researchoutput_short rendering_contributiontojournal rendering_short rendering_contributiontojournal_short"&gt;&lt;h2 class="title"&gt;&lt;a class="link" rel="ContributionToJournal" href="http://somelink.html"&gt;&lt;span&gt;This is a Title&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;&lt;a class="link person" rel="Person" href="somelink.html"&gt;&lt;span&gt;Bob, C. R&lt;/span&gt;&lt;/a&gt; &amp;amp; Smith, W. &lt;span class="date"&gt;2014&lt;/span&gt; &lt;span class="journal"&gt;In : &lt;a class="link" rel="Journal" href="http://somelink.html"&gt;&lt;span&gt;Publishers title&lt;/span&gt;&lt;/a&gt;.&lt;/span&gt;&lt;p class="type"&gt;&lt;span class="type_family"&gt;Research output&lt;span class="type_family_sep"&gt;: &lt;/span&gt;&lt;/span&gt;&lt;span class="type_classification_parent"&gt;Contribution to journal&lt;span class="type_parent_sep"&gt; › &lt;/span&gt;&lt;/span&gt;&lt;span class="type_classification"&gt;Article&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="rendering rendering_researchoutput  rendering_researchoutput_detailsportal rendering_contributiontojournal rendering_detailsportal rendering_contributiontojournal_detailsportal"&gt;&lt;div class="article"&gt;&lt;table class="properties"&gt;&lt;tbody&gt;&lt;tr class="language"&gt;&lt;th&gt;Original language&lt;/th&gt;&lt;td&gt;English&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;Journal&lt;/th&gt;&lt;td&gt;&lt;a class="link" rel="Journal" href="http://somelink.html"&gt;&lt;span&gt;Journal of Human Rights and the Environment &lt;/span&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;Journal publication date&lt;/th&gt;&lt;td&gt;2014&lt;/td&gt;&lt;/tr&gt;&lt;tr class="status"&gt;&lt;th&gt;State&lt;/th&gt;&lt;td&gt;In press&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description>
      <pubDate>Wed, 02 Apr 2014 15:59:41 GMT</pubDate>
      <guid>http://somelink.html</guid>
      <dc:date>2014-04-02T15:59:41Z</dc:date>
    </item>
  </channel>
</rss>

And show me how to use XSLT to parse the <description> element and return the contents of the <span class="..."> or <div class="..."> elements inside it?

I tried the following in my XSLT:

<xsl:value-of select="span[@class='date']"/>

This returns nothing.

1 answer
Evening l夕情丶
#2 · 2019-08-18 08:14

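The markup inside <description> is escaped, so the XSLT processor sees it as a single text string rather than as child elements; that is why span[@class='date'] selects nothing. Here's a clumsy way to extract the contents of the <span class="date"> element (or rather, what would be the <span class="date"> element once the escaping is undone) using string functions, with <item> as the context node: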

<xsl:value-of select="substring-before(substring-after(description, '&lt;span class=&quot;date&quot;&gt;'), '&lt;/span&gt;')"/>
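
For context, here is a minimal sketch of how that expression might sit in a complete XSLT 1.0 stylesheet. The named template text-between and its parameters (text, open, close) are hypothetical names introduced for illustration, and the sketch assumes the feed has the rss/channel/item structure shown in the question:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: returns the text between the first
       occurrence of $open and the next occurrence of $close. -->
  <xsl:template name="text-between">
    <xsl:param name="text"/>
    <xsl:param name="open"/>
    <xsl:param name="close"/>
    <xsl:value-of select="substring-before(substring-after($text, $open), $close)"/>
  </xsl:template>

  <xsl:template match="/">
    <!-- Assumes the rss/channel/item layout from the question. -->
    <xsl:for-each select="rss/channel/item">
      <xsl:call-template name="text-between">
        <xsl:with-param name="text" select="description"/>
        <xsl:with-param name="open" select="'&lt;span class=&quot;date&quot;&gt;'"/>
        <xsl:with-param name="close" select="'&lt;/span&gt;'"/>
      </xsl:call-template>
      <!-- One extracted value per line. -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample feed, this should print 2014 followed by a newline. Passing '&lt;span class=&quot;type_classification&quot;&gt;' as the open parameter would pull out Article in the same way. The approach only works when the opening tag matches character for character and the target span contains no nested span; for type_family, for example, the result would be cut off at the inner closing tag.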