I am trying to write a word frequency counter in XSLT that skips stop words. I got started with Michael Kay's book, but I am having trouble getting the stop words to work.
This code will work on any source XML file.
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="stopwords" select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>
<wordcount>
<xsl:for-each-group group-by="." select="
for $w in //text()/tokenize(., '\W+')[not(.=$stopwords)] return $w">
<word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
</xsl:for-each-group>
</wordcount>
</xsl:template>
</xsl:stylesheet>
I think the not(.=$stopwords) predicate is where my problem is, but I'm not sure what to do about it.
I would also welcome hints on how to load the stop words from an external file.
Your $stopwords variable is currently a single string; you want it to be a sequence of strings. You can do this in any of the following ways:
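Change its declaration to a literal sequence of strings, one item per stop word.
Change its declaration to split the existing string into a sequence with tokenize().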
Read it from an external XML document named (e.g.) stoplist.xml, of the form
and then load it, e.g. with
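As a minimal sketch of these three options, the stylesheet below declares each variant under its own name and then picks one. The variable names `stopwords-literal`, `stopwords-tokenized`, and `stopwords-external`, as well as the `stop-list` and `w` element names in the external file, are assumptions made for illustration; they are not taken from the answer above. The extra `[. != '']` filter is also an addition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Option 1: a literal sequence of strings (list shortened for space). -->
  <xsl:variable name="stopwords-literal" select="('a', 'about', 'an', 'the')"/>

  <!-- Option 2: split the original space-separated string into a sequence. -->
  <xsl:variable name="stopwords-tokenized" select="tokenize(
      'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with',
      '\s+')"/>

  <!-- Option 3: read the words from stoplist.xml, assumed (hypothetically)
       to look like <stop-list><w>a</w><w>about</w></stop-list>.
       The relative URI is resolved against the stylesheet's location. -->
  <xsl:variable name="stopwords-external" select="doc('stoplist.xml')//w/string()"/>

  <!-- Choose one of the three sequences here. -->
  <xsl:variable name="stopwords" select="$stopwords-tokenized"/>

  <xsl:template match="/">
    <wordcount>
      <!-- '. = $stopwords' is true if the token equals ANY item of the sequence.
           tokenize() can return empty strings at text boundaries, so drop them. -->
      <xsl:for-each-group group-by="."
          select="//text()/tokenize(., '\W+')[. != ''][not(. = $stopwords)]">
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>
</xsl:stylesheet>
```

The comparison is case-sensitive, so "The" is still counted unless you normalize tokens (for example with lower-case()) and keep the stop list in the same case. Option 3 needs a stoplist.xml file next to the stylesheet.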
You are comparing the current word with the entire list of stop words, which is one long string. Instead, you should check whether the current word is contained in the list of stop words:
The concatenation of a space is needed to avoid partial matches, e.g. to prevent 'abo' from matching 'about'.
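One way this string-based check might look is sketched below. It keeps $stopwords as a single string and uses contains(). Padding both the list and the token with a space on each side is an assumption of this sketch, chosen so that neither a prefix such as 'abo' nor a suffix such as 'bout' matches 'about'. The `[. != '']` filter is likewise an addition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- The stop words stay a single space-separated string. -->
  <xsl:variable name="stopwords"
      select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>

  <xsl:template match="/">
    <wordcount>
      <!-- Pad list and token with spaces so only whole words match.
           Empty tokens from tokenize() are dropped first. -->
      <xsl:for-each-group group-by="."
          select="//text()/tokenize(., '\W+')[. != '']
                  [not(contains(concat(' ', $stopwords, ' '), concat(' ', ., ' ')))]">
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>
</xsl:stylesheet>
```

This works without changing the variable, but a sequence of strings compared with `=` is usually easier to read and does not depend on the separator character.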