-->

Where to find an exhaustive list of stop words?

Posted 2019-07-03 23:45

Question:

Where could I find an exhaustive list of stop words? The one I have is quite short and does not seem to suit scientific texts. I am creating lexical chains to extract key topics from scientific papers. The problem is that words like "based", "regarding", etc. should also be treated as stop words, since they carry little meaning.

Answer 1:

You can also easily add to existing stop word lists. E.g. use the one in the NLTK toolkit:

from nltk.corpus import stopwords

and then add whatever you think is missing:

# Use a separate name so the imported `stopwords` module is not overwritten
stop_words = stopwords.words('english') + ["based", "regarding"]

The original NLTK list is described here.
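As a minimal sketch of how an extended list might then be applied, the snippet below merges a base list with extra words and filters a token sequence. The helper names `build_stop_words` and `remove_stop_words` are hypothetical, and the short `base` list is only a stand-in for `stopwords.words('english')` so that the example runs without downloading the NLTK corpus.

```python
def build_stop_words(base, extra):
    """Merge a base stop word list with domain-specific additions (lowercased)."""
    return {w.lower() for w in base} | {w.lower() for w in extra}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]


# Assumption: a tiny stand-in for stopwords.words('english')
base = ["the", "is", "on", "a", "of"]
stop_words = build_stop_words(base, ["based", "regarding"])

# Naive whitespace tokenization, for illustration only
tokens = "The method is based on lexical chains".split()
filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Storing the words in a set keeps each membership check cheap, which matters when filtering whole papers.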



Answer 2:

It is difficult to find an exhaustive list of stop words, because a word that counts as a stop word in one domain can be an important word in another.

You could take a look at some existing lists of stop words:
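One way to act on this domain dependence, sketched here as an assumption rather than an established recipe, is to derive candidate stop words from your own corpus: words that occur in almost every document are unlikely to distinguish topics. The function name `frequent_terms`, the `min_doc_ratio` threshold, and the toy `papers` list are all hypothetical.

```python
from collections import Counter


def frequent_terms(documents, min_doc_ratio=0.8):
    """Return words that occur in at least min_doc_ratio of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (naive whitespace tokenization)
        doc_freq.update(set(doc.lower().split()))
    threshold = min_doc_ratio * len(documents)
    return {word for word, count in doc_freq.items() if count >= threshold}


# Toy corpus standing in for a collection of scientific papers
papers = [
    "results based on lexical chains",
    "a model based on neural networks",
    "analysis based on graph clustering",
]
candidates = frequent_terms(papers)
print(sorted(candidates))
```

The candidates should still be reviewed by hand before they are added to a list, since a word that is frequent in the corpus may also be one of its key topics.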

http://blog.adlegant.com/how-to-install-nltk-corporastopwords/

http://www.lextek.com/manuals/onix/stopwords1.html

http://www.ranks.nl/stopwords

http://xpo6.com/list-of-english-stop-words/