pyspark : how to configure StopWordsRemover with f

Posted 2019-08-26 05:36

I would like to know how to configure StopWordsRemover for the French language in Spark 1.6.3.

I'm currently using PySpark.

Thanks for your help.

Best regards,

2 answers
神经病院院长
#2 · 2019-08-26 06:01

According to the PySpark 1.6.3 docs, pyspark.ml.feature.StopWordsRemover does not have a language parameter. However, you can always provide your own list of stop words via the "stopWords" parameter.
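A minimal sketch of that approach, assuming a small hand-written French list. FRENCH_STOP_WORDS and build_french_remover are hypothetical names used only for illustration, and the list is a short sample rather than a complete French stop word set:

```python
# Hand-written sample only (an assumption for illustration);
# a real French stop word list would be much longer.
FRENCH_STOP_WORDS = [
    "le", "la", "les", "un", "une", "des",
    "de", "du", "et", "ou", "est", "dans",
]


def build_french_remover(input_col, output_col, stop_words=FRENCH_STOP_WORDS):
    """Return a StopWordsRemover that filters the given stop words."""
    # Imported here so the list above can be inspected without Spark installed.
    # Creating the remover itself needs an active SparkContext.
    from pyspark.ml.feature import StopWordsRemover

    return StopWordsRemover(
        inputCol=input_col,
        outputCol=output_col,
        stopWords=list(stop_words),
    )
```

With a DataFrame df that has an array-of-strings column named "words" (also an assumption), usage would look like build_french_remover("words", "filtered").transform(df).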

我只想做你的唯一
#3 · 2019-08-26 06:02

Take a look at the NLTK package.

I use it for Portuguese words:

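from pyspark.ml.feature import StopWordsRemover
import nltk

# Download the NLTK stop word corpus (only needed once per machine)
nltk.download("stopwords")

...

# Load the Portuguese stop words and pass them to the remover.
# `tokenizer` is assumed to be a tokenizer stage defined in the elided code above.
stopwordList = nltk.corpus.stopwords.words('portuguese')
remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="stopWordsRem", stopWords=stopwordList)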

Hope it helps
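For the French case asked about in the question, the same idea might look like the sketch below. It assumes the NLTK stopwords corpus ships a 'french' word list; load_stop_words is a hypothetical helper name, not part of NLTK or Spark:

```python
def load_stop_words(language="french"):
    """Return NLTK's stop word list for `language` as a plain Python list."""
    # Imported lazily so the function can be defined without NLTK installed.
    import nltk

    # Fetches the corpus if it is not already present on this machine.
    nltk.download("stopwords")
    return list(nltk.corpus.stopwords.words(language))
```

The returned list can then be passed as the stopWords argument of StopWordsRemover, exactly as stopwordList is in the snippet above.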
