Stanford NLP Tagger via NLTK - tag_sents splits everything into chars

Posted 2019-03-02 17:22

I'm hoping someone has experience with this, as I can't find anything online besides a 2015 bug report about the NER tagger, which is probably the same issue.

Anyway, I'm trying to batch-process text to get around the poorly performing base tagger. From what I understand, tag_sents should help.

from nltk.tag.stanford import StanfordPOSTagger
from nltk import word_tokenize
import nltk

stanford_model = 'stanford-postagger/models/english-bidirectional-distsim.tagger'
stanford_jar = 'stanford-postagger/stanford-postagger.jar'
tagger = StanfordPOSTagger(stanford_model, stanford_jar)
tagger.java_options = '-mx4096m'
text = "The quick brown fox jumps over the lazy dog."
print(tagger.tag_sents(text))

Except no matter what I pass to the tag_sents method, the text gets split into individual characters instead of words. Does anyone know why it doesn't work properly? This works as expected:

tagger.tag(text)

I also tried splitting the sentence into tokens first to see if that helped, but got the same result.

2 Answers
chillily · 2019-03-02 17:55

The tag_sents function takes a list of lists of strings, i.e. a list of tokenized sentences.

tagger.tag_sents([word_tokenize("The quick brown fox jumps over the lazy dog.")])

Here's a useful idiom:

tagger.tag_sents([word_tokenize(sent) for sent in sent_tokenize(text)])

where text is a string and sent_tokenize comes from nltk.tokenize.
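
Putting it together, a minimal end-to-end sketch might look like the following. The model/jar paths and the heap size are copied from the question, java_options is passed via the constructor (which NLTK's StanfordPOSTagger accepts), and the second sentence in text is made up purely for illustration:

from nltk.tag.stanford import StanfordPOSTagger
from nltk.tokenize import sent_tokenize, word_tokenize

stanford_model = 'stanford-postagger/models/english-bidirectional-distsim.tagger'
stanford_jar = 'stanford-postagger/stanford-postagger.jar'
tagger = StanfordPOSTagger(stanford_model, stanford_jar, java_options='-mx4096m')

text = "The quick brown fox jumps over the lazy dog. Then it ran away."

# Build a list of tokenized sentences so the whole batch is tagged
# in a single call to the Stanford process.
sentences = [word_tokenize(sent) for sent in sent_tokenize(text)]
print(tagger.tag_sents(sentences))

The result should come back as one list of (token, tag) pairs per input sentence.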

你好瞎i · 2019-03-02 18:04

Another variation of what alvas said, which worked for me: tagger.tag_sents([[text]]).
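
For context (based on my reading of the NLTK source, so treat this as an assumption rather than documented behaviour): tag_sents space-joins each inner list and hands the result to the Stanford process with its own tokenizer disabled, so [[text]] sends the raw sentence on one line and relies on whitespace splitting. A quick comparison, reusing tagger, text and word_tokenize from the question:

# Whole sentence as a single "token" list entry: split on whitespace by the
# Stanford tool, so the trailing "dog." likely keeps its period attached.
print(tagger.tag_sents([[text]]))

# Pre-tokenized with NLTK first: punctuation becomes its own token.
print(tagger.tag_sents([word_tokenize(text)]))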
