How to improve speed with Stanford NLP Tagger and

Posted 2019-01-13 19:03

Is there any way to use the Stanford Tagger more efficiently?

Each call to NLTK's wrapper starts a new Java instance per analyzed string, which is very slow, especially when a larger foreign-language model is used.

http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford

Tags: python nltk stanford-nlp

2 answers

欢心

#2 · 2019-01-13 19:50

Use nltk.tag.stanford.POSTagger.tag_sents() to tag multiple sentences in a single call, so that the Java process is started once for the whole batch rather than once per sentence.

tag_sents has replaced the old batch_tag function; see https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L61

DEPRECATED:

Tag the sentences using batch_tag instead of tag, see http://www.nltk.org/_modules/nltk/tag/stanford.html#StanfordTagger.batch_tag
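
A minimal sketch of the batched call, in case it helps. It assumes an NLTK release in which the class is exposed as StanfordPOSTagger (older releases named it POSTagger, as in this answer), so check the name against your installed version. The helper names build_tagger and tag_texts are made up for illustration, and splitting on whitespace is only a stand-in for a real tokenizer.

```python
def build_tagger(model_path, jar_path):
    """Create the NLTK wrapper around the Stanford tagger.

    Assumption: the class is called StanfordPOSTagger in your NLTK version.
    The import is done here so tag_texts() can be used with any object
    that offers a tag_sents() method.
    """
    from nltk.tag.stanford import StanfordPOSTagger
    return StanfordPOSTagger(model_path, path_to_jar=jar_path, encoding="utf8")


def tag_texts(tagger, texts):
    """Tag many strings with a single tag_sents() call.

    Each string is split on whitespace (a naive tokenizer, used only to keep
    the sketch short) and all token lists are handed over in one batch.
    """
    sentences = [text.split() for text in texts]
    return tagger.tag_sents(sentences)


# Usage (paths are placeholders; substitute your own jar and model):
# tagger = build_tagger("/path/to/models/german-dewac.tagger",
#                       "/path/to/stanford-postagger.jar")
# print(tag_texts(tagger, ["die welt ist schön", "hallo welt"]))
```

The point is simply to collect all strings first and pass them in one call instead of calling tag() in a loop.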


再贱就再见

#3 · 2019-01-13 19:54

Found the solution. It is possible to run the POS Tagger in servlet mode and then connect to it via HTTP. Perfect.

http://nlp.stanford.edu/software/pos-tagger-faq.shtml#d

Example:

Start the server in the background:

nohup java -mx1000m -cp /var/stanford-postagger-full-2014-01-04/stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -model /var/stanford-postagger-full-2014-01-04/models/german-dewac.tagger -port 2020 >& /dev/null &

Adjust the firewall so that port 2020 is reachable from localhost only:

iptables -A INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -A INPUT -p tcp --dport 2020 -j DROP

Test it with wget (the URL is quoted so the shell does not split it at the spaces):

wget "http://localhost:2020/?die welt ist schön"

Shut down the server:

pkill -f stanford

Restore the iptables settings:

iptables -D INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -D INPUT -p tcp --dport 2020 -j DROP
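
For calling the running server from Python rather than wget, a rough sketch follows. It rests on an assumption that should be verified against the FAQ linked above: that the tagger server reads one line of text per connection over a plain TCP socket, writes the tagged text back, and then closes the connection. The function name tag_via_server and its defaults are made up for illustration; the port matches the -port 2020 used in the example.

```python
import socket


def tag_via_server(text, host="localhost", port=2020,
                   encoding="utf-8", timeout=10.0):
    """Send one line of text to a running tagger server and return the reply.

    Assumption: the server reads a single newline-terminated line, answers
    with the tagged text, and closes the connection afterwards.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Send the text as one line, encoded the way the server expects.
        conn.sendall((text.strip() + "\n").encode(encoding))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode(encoding).strip()


# Usage, with the server from the example listening on port 2020:
# print(tag_via_server("die welt ist schön"))
```

Because the model is loaded once when the server starts, each request only pays for the socket round trip instead of a JVM start-up.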
