Is there any way to use the Stanford Tagger more efficiently?
Each call to NLTK's wrapper starts a new Java instance for every analyzed string. This is very slow, especially when a larger foreign-language model has to be loaded each time.
http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford
Use `nltk.tag.stanford.POSTagger.tag_sents()` to tag multiple sentences in a single call, so Java is started only once for the whole batch. `tag_sents` has replaced the old `batch_tag` function, see https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L61
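A minimal sketch of the batching idea. It assumes a tagger object that exposes NLTK's `tag_sents()`. Older NLTK releases only had `batch_tag()`, so the helper falls back to that. The helper name `tag_texts` and the whitespace tokenizer are illustrative assumptions, not part of NLTK.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = List[Tuple[str, str]]


def tag_texts(tagger, texts: Sequence[str],
              tokenize: Callable[[str], List[str]] = str.split) -> List[Tagged]:
    """Tag many texts with ONE batched call instead of one tag() call per text.

    With NLTK's Stanford wrapper every call launches a new Java process, so a
    single tag_sents() call over all sentences pays the JVM start-up and
    model-loading cost only once.
    """
    # Hypothetical helper: tokenization here is plain whitespace splitting.
    sentences = [tokenize(text) for text in texts]
    # Newer NLTK exposes tag_sents(); older releases called it batch_tag().
    batch = getattr(tagger, "tag_sents", None) or getattr(tagger, "batch_tag")
    return batch(sentences)
```

With NLTK installed, `tagger` would be an instance of the Stanford POS tagger wrapper from `nltk.tag.stanford` (`POSTagger` in older releases, `StanfordPOSTagger` in newer ones), built from your own model and jar paths.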
DEPRECATED: Tag the sentences using `batch_tag` instead of `tag`, see http://www.nltk.org/_modules/nltk/tag/stanford.html#StanfordTagger.batch_tag

Found the solution. It is possible to run the POS tagger in server mode, so the model is loaded only once, and then connect to it over the network instead of starting a new Java instance per call. Perfect.
http://nlp.stanford.edu/software/pos-tagger-faq.shtml#d
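A minimal Python client sketch for such a server. It assumes the server reads one line of plain text, writes back the tagged text in the default `word_TAG` format and then closes the connection. Check this behaviour against the FAQ for your tagger version. `tag_via_server` and `parse_tagged` are hypothetical helper names. Port 2020 and UTF-8 encoding are assumed defaults.

```python
import socket
from typing import List, Tuple


def tag_via_server(text: str, host: str = "localhost", port: int = 2020,
                   timeout: float = 30.0, encoding: str = "utf-8") -> str:
    """Send one line of text to a running tagger server and return its reply.

    Assumption: the server reads a single line, answers with the tagged text
    and closes the connection, so we read until EOF.
    """
    line = " ".join(text.split()) + "\n"  # collapse newlines: one request line
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode(encoding))
        sock.shutdown(socket.SHUT_WR)  # tell the server no more input follows
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode(encoding).strip()


def parse_tagged(output: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Turn 'word_TAG word_TAG' into [(word, tag), ...].

    Splits on the LAST separator so words that contain '_' survive.
    """
    pairs = []
    for item in output.split():
        word, found, tag = item.rpartition(sep)
        if not found:  # no tag attached: keep the token with an empty tag
            word, tag = item, ""
        pairs.append((word, tag))
    return pairs
```

With the server running, `parse_tagged(tag_via_server("die Welt ist schön"))` returns (word, tag) pairs. The tags themselves depend entirely on the loaded model.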
Example:

1. Start the server in the background.
2. Adjust the firewall so that port 2020 is reachable from localhost only.
3. Test it with `wget`.
4. Shut down the server.
5. Restore the iptables settings.
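Starting and shutting down the server can also be driven from Python. This is a minimal sketch, assuming the server class `edu.stanford.nlp.tagger.maxent.MaxentTaggerServer` with `-model` and `-port` options. Verify both against the FAQ and your tagger release. The jar and model paths, the heap size and the helper names (`build_server_command`, `wait_for_port`, `tagger_server`) are hypothetical placeholders. The firewall steps are not covered. Unless the firewall restricts it, the server is presumably reachable on all network interfaces.

```python
import socket
import subprocess
import time
from contextlib import contextmanager
from typing import Iterator, List


def build_server_command(jar: str, model: str, port: int = 2020,
                         heap: str = "-Xmx1g") -> List[str]:
    """Java command line for the tagger server.

    ASSUMPTION: class name and -model/-port flags as used by the Stanford
    tagger server; jar and model paths are placeholders for your install.
    """
    return ["java", heap, "-cp", jar,
            "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
            "-model", model, "-port", str(port)]


def wait_for_port(host: str, port: int, timeout: float) -> bool:
    """Poll until something accepts TCP connections on host:port.

    The probe opens and immediately closes a connection, which assumes the
    server tolerates an empty request.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # model probably still loading; try again
    return False


@contextmanager
def tagger_server(jar: str, model: str, port: int = 2020,
                  startup_timeout: float = 120.0) -> Iterator[int]:
    """Start the server in the background, yield its port, stop it on exit."""
    proc = subprocess.Popen(build_server_command(jar, model, port),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        if not wait_for_port("localhost", port, startup_timeout):
            raise RuntimeError(f"tagger server did not open port {port}")
        yield port
    finally:
        proc.terminate()  # shut the server down
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

Usage: inside `with tagger_server(jar_path, model_path) as port:` any number of requests can be sent to `localhost:port`. The model is loaded once, and the server is stopped when the block exits.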