I'm trying to use the nltk.tag.stanford module to tag a sentence (starting with the example from the wiki), but I keep getting the following error:
Traceback (most recent call last):
File "test.py", line 28, in <module>
print st.tag(word_tokenize('What is the airspeed of an unladen swallow ?'))
File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 59, in tag
return self.tag_sents([tokens])[0]
File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 81, in tag_sents
stdout=PIPE, stderr=PIPE)
File "/usr/local/lib/python2.7/dist-packages/nltk/internals.py", line 160, in java
raise OSError('Java command failed!')
OSError: Java command failed!
or the following LookupError:
LookupError:
===========================================================================
NLTK was unable to find the java file!
Use software specific configuration paramaters or set the JAVAHOME environment variable.
===========================================================================
This is the example code:
>>> from nltk.tag.stanford import POSTagger
>>> st = POSTagger('/usr/share/stanford-postagger/models/english-bidirectional-distsim.tagger',
... '/usr/share/stanford-postagger/stanford-postagger.jar')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
I also tried word_tokenize instead of split, but it didn't make any difference.
I also reinstalled Java (the JDK), and none of my searches turned up anything that worked, including things like nltk.internals.config_java().
Note: I use Linux (Xubuntu).
If you read through the embedded documentation in nltk/internals.py (lines 58-175), you should find your answer easily enough. NLTK requires the full path to the Java binary.
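As a rough sketch of what "the full path to the Java binary" means in practice, the snippet below looks up a java executable and hands its path to nltk.internals.config_java(). The helper name find_java_binary and the fallback location /usr/bin/java are assumptions made for illustration, not part of NLTK, and shutil.which needs Python 3.3 or later (the traceback above is from Python 2.7, where you would have to locate the binary another way).

```python
import os
import shutil


def find_java_binary(candidates=("/usr/bin/java",)):
    """Return the full path to a java executable, or None if none is found.

    `candidates` is a hypothetical list of fallback locations; adjust it
    to wherever Java lives on your machine.
    """
    # First try whatever `java` resolves to on the current PATH.
    found = shutil.which("java")
    if found:
        return found
    # Otherwise fall back to the explicit candidate paths.
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    java_path = find_java_binary()
    if java_path is None:
        print("No java executable found; install a JRE or JDK first.")
    else:
        try:
            import nltk.internals
            # Tell NLTK explicitly which Java binary to use.
            nltk.internals.config_java(java_path)
            print("NLTK configured to use", java_path)
        except ImportError:
            print("NLTK is not installed; java found at", java_path)
```

Call this once before constructing the tagger, so that the later call into Java already knows which binary to run.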
You have a few options, I believe, based on a bit of research:
1) Add the following code to your project (not a great solution)
2) Uninstall & Reinstall NLTK (preferably in a virtualenv) (better but still not great)
3) Set the java environment variable (This is the most pragmatic solution IMO)
Edit the system path file, /etc/profile.
Add the following lines at the end: