I am trying to use MaltParser from NLTK.
I could get to the point of configuring the parser:
import nltk
parser = nltk.parse.malt.MaltParser()
parser.config_malt()
parser.train_from_file('malt_train.conll')
but when it comes to actual parsing, the parser raises an error:
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 98, in raw_parse
    return self.parse(words, verbose)
  File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 85, in parse
    return self.tagged_parse(taggedwords, verbose)
  File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 139, in tagged_parse
    return DependencyGraph.load(output_file)
  File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 121, in load
    return DependencyGraph(open(file).read())
IOError: [Errno 2] No such file or directory: '/var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_output.conll'
Here is the command that gives the error (from malt.py):
['java', '-jar /usr/lib/malt-1.6.1/malt.jar', '-w /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T', '-c malt_temp', '-i /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_input.conll', '-o /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_output.conll', '-m parse']
I tried running just the java command, and here is what I get:
The file entry 'malt_temp_singlemalt.info' in the mco file '/var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_temp.mco' cannot be loaded.
I also tried the same with the pre-trained engmalt.poly.mco and engmalt.linear.mco.
Any suggestions are very welcome.
EDIT: Here is the full function from malt.py:
def tagged_parse(self, sentence, verbose=False):
    """
    Use MaltParser to parse a sentence. Takes a sentence as a list of
    (word, tag) tuples; the sentence must have already been tokenized and
    tagged.
    @param sentence: Input sentence to parse
    @type sentence: L{list} of (word, tag) L{tuple}s.
    @return: C{DependencyGraph} the dependency graph representation of the sentence
    """
    if not self._malt_bin:
        raise Exception("MaltParser location is not configured. Call config_malt() first.")
    if not self._trained:
        raise Exception("Parser has not been trained. Call train() first.")
    input_file = os.path.join(tempfile.gettempdir(), 'malt_input.conll')
    output_file = os.path.join(tempfile.gettempdir(), 'malt_output.conll')
    execute_string = 'java -jar %s -w %s -c %s -i %s -o %s -m parse'
    if not verbose:
        execute_string += ' > ' + os.path.join(tempfile.gettempdir(), "malt.out")
    f = None
    try:
        f = open(input_file, 'w')
        for (i, (word,tag)) in enumerate(sentence):
            f.write('%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' %
                    (i+1, word, '_', tag, tag, '_', '0', 'a', '_', '_'))
        f.write('\n')
        f.close()
        cmd = ['java', '-jar %s' % self._malt_bin, '-w %s' % tempfile.gettempdir(),
               '-c %s' % self.mco, '-i %s' % input_file, '-o %s' % output_file, '-m parse']
        print cmd
        self._execute(cmd, 'parse', verbose)
        return DependencyGraph.load(output_file)
    finally:
        if f: f.close()
I am not sure whether this problem is still unsolved (I suspect it has already been solved), but since I ran into the same problem a while ago, I would like to share what I learned.
First of all, the MaltParser jar does not accept a .conll file name with a full path in front of it, as in the command shown above. Why that is so, I do not know.
But you can easily fix it by changing the command line so that -i and -o receive only the bare file names and the directory is passed separately.
The directory of the .conll file is then set using the -w parameter, so you can load any .conll file from any given folder. I also changed tempfile.gettempdir() to self.working_dir, because the "original" NLTK version always sets the /tmp/ folder as the working directory, even if you initialise the MaltParser with another working directory. I hope this information will help someone.
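The change described above could look roughly like the sketch below. It is not the actual NLTK code: build_malt_command and its parameter names are hypothetical, and the list layout (flag and value in one element) simply mirrors the cmd list quoted from malt.py in the question.

```python
import os


def build_malt_command(malt_jar, working_dir, model_name,
                       input_name='malt_input.conll',
                       output_name='malt_output.conll'):
    """Hypothetical helper: build the MaltParser 'parse' command.

    The directory is passed once via -w; -i and -o get bare file names.
    """
    # Reject names that carry a directory component, since the point of
    # the fix is to keep paths out of -i and -o.
    for name in (input_name, output_name):
        if os.path.basename(name) != name:
            raise ValueError('expected a bare file name, got %r' % name)
    return ['java', '-jar %s' % malt_jar, '-w %s' % working_dir,
            '-c %s' % model_name, '-i %s' % input_name,
            '-o %s' % output_name, '-m parse']
```

Inside tagged_parse this would correspond to passing self.working_dir as working_dir. Presumably the input file then has to be written to os.path.join(working_dir, input_name), and DependencyGraph.load has to read from os.path.join(working_dir, output_name), so that both sides agree on where the files live.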
Another thing: if you want to parse many sentences at once, but each one individually and independently of the others, you have to add a blank line between sentences in the input.conll file and restart the numbering at 1 for each sentence.
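As a minimal sketch of that input layout: the function below (format_conll_sentences is a hypothetical name, not part of NLTK) uses the same ten tab-separated columns that tagged_parse writes in the question, puts a blank line after each sentence, and restarts the token index at 1.

```python
def format_conll_sentences(tagged_sentences):
    """Format several (word, tag) sentences as one CoNLL input string."""
    lines = []
    for sentence in tagged_sentences:
        # enumerate restarts at 0 for every sentence, so the index
        # written to the file restarts at 1.
        for i, (word, tag) in enumerate(sentence):
            lines.append('\t'.join(
                [str(i + 1), word, '_', tag, tag, '_', '0', 'a', '_', '_']))
        # A blank line terminates the sentence.
        lines.append('')
    return ''.join(line + '\n' for line in lines)
```

The returned string can be written to the input file in one go, for example with f.write(format_conll_sentences(sentences)).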