NLTK MaltParser won't parse

Posted 2019-07-19 22:29

I am trying to use MaltParser from NLTK.

I got as far as configuring the parser:

import nltk
parser = nltk.parse.malt.MaltParser()
parser.config_malt()
parser.train_from_file('malt_train.conll')

but when it comes to actual parsing, the parser raises an error:

File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 98, in raw_parse
return self.parse(words, verbose)
File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 85, in parse
return self.tagged_parse(taggedwords, verbose)
File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 139, in tagged_parse
return DependencyGraph.load(output_file)
File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 121, in load
return DependencyGraph(open(file).read())
IOError: [Errno 2] No such file or directory: '/var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_output.conll'

Here is the command that gives the error (from malt.py):

['java', '-jar /usr/lib/malt-1.6.1/malt.jar', '-w /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T', '-c malt_temp', '-i /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_input.conll', '-o /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_output.conll', '-m parse']

I tried running just the java command, and here is what I get:

The file entry 'malt_temp_singlemalt.info' in the mco file '/var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_temp.mco' cannot be loaded.

I also tried the same with the pre-trained engmalt.poly.mco and engmalt.linear.mco models.
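One way to narrow down the "cannot be loaded" message is to check that the .mco file actually sits in the working directory and contains the entry MaltParser complains about. This is a minimal sketch, not NLTK code: `check_mco` is a hypothetical helper, and it assumes (not confirmed in this thread) that an .mco file is a zip/JAR archive whose entry names follow the `<config>_singlemalt.info` pattern seen in the error message.

```python
import os
import zipfile


def check_mco(working_dir, config_name):
    """Return a short diagnostic string for <working_dir>/<config_name>.mco.

    Assumption: the .mco file is a zip (JAR) archive and MaltParser looks for
    an entry named '<config_name>_singlemalt.info', as in the error above.
    """
    mco_path = os.path.join(working_dir, config_name + '.mco')
    if not os.path.isfile(mco_path):
        return 'missing: %s' % mco_path
    if not zipfile.is_zipfile(mco_path):
        return 'not a zip archive: %s' % mco_path
    wanted = config_name + '_singlemalt.info'
    with zipfile.ZipFile(mco_path) as archive:
        # Entries may be stored under a subfolder, so compare base names only.
        names = [os.path.basename(n) for n in archive.namelist()]
    if wanted not in names:
        return 'entry %s not found in %s' % (wanted, mco_path)
    return 'ok: %s' % mco_path
```

For the case above, `print(check_mco(tempfile.gettempdir(), 'malt_temp'))` would report whether the model trained by `train_from_file` ended up where the parse command looks for it.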

Any suggestions are very welcome.

EDIT: Here is the full function from malt.py:

def tagged_parse(self, sentence, verbose=False):
    """
    Use MaltParser to parse a sentence. Takes a sentence as a list of
    (word, tag) tuples; the sentence must have already been tokenized and
    tagged.

    @param sentence: Input sentence to parse
    @type sentence: L{list} of (word, tag) L{tuple}s.
    @return: C{DependencyGraph} the dependency graph representation of the sentence
    """

    if not self._malt_bin:
        raise Exception("MaltParser location is not configured.  Call config_malt() first.")
    if not self._trained:
        raise Exception("Parser has not been trained.  Call train() first.")

    input_file = os.path.join(tempfile.gettempdir(), 'malt_input.conll')
    output_file = os.path.join(tempfile.gettempdir(), 'malt_output.conll')

    execute_string = 'java -jar %s -w %s -c %s -i %s -o %s -m parse'
    if not verbose:
        execute_string += ' > ' + os.path.join(tempfile.gettempdir(), "malt.out")

    f = None
    try:
        f = open(input_file, 'w')

        for (i, (word,tag)) in enumerate(sentence):
            f.write('%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' % 
                    (i+1, word, '_', tag, tag, '_', '0', 'a', '_', '_'))
        f.write('\n')
        f.close()

        cmd = ['java', '-jar %s' % self._malt_bin, '-w %s' % tempfile.gettempdir(), 
               '-c %s' % self.mco, '-i %s' % input_file, '-o %s' % output_file, '-m parse']
        print(cmd)  # debugging: show the command list before it is executed

        self._execute(cmd, 'parse', verbose)

        return DependencyGraph.load(output_file)
    finally:
        if f: f.close()
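The IOError in the traceback only says that malt_output.conll was never written; it does not say why. A small guard before `DependencyGraph.load(output_file)` can turn that into a clearer message. This is a sketch, not part of NLTK: `require_output` is a hypothetical helper, and it assumes the `cmd` list built in `tagged_parse` is passed in so the failing invocation can be rerun by hand.

```python
import os


def require_output(output_file, cmd):
    """Raise a descriptive error if MaltParser did not write output_file.

    An empty file is treated the same as a missing one. cmd is the command
    list that was executed; it is included in the message so it can be
    rerun manually to see MaltParser's own error.
    """
    if not os.path.isfile(output_file) or os.path.getsize(output_file) == 0:
        raise RuntimeError(
            'MaltParser produced no output at %s; rerun this command to see '
            'its error: %s' % (output_file, ' '.join(cmd)))
    return output_file
```

Inside `tagged_parse`, the last line would then read `return DependencyGraph.load(require_output(output_file, cmd))`.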

1 answer

够拽才男人
Post #2 · 2019-07-19 23:13

I am not sure whether the problem is still unsolved (I think it has already been solved), but since I had the same problem a while ago, I would like to share what I found.

First of all, the MaltParser jar does not accept a .conll file given with a full path in front of the file name, as in the command above. Why that is, I do not know.

But you can easily fix it by changing the command line to something like this:

cmd = ['java', '-jar %s' % self._malt_bin, '-w %s' % self.working_dir,
       '-c %s' % self.mco, '-i %s' % input_file, '-o %s' % output_file, '-m parse']

Here the directory of the .conll file is set with the -w parameter, which lets you load a .conll file from any folder. I also changed tempfile.gettempdir() to self.working_dir, because the "original" NLTK version always uses the /tmp/ folder as the working directory, even if you initialise MaltParser with a different working directory.
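The change described above could be packaged as a small function that takes the working directory as a parameter instead of calling tempfile.gettempdir(). This is a sketch only: `build_malt_cmd` and its parameter names are hypothetical, and it assumes, as the answer reports but this thread does not verify, that MaltParser resolves the -i and -o file names relative to the -w folder. It also puts each flag and its value in separate list items, which is the form subprocess expects when it is given a list without a shell.

```python
def build_malt_cmd(malt_jar, working_dir, mco, input_name, output_name):
    """Build a MaltParser parse command as an argument list.

    Assumption (from the answer above): the folder is passed with -w and
    input_name / output_name are bare file names inside working_dir.
    Each flag and its value are separate items, so the list can be passed
    directly to subprocess.call() without shell=True.
    """
    return ['java', '-jar', malt_jar,
            '-w', working_dir,
            '-c', mco,
            '-i', input_name,
            '-o', output_name,
            '-m', 'parse']
```

For example, `subprocess.call(build_malt_cmd('/usr/lib/malt-1.6.1/malt.jar', working_dir, 'malt_temp', 'malt_input.conll', 'malt_output.conll'))`, where `working_dir` is whatever folder the input file and the .mco were written to.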

I hope this information helps someone.

One more thing: if you want to parse many sentences at once, but each one independently of the others, you have to separate the sentences with a blank line in the input .conll file and restart the numbering at 1 for each sentence.
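A sketch of such an input writer, extending the single-sentence loop from `tagged_parse` in the question: it writes the same ten tab-separated columns (with the same '_', '0' and 'a' placeholders NLTK uses), restarts the ID at 1 for every sentence, and ends each sentence with a blank line. `write_conll_sentences` is a hypothetical helper name, not an NLTK function.

```python
def write_conll_sentences(sentences, path):
    """Write several tagged sentences to one CoNLL file for MaltParser.

    sentences: list of sentences, each a list of (word, tag) tuples.
    Token IDs restart at 1 for each sentence; a blank line ends each one.
    """
    with open(path, 'w') as f:
        for sentence in sentences:
            for i, (word, tag) in enumerate(sentence, start=1):
                # Same 10 columns as tagged_parse: ID, FORM, LEMMA, CPOSTAG,
                # POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL
                f.write('%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' %
                        (i, word, '_', tag, tag, '_', '0', 'a', '_', '_'))
            f.write('\n')  # blank line separates this sentence from the next
```

Calling it with `[[('John', 'NNP'), ('runs', 'VBZ')], [('Hi', 'UH')]]` produces two blocks, numbered 1-2 and 1, separated by an empty line.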
