Dependency parser using NLTK and MaltParser

Posted 2019-04-08 08:22

Question:

I'm using NLTK and MaltParser to extract dependencies from natural-language sentences. I did some experiments with the Stanford parser using this code:

import os

sentence = '''I shot an elephant in my pajamas'''
# Write the sentence to a temp file, then run the Stanford lexparser script on it.
os.popen("echo '" + sentence + "' > ~/stanfordtemp.txt").close()
parser_out = os.popen("/usr/local/Cellar/stanford-parser/2.0.3/bin/lexparser.sh ~/stanfordtemp.txt").readlines()

for i, tag in enumerate(parser_out):
    line = tag.strip()
    if line and line[0] == '(':
        # Lines starting with "(" belong to the bracketed constituency parse.
        print(i, "Parse: ", tag.rstrip())
    elif line:
        # Any other non-empty line is a typed dependency.
        print(i, "Typed dependencies: ", tag.rstrip())
# Join the bracketed lines into a single-line parse.
bracketed_parse = " ".join(tag.strip() for tag in parser_out if tag.strip().startswith("("))
print(bracketed_parse)

and got this nice result:

Parsing [sent. 1 len. 7]: I shot an elephant in my pajamas

Parsed 7 words in 1 sentences (12,87 wds/sec; 1,84 sents/sec).
0 Parse:  (ROOT
1 Parse:    (S
2 Parse:      (NP (PRP I))
3 Parse:      (VP (VBD shot)
4 Parse:        (NP (DT an) (NN elephant))
5 Parse:        (PP (IN in)
6 Parse:          (NP (PRP$ my) (NNS pajamas))))))
8 Typed dependencies:  nsubj(shot-2, I-1)
9 Typed dependencies:  root(ROOT-0, shot-2)
10 Typed dependencies:  det(elephant-4, an-3)
11 Typed dependencies:  dobj(shot-2, elephant-4)
12 Typed dependencies:  poss(pajamas-7, my-6)
13 Typed dependencies:  prep_in(shot-2, pajamas-7)

With the MaltParser I have this code:

import os
import nltk

os.environ['MALTPARSERHOME'] = "/Applications/maltparser-1.7.2"
# Point NLTK at the MaltParser install and the pre-trained English model.
maltParser = nltk.parse.malt.MaltParser(working_dir="/Applications/maltparser-1.7.2",
                                        mco="engmalt.linear-1.7",
                                        additional_java_args=['-Xmx1024m'])
txt = '''I shot an elephant in my pajamas'''
graph = maltParser.raw_parse(txt)
print(graph.tree().pprint())

and the following output:

(pajamas (shot I) an elephant in my)

Question: Can I get the same output as with the Stanford parser? Any help would be great.

Answer 1:

Poking around in the MaltParser documentation, I don't see an option that will exactly match the detailed Stanford parser output you show, but you could experiment with the conllx and conllu output format options to see whether they contain the information you need.

http://www.maltparser.org/options.html
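
To illustrate what the CoNLL-style output would give you, here is a minimal sketch that turns CoNLL-X rows into lines shaped like the Stanford typed dependencies. The sample rows, the relation labels in them, and the helper name conllx_to_typed_dependencies are hand-written assumptions for illustration, not actual MaltParser output; only the ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) is taken as given.

```python
# Hypothetical CoNLL-X rows for the example sentence (hand-written for
# illustration; NOT actual MaltParser output). Columns are tab-separated:
# ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
SAMPLE_CONLLX = "\n".join([
    "1\tI\t_\tPRP\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tshot\t_\tVBD\tVBD\t_\t0\troot\t_\t_",
    "3\tan\t_\tDT\tDT\t_\t4\tdet\t_\t_",
    "4\telephant\t_\tNN\tNN\t_\t2\tdobj\t_\t_",
    "5\tin\t_\tIN\tIN\t_\t2\tprep\t_\t_",
    "6\tmy\t_\tPRP$\tPRP$\t_\t7\tposs\t_\t_",
    "7\tpajamas\t_\tNNS\tNNS\t_\t5\tpobj\t_\t_",
])


def conllx_to_typed_dependencies(conllx_text):
    """Return 'rel(head-i, dependent-j)' strings, one per token."""
    rows = [line.split("\t") for line in conllx_text.splitlines() if line.strip()]
    # Map token ID -> word form; ID 0 is the artificial root node.
    forms = {0: "ROOT"}
    for row in rows:
        forms[int(row[0])] = row[1]
    deps = []
    for row in rows:
        dep_id, head_id, rel = int(row[0]), int(row[6]), row[7]
        deps.append("%s(%s-%d, %s-%d)" % (rel, forms[head_id], head_id,
                                          forms[dep_id], dep_id))
    return deps


for dep in conllx_to_typed_dependencies(SAMPLE_CONLLX):
    print(dep)
```

Note that a relation such as prep_in(shot-2, pajamas-7) in the Stanford output looks like a collapsed form of two separate arcs, so a plain column-to-text conversion like this one would not reproduce it without an extra merging step.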

In principle, a non-projective dependency analysis can be reformulated as a constituency analysis by transforming the MaltParser output. That would give you the bracketing without too much effort, but labeling the constituents would be a lot more work.
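
As a rough sketch of the bracketing step only, the snippet below groups each head word with its dependents into one unlabeled constituent. It assumes a projective tree (every subtree covers a contiguous span of words), and the head indices are hand-written for illustration rather than taken from a real MaltParser run; the function name bracket is likewise hypothetical.

```python
def bracket(heads, words):
    """Unlabeled bracketing: each head plus its dependents forms one constituent.

    heads[i] is the 1-based index of the head of word i+1 (0 means root).
    Assumes the tree is projective, so every subtree is a contiguous span.
    """
    children = {i: [] for i in range(len(words) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    def build(node):
        kids = children[node]
        if not kids:
            return words[node - 1]
        # Place the head word among its dependents in surface order.
        parts = [words[node - 1] if k == node else build(k)
                 for k in sorted(kids + [node])]
        return "(" + " ".join(parts) + ")"

    return " ".join(build(root) for root in children[0])


words = "I shot an elephant in my pajamas".split()
heads = [2, 0, 4, 2, 2, 7, 5]  # hypothetical heads, hand-written for illustration
print(bracket(heads, words))
```

With these heads the result is (I shot (an elephant) (in (my pajamas))). The brackets carry no category labels such as NP or VP, which is the part that would need the extra work.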