Get certain nodes out of a Parse Tree

Posted 2019-04-01 18:41

I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford Parser, and now I would like to manipulate the nodes of the resulting tree in order to implement the algorithm.

At the moment, I don't understand how to:

  • Access a node based on its POS tag (e.g. I need to start with a pronoun - how do I get all pronouns?).

  • Use visitors. I'm fairly new to Java, but in C++ I needed to implement a Visitor functor and then work on its hooks. I could not find much on the Stanford Parser's Tree structure, though. Is it jgrapht? If it is, could you point me to some code snippets?

2 Answers
神经病院院长
#2 · 2019-04-01 18:43

Here's a simple example that parses a sentence and finds all of the pronouns.

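import java.util.ArrayList;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class FindPronouns {

    // Recursively collects every subtree labelled "PRP" (personal pronoun).
    // Possessive pronouns carry the separate Penn Treebank tag "PRP$".
    private static ArrayList<Tree> findPro(Tree t) {
        ArrayList<Tree> pronouns = new ArrayList<Tree>();
        if (t.label().value().equals("PRP"))
            pronouns.add(t);
        else
            for (Tree child : t.children())
                pronouns.addAll(findPro(child));
        return pronouns;
    }

    public static void main(String[] args) {
        // Loads the default English model.
        LexicalizedParser parser = LexicalizedParser.loadModel();
        // Note: in recent releases apply() expects a list of tokens;
        // parse(String) accepts a raw sentence instead.
        Tree x = parser.apply("The dog walks and he barks .");
        System.out.println(x);
        ArrayList<Tree> pronouns = findPro(x);
        System.out.println("All Pronouns: " + pronouns);
    }
}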

This prints:

    (ROOT (S (S (NP (DT The) (NN dog)) (VP (VBZ walks))) (CC and) (S (NP (PRP he)) (VP (VBZ barks))) (. .)))
    All Pronouns: [(PRP he)]
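
On the visitor part of the question: the recursion above is effectively a hand-rolled visitor. Below is a minimal sketch of the same idea with an explicit callback. Note that `Node`, `walk`, `findByTag` and `sample` are hypothetical stand-ins invented for this illustration so that it runs without the parser jar; they are not part of the Stanford Parser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class VisitorSketch {

    // Hypothetical stand-in for a parse-tree node; NOT the Stanford Tree class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Calls the visitor on every node, parent before children (pre-order).
    static void walk(Node node, Consumer<Node> visitor) {
        visitor.accept(node);
        for (Node child : node.children) {
            walk(child, visitor);
        }
    }

    // Collects every node whose label equals the given POS tag.
    static List<Node> findByTag(Node root, String tag) {
        List<Node> hits = new ArrayList<>();
        walk(root, n -> { if (n.label.equals(tag)) hits.add(n); });
        return hits;
    }

    // Hand-built tree for "The dog walks and he barks ."
    static Node sample() {
        return new Node("ROOT",
            new Node("S",
                new Node("S",
                    new Node("NP",
                        new Node("DT", new Node("The")),
                        new Node("NN", new Node("dog"))),
                    new Node("VP", new Node("VBZ", new Node("walks")))),
                new Node("CC", new Node("and")),
                new Node("S",
                    new Node("NP", new Node("PRP", new Node("he"))),
                    new Node("VP", new Node("VBZ", new Node("barks")))),
                new Node(".", new Node("."))));
    }

    public static void main(String[] args) {
        for (Node n : findByTag(sample(), "PRP")) {
            System.out.println(n.label + " -> " + n.children.get(0).label);
        }
    }
}
```

The callback keeps the traversal separate from what is done at each node, so the same `walk` could be reused for other tags or for collecting noun phrases.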
疯言疯语
#3 · 2019-04-01 19:03

@dhg's answer works fine, but here are two other options that are also worth knowing about:

  • The Tree class implements Iterable. You can iterate through all the nodes of a Tree, or, strictly, the subtrees headed by each node, in a pre-order traversal, with:

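    // Assumes t is the parsed Tree and java.util.List / ArrayList are imported.
    List<Tree> pronouns = new ArrayList<Tree>();
    for (Tree subtree : t) {
        if (subtree.label().value().equals("PRP")) {
            pronouns.add(subtree);
        }
    }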
    
  • You can also get just the nodes that match some (potentially quite complex) pattern by using Tregex, which behaves rather like java.util.regex but matches patterns over trees. You would have something like:

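    // TregexPattern and TregexMatcher live in edu.stanford.nlp.trees.tregex.
    // The pattern "PRP" matches any node labelled PRP.
    TregexPattern tgrepPattern = TregexPattern.compile("PRP");
    TregexMatcher m = tgrepPattern.matcher(t);
    // Each successful find() advances to the next matching node.
    while (m.find()) {
        Tree subtree = m.getMatch();
        pronouns.add(subtree);
    }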
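
The pre-order traversal mentioned in the first bullet can be made concrete with a small sketch. The `Node` class and `order` helper here are hypothetical stand-ins written for illustration, not the Stanford `Tree` implementation; the sketch only shows how an `Iterable` tree can hand out each node before its children, left to right.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PreOrderSketch {

    // Hypothetical stand-in node; iterating it yields itself and all descendants.
    static final class Node implements Iterable<Node> {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        @Override
        public Iterator<Node> iterator() {
            Deque<Node> stack = new ArrayDeque<>();
            stack.push(this);
            return new Iterator<Node>() {
                @Override
                public boolean hasNext() {
                    return !stack.isEmpty();
                }

                @Override
                public Node next() {
                    if (stack.isEmpty()) throw new NoSuchElementException();
                    Node current = stack.pop();
                    // Push children right-to-left so the leftmost is visited first.
                    for (int i = current.children.size() - 1; i >= 0; i--) {
                        stack.push(current.children.get(i));
                    }
                    return current;
                }
            };
        }
    }

    // Returns the labels in the order the iterator produces them.
    static List<String> order(Node root) {
        List<String> labels = new ArrayList<>();
        for (Node n : root) {
            labels.add(n.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        Node np = new Node("NP", new Node("PRP", new Node("he")));
        Node vp = new Node("VP", new Node("VBZ", new Node("barks")));
        // prints [S, NP, PRP, he, VP, VBZ, barks]
        System.out.println(order(new Node("S", np, vp)));
    }
}
```

Because every node is visited, including the leaves holding the words themselves, a label check like the one in the loop above is what narrows the result down to pronouns.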
    