I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement my algorithm.
At the moment, I don't understand how to:
Access a node based on its POS tag (e.g. I need to start with a pronoun - how do I get all pronouns?).
Use visitors. I'm fairly new to Java, but in C++ I needed to implement a Visitor functor and then work on its hooks. I could not find much on the Stanford Parser's Tree structure, though. Is it based on jgrapht? If so, could you point me to some code snippets?
@dhg's answer works fine, but here are two other options that it might also be useful to know about:
The Tree class implements Iterable. You can iterate through all the nodes of a Tree, or, strictly, the subtrees headed by each node, in a pre-order traversal, with:
for (Tree subtree : t) {
    if (subtree.label().value().equals("PRP")) {
        pronouns.add(subtree);
    }
}
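To make "pre-order over the subtrees" concrete without needing the parser on the classpath, here is a minimal sketch. Node, preOrder and pronounWords are hypothetical names written only for this illustration, not part of the Stanford API; the assumption is that iterating a real Tree visits nodes in the same parent-before-children order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreOrderSketch {

    // Hypothetical stand-in for a parse-tree node: a label plus ordered children.
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order: record the node itself first, then each child subtree left to right.
    static List<Node> preOrder(Node root) {
        List<Node> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Node> out) {
        out.add(n);
        for (Node child : n.children) {
            collect(child, out);
        }
    }

    // Words under the pre-terminal nodes tagged PRP, in pre-order.
    static List<String> pronounWords(Node root) {
        List<String> words = new ArrayList<>();
        for (Node n : preOrder(root)) {
            if (n.label.equals("PRP") && !n.children.isEmpty()) {
                words.add(n.children.get(0).label);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Hand-built equivalent of (S (NP (PRP he)) (VP (VBZ barks)))
        Node tree = new Node("S",
                new Node("NP", new Node("PRP", new Node("he"))),
                new Node("VP", new Node("VBZ", new Node("barks"))));
        for (Node n : preOrder(tree)) {
            System.out.print(n.label + " ");
        }
        System.out.println();
        System.out.println(pronounWords(tree));
    }
}
```

On this tree the walk visits S, NP, PRP, he, VP, VBZ, barks in that order, so a parent is always seen before anything beneath it.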
You can also get just the nodes that satisfy some (potentially quite complex) pattern by using tregex, which behaves rather like java.util.regex by allowing pattern matches over trees. You would have something like:
TregexPattern tgrepPattern = TregexPattern.compile("PRP");
TregexMatcher m = tgrepPattern.matcher(t);
while (m.find()) {
    Tree subtree = m.getMatch();
    pronouns.add(subtree);
}
Here's a simple example that parses a sentence and finds all of the pronouns.
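import java.util.ArrayList;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class FindPronouns {

    // Recursively collect every subtree whose label is PRP (personal pronoun).
    private static ArrayList<Tree> findPro(Tree t) {
        ArrayList<Tree> pronouns = new ArrayList<Tree>();
        if (t.label().value().equals("PRP"))
            pronouns.add(t);
        else
            for (Tree child : t.children())
                pronouns.addAll(findPro(child));
        return pronouns;
    }

    public static void main(String[] args) {
        // Load the parser's default model.
        LexicalizedParser parser = LexicalizedParser.loadModel();
        // In newer parser releases apply() may expect a token list;
        // parse(String) may be needed here instead.
        Tree x = parser.apply("The dog walks and he barks .");
        System.out.println(x);
        ArrayList<Tree> pronouns = findPro(x);
        System.out.println("All Pronouns: " + pronouns);
    }
}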
This prints:
(ROOT (S (S (NP (DT The) (NN dog)) (VP (VBZ walks))) (CC and) (S (NP (PRP he)) (VP (VBZ barks))) (. .)))
All Pronouns: [(PRP he)]
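On the visitor part of the question: in Java, a common analogue of a C++ visitor functor is to pass the test as a function object. The sketch below generalizes the recursive findPro above to take a java.util.function.Predicate. Node and collect are hypothetical names invented for this illustration so that it runs without the parser; with the real library the same shape should work over Tree and its children() method. The predicate used here also accepts PRP$, the Penn Treebank tag for possessive pronouns, which an exact match on "PRP" skips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class CollectSketch {

    // Hypothetical minimal node type, standing in for the parser's Tree.
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first walk that keeps every node accepted by the predicate.
    static List<Node> collect(Node root, Predicate<Node> keep) {
        List<Node> out = new ArrayList<>();
        walk(root, keep, out);
        return out;
    }

    private static void walk(Node n, Predicate<Node> keep, List<Node> out) {
        if (keep.test(n)) {
            out.add(n);
        }
        for (Node child : n.children) {
            walk(child, keep, out);
        }
    }

    public static void main(String[] args) {
        // Hand-built equivalent of
        // (S (NP (PRP$ his) (NN dog)) (VP (VBZ likes) (NP (PRP him))))
        Node tree = new Node("S",
                new Node("NP",
                        new Node("PRP$", new Node("his")),
                        new Node("NN", new Node("dog"))),
                new Node("VP",
                        new Node("VBZ", new Node("likes")),
                        new Node("NP", new Node("PRP", new Node("him")))));

        // Non-leaf nodes whose tag starts with PRP: matches both PRP and PRP$.
        Predicate<Node> isPronounTag =
                n -> !n.children.isEmpty() && n.label.startsWith("PRP");

        for (Node n : collect(tree, isPronounTag)) {
            System.out.println(n.label + " " + n.children.get(0).label);
        }
    }
}
```

Swapping in a different predicate (for example, one that accepts NP or S nodes) reuses the same walk, which is the main reason to separate the traversal from the test.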