I am trying to find the Theme/Noun phrase from a sentence using Stanford NLP
For eg: the sentence "the white tiger" I would love to get
Theme/Nound phrase as : white tiger.
For this I used pos tagger. My sample code is below.
Result I am getting is "tiger" which is not correct. Sample code I used to run is
public static void main(String[] args) throws IOException {
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation annotation = new Annotation("the white tiger)");
pipeline.annotate(annotation);
List<CoreMap> sentences = annotation
.get(CoreAnnotations.SentencesAnnotation.class);
System.out.println("the size of the senetence is......"
+ sentences.size());
for (CoreMap sentence : sentences) {
System.out.println("the senetence is..." + sentence.toString());
Tree tree = sentence.get(TreeAnnotation.class);
PrintWriter out = new PrintWriter(System.out);
out.println("The first sentence parsed is:");
tree.pennPrint(out);
System.out.println("does it comes here.....1111");
TregexPattern pattern = TregexPattern.compile("@NP");
TregexMatcher matcher = pattern.matcher(tree);
while (matcher.find()) {
Tree match = matcher.getMatch();
List<Tree> leaves1 = match.getChildrenAsList();
StringBuilder stringbuilder = new StringBuilder();
for (Tree tree1 : leaves1) {
String val = tree1.label().value();
if (val.equals("NN") || val.equals("NNS")
|| val.equals("NNP") || val.equals("NNPS")) {
Tree nn[] = tree1.children();
String ss = Sentence.listToString(nn[0].yield());
stringbuilder.append(ss).append(" ");
}
}
System.out.println("the final stringbilder is ...."
+ stringbuilder);
}
}
}
Any help is really appreciated.Any other thoughts to get this achieved.
It looks like you're descending the dependency trees looking for
NN.*
. "white" is aJJ
--an adjective--which won't be included searching forNN.*
.You should take a close look at the Stanford Dependencies Manual and decide what part of speech tags encompass what you're looking for. You should also look at real linguistic data to try to figure out what matters in the task you're trying to complete. What about:
Simply traversing the tree in that case will give you
tiger black white
. ExcludePP
's? Then you lose lots of good info:I'm not sure what you're trying to accomplish, but make sure what you're trying to do is restricted in the right way.
You ought to polish up on your basic syntax as well. "the white tiger" is what linguists call a Noun Phrase or
NP
. You'd be hard pressed for a linguist to call anNP
a sentence. There are also often manyNP
s inside a sentence; sometimes, they're even embedded inside one another. The Stanford Dependencies Manual is a good start. As in the name, the Stanford Dependencies are based on the idea of dependency grammar, though there are other approaches that bring different insights to the table.Learning what linguists know about the structure of sentences could help you significantly in getting at what you're trying to extract or--as happens often--realizing that what you're trying to extract is too difficult and that you need to find a new route to a solution.