Extract noun phrase using Stanford NLP

I am trying to find the theme/noun phrase of a sentence using Stanford NLP.

For example, for the sentence "the white tiger" I would like to get the theme/noun phrase "white tiger".

For this I used the POS tagger. The result I am getting is "tiger", which is not correct. The sample code I ran is below:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.util.CoreMap;

public static void main(String[] args) throws IOException {
    // Tokenize, split into sentences, and build a constituency parse.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation annotation = new Annotation("the white tiger");
    pipeline.annotate(annotation);
    List<CoreMap> sentences = annotation
            .get(CoreAnnotations.SentencesAnnotation.class);
    System.out.println("Number of sentences: " + sentences.size());
    for (CoreMap sentence : sentences) {
        System.out.println("Sentence: " + sentence.toString());
        Tree tree = sentence.get(TreeAnnotation.class);
        PrintWriter out = new PrintWriter(System.out);
        out.println("The parsed sentence is:");
        tree.pennPrint(out);
        out.flush(); // without a flush the tree may never be printed
        // Match every NP node in the parse tree.
        TregexPattern pattern = TregexPattern.compile("@NP");
        TregexMatcher matcher = pattern.matcher(tree);
        while (matcher.find()) {
            Tree match = matcher.getMatch();
            List<Tree> children = match.getChildrenAsList();
            StringBuilder stringBuilder = new StringBuilder();
            for (Tree child : children) {
                // Keep only the direct children tagged as nouns.
                String val = child.label().value();
                if (val.equals("NN") || val.equals("NNS")
                        || val.equals("NNP") || val.equals("NNPS")) {
                    Tree[] nn = child.children();
                    String ss = Sentence.listToString(nn[0].yield());
                    stringBuilder.append(ss).append(" ");
                }
            }
            System.out.println("The final string is: " + stringBuilder);
        }
    }
}

Any help is really appreciated. Are there any other ways to achieve this?

Tags: nlp stanford-nlp sentiment-analysis pos-tagger

1 Answer

我只想做你的唯一

Post #2 · 2019-06-06 22:46

It looks like you're descending the parse tree looking for NN.* tags. "white" is a JJ (an adjective), so it won't be picked up by a search for NN.*.
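As a rough illustration of that point, here is a minimal sketch in which the filter accepts JJ* tags as well as NN* tags. It deliberately does not call CoreNLP: `Node` is a hypothetical stand-in for a parse-tree node, and the tree is built by hand on the assumption that the parse of the phrase is (NP (DT the) (JJ white) (NN tiger)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NpModifierSketch {

    // Hypothetical stand-in for a parse-tree node; not a CoreNLP class.
    static final class Node {
        final String label;        // phrase label, POS tag, or the word itself for a leaf
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Join the words of the NP's direct children whose tag starts with JJ or NN.
    static String headWithModifiers(Node np) {
        List<String> words = new ArrayList<>();
        for (Node child : np.children) {
            boolean wanted = child.label.startsWith("JJ") || child.label.startsWith("NN");
            // A POS-tag node has exactly one child: the word itself.
            if (wanted && child.children.size() == 1 && child.children.get(0).isLeaf()) {
                words.add(child.children.get(0).label);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // Hand-built tree, assumed shape: (NP (DT the) (JJ white) (NN tiger))
        Node np = new Node("NP",
                new Node("DT", new Node("the")),
                new Node("JJ", new Node("white")),
                new Node("NN", new Node("tiger")));
        System.out.println(headWithModifiers(np)); // prints "white tiger"
    }
}
```

With the real library, the same idea would probably amount to widening the tag test in the question's loop, but whether JJ alone covers every modifier you care about depends on your data.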

You should take a close look at the Stanford Dependencies Manual and decide which part-of-speech tags cover what you're looking for. You should also look at real linguistic data to figure out what matters for the task you're trying to complete. What about:

the tiger [with the black one] [who was white]

Simply traversing the tree in that case will give you "tiger black white". Exclude PPs? Then you lose lots of good information:

the tiger [with white fur]
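The trade-off can be sketched on a hand-built tree. This is only an illustration: `Node` is a hypothetical stand-in for a parse-tree node (not a CoreNLP class), and the bracketing (NP (NP (DT the) (NN tiger)) (PP (IN with) (NP (JJ white) (NN fur)))) is an assumed parse rather than real parser output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PpPruningSketch {

    // Hypothetical stand-in for a parse-tree node; not a CoreNLP class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first collection of JJ*/NN* words; optionally skip whole PP subtrees.
    static void collect(Node node, boolean skipPp, List<String> out) {
        if (node.children.isEmpty()) {
            return; // a bare word; it is added via its POS-tag parent
        }
        if (skipPp && node.label.equals("PP")) {
            return; // prune the prepositional phrase and everything under it
        }
        boolean wanted = node.label.startsWith("JJ") || node.label.startsWith("NN");
        if (wanted && node.children.size() == 1 && node.children.get(0).children.isEmpty()) {
            out.add(node.children.get(0).label);
            return;
        }
        for (Node child : node.children) {
            collect(child, skipPp, out);
        }
    }

    static String words(Node root, boolean skipPp) {
        List<String> out = new ArrayList<>();
        collect(root, skipPp, out);
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        Node np = new Node("NP",
                new Node("NP",
                        new Node("DT", new Node("the")),
                        new Node("NN", new Node("tiger"))),
                new Node("PP",
                        new Node("IN", new Node("with")),
                        new Node("NP",
                                new Node("JJ", new Node("white")),
                                new Node("NN", new Node("fur")))));
        System.out.println(words(np, false)); // prints "tiger white fur"
        System.out.println(words(np, true));  // prints "tiger"
    }
}
```

Neither output is obviously right: the full traversal flattens the structure, while pruning the PP throws away the description of the tiger.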

I'm not sure what you're trying to accomplish, but make sure what you're trying to do is restricted in the right way.

You ought to brush up on your basic syntax as well. "the white tiger" is what linguists call a noun phrase, or NP. You'd be hard pressed to find a linguist who would call an NP a sentence. There are also often many NPs inside a sentence; sometimes they're even embedded inside one another. The Stanford Dependencies Manual is a good start. As the name suggests, the Stanford Dependencies are based on the idea of dependency grammar, though there are other approaches that bring different insights to the table.

Learning what linguists know about the structure of sentences could help you significantly in getting at what you're trying to extract or, as often happens, in realizing that what you're trying to extract is too difficult and that you need to find a new route to a solution.
