How to extract the noun phrases using Open nlp'

2019-01-14 17:25发布

I am newbie to Natural Language processing.I need to extract the noun phrases from the text.So far i have used open nlp's chunking parser for parsing my text to get the Tree structure.But i am not able to extract the noun phrases from the tree structure, is there any regular expression pattern in open nlp so that i can use it to extract the noun phrases.

Below is the code that i am using

    InputStream is = new FileInputStream("en-parser-chunking.bin");
    ParserModel model = new ParserModel(is);
    Parser parser = ParserFactory.create(model);
    Parse topParses[] = ParserTool.parseLine(line, parser, 1);
        for (Parse p : topParses){
                 p.show();}

Here I am getting the output as

(TOP (S (S (ADJP (JJ welcome) (PP (TO to) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN on) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree) (WP stucture.))))))))))

Can some one please help me in getting the noun phrases like NP,NNP,NN etc.Can some one tell me do I need to use any other NP Chunker to get the noun phrases?Is there any regex pattern to achieve the same.

Please help me on this.

Thanks in advance

Gouse.

3条回答
一纸荒年 Trace。
2楼-- · 2019-01-14 17:54

The Parse object is a tree; you can use getParent() and getChildren() and getType() to navigate the tree.

List<Parse> nounPhrases;

public void getNounPhrases(Parse p) {
    if (p.getType().equals("NP")) {
         nounPhrases.add(p);
    }
    for (Parse child : p.getChildren()) {
         getNounPhrases(child);
    }
}
查看更多
Animai°情兽
3楼-- · 2019-01-14 18:05

if you only want noun phrases, then use the sentence chunker rather than the tree parser. the code is something like this (you need to get the model from the same place you got the parser model)

public void chunk() {
    InputStream modelIn = null;
    ChunkerModel model = null;

    try {
      modelIn = new FileInputStream("en-chunker.bin");
      model = new ChunkerModel(modelIn);
    }
    catch (IOException e) {
      // Model loading failed, handle the error
      e.printStackTrace();
    }
    finally {
      if (modelIn != null) {
        try {
          modelIn.close();
        }
        catch (IOException e) {
        }
      }
    }

//After the model is loaded a Chunker can be instantiated.


    ChunkerME chunker = new ChunkerME(model);



    String sent[] = new String[]{"Rockwell", "International", "Corp.", "'s",
      "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
      "extending", "its", "contract", "with", "Boeing", "Co.", "to",
      "provide", "structural", "parts", "for", "Boeing", "'s", "747",
      "jetliners", "."};

    String pos[] = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
      "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
      "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
      "."};

    String tag[] = chunker.chunk(sent, pos);
  }

then look at the tag array for the types you want

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api
查看更多
▲ chillily
4楼-- · 2019-01-14 18:08

Will continue from your code itself . This program block will provide all the noun phrases in sentence. Use getTagNodes() method to get Tokens and its types

Parse topParses[] = ParserTool.parseLine(line, parser, 1);
Parse words[]=null; //an array to store the tokens
//Loop thorugh to get the tag nodes
for (Parse nodes : topParses){
        words=nodes.getTagNodes(); // we will get a list of nodes
}

for(Parse word:words){
//Change the types according to your desired types
    if(word.getType().equals("NN") || word.getType().equals("NNP") || word.getType().equals("NNS")){
            System.out.println(word);
            }
        }
查看更多
登录 后发表回答