What kind of processing should be done to the input that is given to the parser?
As of now I am using the Stanford parser.jar, but there is also a Stanford coreNLP.jar. What is the difference between the parsing done by parser.jar and by coreNLP.jar?
As per the CoreNLP documentation, you can pass the operations you want to perform as the list of annotators:
COMMAND:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt
To use parsing in CoreNLP, can I pass only parse, or should I pass all the annotators except dcoref? That is, should I run
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse -file input.txt
or
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt
Does parser.jar have sentence splitting built into its jar?
Can I give a paragraph as input and get the sentences and their parsed data as output, or should I give only one sentence at a time?
Thank you,
The CoreNLP annotators can be thought of as a dependency graph. The parser annotator depends only on tokenization (tokenize) and sentence splitting (ssplit). So, you could run the parser with your first command:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse -file input.txt

If you know your text is pre-tokenized, the easiest thing to do is to set the option tokenize.whitespace = "true" in your properties file (or pass it in as a flag: -tokenize.whitespace). To only sentence split at the end of a line, you can set the option ssplit.eolonly.
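If it helps, here is a minimal sketch of such a properties file combining those two options (the file name parse.properties and the one-sentence-per-line layout are just assumptions for illustration):

# parse.properties -- hypothetical file name, sketch only
annotators = tokenize,ssplit,parse
# input is already tokenized; split tokens on whitespace only
tokenize.whitespace = true
# treat each line of the input file as exactly one sentence
ssplit.eolonly = true

You would then point CoreNLP at it with -props parse.properties instead of listing -annotators on the command line.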
But, by default, yes, CoreNLP will tokenize and split up your sentences for you. You can just feed in a pile of text, and it will output parsed sentences.
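If you would rather call the pipeline from Java code instead of the command line, here is a minimal sketch using the standard CoreNLP API (the example paragraph and the class name ParseParagraph are just for illustration; it assumes the CoreNLP jars and models are on the classpath):

import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class ParseParagraph {
    public static void main(String[] args) {
        // Only the annotators the parser actually depends on.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // A whole paragraph: ssplit breaks it into sentences for you.
        Annotation document = new Annotation(
            "Stanford CoreNLP runs as a pipeline. You can feed it a whole paragraph.");
        pipeline.annotate(document);

        // One constituency tree per detected sentence.
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println(sentence.get(CoreAnnotations.TextAnnotation.class));
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            tree.pennPrint();
        }
    }
}

Each loop iteration prints the sentence text followed by its Penn-style parse tree, which is the paragraph-in, parsed-sentences-out behaviour described above, done programmatically.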