Finding meaningful sub-sentences from a sentence

Is there a way to to find all the sub-sentences of a sentence that still are meaningful and contain at least one subject, verb, and a predicate/object?

For example, if we have a sentence like "I am going to do a seminar on NLP at SXSW in Austin next month". We can extract the following meaningful sub-sentences from this sentence: "I am going to do a seminar", "I am going to do a seminar on NLP", "I am going to do a seminar on NLP at SXSW", "I am going to do a seminar at SXSW", "I am going to do a seminar in Austin", "I am going to do a seminar on NLP next month", etc.

Please note that there is no deduced sentences here (e.g. "There will be a NLP seminar at SXSW next month". Although this is true, we don't need this as part of this problem.) . All generated sentences are strictly part of the given sentence.

How can we approach solving this problem? I was thinking of creating annotated training data that has a set of legal sub-sentences for each sentence in the training data set. And then write some supervised learning algorithm(s) to generate a model.

I am quite new to NLP and Machine Learning, so it would be great if you guys could suggest some ways to solve this problem.

标签： parsing artificial-intelligence nlp machine-learning grammar

4条回答

傲

2楼-- · 2019-01-31 14:56

One approach would be with a parser such as a PCFG. Trying to just train a model to detect 'subsentences' is likely to suffer from data sparsity. Also, I am doubtful that you could write down a really clean and unambiguous definition of a subsentence, and if you can't define it, you can't get annotators to annotate for it.

0人赞添加讨论(0) 举报

在下西门庆

3楼-- · 2019-01-31 15:00

You can use dependency parser provided by Stanford CoreNLP. Collapsed output of your sentence will look like below.

nsubj(going-3, I-1)
xsubj(do-5, I-1)
aux(going-3, am-2)
root(ROOT-0, going-3)
aux(do-5, to-4)
xcomp(going-3, do-5)
det(seminar-7, a-6)
dobj(do-5, seminar-7)
prep_on(seminar-7, NLP-9)
prep_at(do-5, -11)
prep_in(do-5, Austin-13)
amod(month-15, next-14)
tmod(do-5, month-15)

The last 5 of your sentence output are optional. You can remove one or more parts that are not essential to your sentence.
Most of this optional parts are belong to prepositional and modifier e.g : prep_in, prep_do, advmod, tmod, etc. See Stanford Dependency Manual.

For example, if you remove all modifier from the output, you will get

I am going to do a seminar on NLP at SXSW in Austin.

0人赞添加讨论(0) 举报

Fickle 薄情

4楼-- · 2019-01-31 15:09

There's a paper titled "Using Discourse Commitments to Recognize Textual Entailment" by Hickl et al that discusses the extraction of discourse commitments (sub-sentences). The paper includes a description of their algorithm which in some level operates on rules. They used it for RTE, and there may be some minimal levels of deduction in the output. Text simplification maybe a related area to look at.

0人赞添加讨论(0) 举报

再贱就再见

5楼-- · 2019-01-31 15:22

The following paper http://www.mpi-inf.mpg.de/~rgemulla/publications/delcorro13clausie.pdf processes the dependencies from the Stanford parser and contructs simple clauses (text-simplification).

See the online demo - https://d5gate.ag5.mpi-sb.mpg.de/ClausIEGate/ClausIEGate

0人赞添加讨论(0) 举报

Finding meaningful sub-sentences from a sentence

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间