Do you know of any paper or algorithm in NLP that can extract the sentences related to a given entity (term) from a text? I would like to process some reviews (mainly tech), but I found that many reviews mention more than one product (they make comparisons). From such a text I would like to extract only the sentences that are relevant to one product, or delete the sentences that are irrelevant to a particular named entity (product).
My question is how to do this. Are there any related papers? Is something like this already done by some toolkit or API?
What you want is a Named Entity Recognizer (NER). Given an input sentence, the NER identifies the entities in it and labels them as persons, organizations, products, etc. You can then check which entities were recognized as products and keep or discard the sentence accordingly. One very simple option is the named entity recognizer in NLTK for Python. Here is an example:
The output will be:
NLTK works well for this simple example, but to be honest I'm not sure how accurate it is or whether it can be customized for your purpose (identifying products). I do know that the Stanford NER is both customizable and accurate, so you might want to have a look at the above link.
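The keep-or-discard step described above can be sketched in plain Python. To keep the sketch self-contained and runnable without model downloads, it replaces the NER step with a hypothetical hand-written alias table (`PRODUCT_ALIASES`, with made-up example entries) that maps each product to the strings counting as a mention of it. In practice you would build the set of mentioned products from the NER output instead. The "carry-over" rule for sentences that name no product (e.g. ones starting with "It") is an assumed heuristic, not something NLTK or Stanford NER provides.

```python
import re
from typing import Dict, List, Set

# HYPOTHETICAL alias table standing in for a real NER step: maps a canonical
# product name to the lowercase surface forms that count as a mention of it.
PRODUCT_ALIASES: Dict[str, Set[str]] = {
    "iPhone 15": {"iphone 15", "iphone"},
    "Galaxy S24": {"galaxy s24", "galaxy", "s24"},
}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def products_in(sentence: str, aliases: Dict[str, Set[str]]) -> Set[str]:
    """Return the canonical names of all products mentioned in the sentence."""
    lowered = sentence.lower()
    found: Set[str] = set()
    for product, forms in aliases.items():
        for form in forms:
            # Whole-word match, so "s24" does not fire inside e.g. "s245".
            if re.search(r"\b" + re.escape(form) + r"\b", lowered):
                found.add(product)
                break
    return found


def sentences_about(text: str, target: str,
                    aliases: Dict[str, Set[str]],
                    carry_over: bool = True) -> List[str]:
    """Keep sentences that mention `target`.

    If `carry_over` is True (assumed heuristic), a sentence that mentions no
    product is also kept when the most recent product-mentioning sentence
    mentioned the target and nothing else.
    """
    kept: List[str] = []
    last_topic: Set[str] = set()  # products named by the last mentioning sentence
    for sentence in split_sentences(text):
        mentioned = products_in(sentence, aliases)
        if mentioned:
            last_topic = mentioned
            if target in mentioned:
                kept.append(sentence)
        elif carry_over and last_topic == {target}:
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    review = ("I compared the iPhone 15 with the Galaxy S24. "
              "The iPhone has a brighter screen. It also charges faster. "
              "The Galaxy S24 has better battery life. Its camera is also sharper.")
    for line in sentences_about(review, "iPhone 15", PRODUCT_ALIASES):
        print(line)
```

The carry-over rule is deliberately conservative: a sentence without a product name is attributed to the target only when the preceding mention was the target alone, so sentences that follow a direct comparison are dropped rather than guessed.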