Extract sentences relevant to an entity

Posted 2019-04-17 15:49

Do you know of a paper or algorithm in NLP that can extract the sentences in a text that relate to a given entity (term)? I would like to process some reviews (mainly tech), but I found that many reviews mention more than one product (they make comparisons). I would like to extract from the text just the sentences that are relevant to one product, or delete the sentences that are irrelevant to a particular named entity (product).

My question is: how can I do this? Are there any related papers? Is something like this provided by a toolkit or API?

1 answer
够拽才男人
#2 · 2019-04-17 16:23

What you want is a Named Entity Recognizer (NER). Given an input sentence, the NER identifies the entities in it and labels them as persons, organizations, products, etc. You can then check which entities were recognized as products, and keep or discard the sentence accordingly. One very simple possibility would be to use the named entity recognizer of NLTK in Python. Here is an example:

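import nltk

# Requires the NLTK tokenizer, POS tagger, and NE chunker data (installable via nltk.download()).
sent = "Albert Einstein spent many years at Princeton University in New Jersey"
tokens = nltk.word_tokenize(sent)  # split the sentence into word tokens
tagged = nltk.pos_tag(tokens)      # attach a part-of-speech tag to each token
tree = nltk.ne_chunk(tagged)       # group tagged tokens into named-entity chunks
print(tree)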

The output will be:

(S
  (PERSON Albert/NNP)
  (PERSON Einstein/NNP)
  spent/VBD
  many/JJ 
  years/NNS
  at/IN
  (ORGANIZATION Princeton/NNP University/NNP)
  in/IN
  (GPE New/NNP Jersey/NNP))

NLTK works well for this simple example, but to be honest I'm not sure how accurate it is in general or whether it can be customized to fit your purpose (identifying products). The Stanford NER, however, is both customizable and accurate, so you might want to have a look at it.
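To make the keep-or-discard step concrete, here is a minimal sketch of the filtering logic on its own. It is not a real NER: the hypothetical `find_products` helper just matches sentences against a list of known product names, and stands in for whatever recognizer you end up using (NLTK, Stanford NER, or something else). The sentence splitter is a naive regex, and the review text and product list are made up for illustration.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_products(sentence, known_products):
    # Hypothetical stand-in for a NER: case-insensitive whole-word match
    # against a list of known product names.
    found = set()
    for name in known_products:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(name)
    return found

def sentences_about(text, target, known_products):
    # Keep only the sentences in which the target product was recognized.
    kept = []
    for sentence in split_sentences(text):
        if target in find_products(sentence, known_products):
            kept.append(sentence)
    return kept

# Made-up review text and product list, for illustration only.
review = ("The Pixel 3 has a great camera. "
          "The iPhone XS feels faster in daily use. "
          "Battery life on the Pixel 3 is average.")
products = ["Pixel 3", "iPhone XS"]

result = sentences_about(review, "Pixel 3", products)
print(result)
```

Swapping `find_products` for a call into a trained recognizer should leave the rest unchanged. Note that this only keeps sentences that name the product explicitly; a sentence that refers to it as "it" or "the phone" would be dropped.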
