Strategies for recognizing proper nouns in NLP

Posted 2019-01-16 21:18

I'm interested in learning more about Natural Language Processing (NLP), and I'm curious whether there are currently any strategies for recognizing proper nouns in a text that aren't based on dictionary lookup. Also, could anyone explain, or link to resources that explain, the current dictionary-based methods? Who are the authoritative experts on NLP, or what are the definitive resources on the subject?

8 answers
放我归山
Post #2 · 2019-01-16 21:30

It depends on what you mean by dictionary-based.

For example, one strategy would be to take words that aren't in a dictionary and proceed on the assumption that they're proper nouns. If this leads to a sensible parse, consider the assumption provisionally validated and keep going; otherwise, conclude that they aren't proper nouns.

Other ideas:

  • In subject position, any simple subject without a determiner is a good candidate (e.g. Alice in "Alice left").
  • Ditto in prepositional phrases (e.g. Paris in "flights to Paris").
  • In any position, the base of a possessive determiner (e.g. Bob in "Bob's sister") is a good candidate.
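A minimal Python sketch of these heuristics, assuming a flat token list rather than a real parse: "subject position" is crudely approximated as the first word of the sentence, and there is no check that the resulting parse is sensible. The tiny LEXICON, DETERMINERS and PREPOSITIONS sets and the function names are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary; a real system would load a full word list.
LEXICON = {"the", "a", "my", "sister", "went", "to", "with", "and",
           "dog", "lives", "in", "saw"}
DETERMINERS = {"the", "a", "an", "my", "your", "his", "her", "our",
               "their", "this", "that"}
PREPOSITIONS = {"to", "with", "in", "on", "at", "from", "by", "for"}


def tokenize(sentence):
    # Split into words; any "'s" becomes its own token.
    return re.findall(r"'s|[A-Za-z]+", sentence)


def proper_noun_candidates(sentence):
    """Map each candidate proper noun to the heuristics that flagged it."""
    tokens = tokenize(sentence)
    reasons = defaultdict(set)
    for i, tok in enumerate(tokens):
        if tok == "'s":
            continue
        word = tok.lower()
        prev = tokens[i - 1].lower() if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Not in the dictionary: provisionally assume a proper noun.
        if word not in LEXICON:
            reasons[tok].add("unknown word")
        # Base of a possessive, e.g. "Bob" in "Bob's sister".
        if nxt == "'s":
            reasons[tok].add("possessive")
        # Crude stand-in for subject position: sentence-initial,
        # not a determiner, and not itself the base of a possessive.
        if i == 0 and word not in DETERMINERS and nxt != "'s":
            reasons[tok].add("bare subject")
        # Object of a preposition with no determiner, e.g. "to Paris".
        if prev in PREPOSITIONS and word not in DETERMINERS:
            reasons[tok].add("bare prepositional object")
    return dict(reasons)


if __name__ == "__main__":
    for s in ["Bob's sister went to Paris with Alice", "Alice saw the dog"]:
        print(s, "->", proper_noun_candidates(s))
```

On "Bob's sister went to Paris with Alice" this flags Bob (unknown word, possessive) and Paris and Alice (unknown word, bare prepositional object). The rules over-generate: a typo or a rare common noun is also an "unknown word", and a bare plural such as "to dogs" looks like a prepositional proper noun, which is why the answer treats each guess as provisional until the parse confirms it.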

-- MarkusQ

兄弟一词,经得起流年.
Post #3 · 2019-01-16 21:36

Some suggested toolkits:

  1. OpenNLP: has a Named Entity Recognition component for your task.
  2. LingPipe: also has an NER component.
  3. Stanford NLP package: an excellent package for academic use, though possibly not commercial-friendly.
  4. NLTK: a Python NLP package.

Luminary・发光体
Post #4 · 2019-01-16 21:46

Although this work is about the Bengali language, it describes a general procedure for identifying proper nouns, so I hope it will be helpful for you. Please check the following link: http://www.mecs-press.org/ijmecs/ijmecs-v6-n8/v6n8-1.html

Animai°情兽
Post #5 · 2019-01-16 21:48

Besides the dictionary-based approach, two others come to mind:

  • Pattern-based approaches (in a simple form: anything that is capitalized is a proper noun)
  • Machine learning approaches (mark proper nouns in a training corpus and train a classifier)
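As a rough sketch of the two approaches, the Python below pairs a capitalization rule with a tiny per-token classifier trained on hand-labelled tokens. The choice of naive Bayes, the feature set, the labels "PROPN"/"O" and the toy training corpus are all assumptions made for illustration; practical named-entity recognizers are trained on much larger annotated corpora and usually use sequence models rather than independent per-token decisions.

```python
import math
from collections import Counter, defaultdict


def capitalization_rule(tokens, i):
    # Pattern-based baseline: a capitalized word that does not start the
    # sentence (sentence-initial words are capitalized anyway).
    return i > 0 and tokens[i][:1].isupper()


def features(tokens, i):
    # Hypothetical feature set: capitalization, position, context, word.
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        f"cap={tokens[i][:1].isupper()}",
        f"first={i == 0}",
        f"prev={prev}",
        f"word={tokens[i].lower()}",
    ]


class NaiveBayesTagger:
    """Multinomial naive Bayes over per-token features, add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()

    def train(self, sentences):
        # sentences: lists of (token, label) pairs, label "PROPN" or "O".
        for sent in sentences:
            tokens = [tok for tok, _ in sent]
            for i, (_, label) in enumerate(sent):
                self.label_counts[label] += 1
                for f in features(tokens, i):
                    self.feature_counts[label][f] += 1
                    self.vocab.add(f)

    def predict(self, tokens, i):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(tokens, i):
                # Add-one smoothing so unseen features don't zero the score.
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hand-labelled toy corpus (made up for illustration).
TRAIN = [
    [("Alice", "PROPN"), ("met", "O"), ("Bob", "PROPN"), ("in", "O"), ("Paris", "PROPN")],
    [("The", "O"), ("dog", "O"), ("saw", "O"), ("Carol", "PROPN")],
    [("We", "O"), ("flew", "O"), ("to", "O"), ("London", "PROPN"), ("with", "O"), ("Dave", "PROPN")],
    [("A", "O"), ("cat", "O"), ("met", "O"), ("the", "O"), ("dog", "O")],
]

if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.train(TRAIN)
    tokens = "Eve visited Berlin with the dog".split()
    print([(t, tagger.predict(tokens, i)) for i, t in enumerate(tokens)])
```

Capitalization alone cannot tell a sentence-initial proper noun from an ordinary sentence-initial word, which is why the classifier also weighs position and the previous word; with so little training data, though, its guesses on unseen words rest mostly on those few features.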

The field is usually called named-entity extraction (or named-entity recognition) and is often considered a subfield of information extraction. A good starting point for the different fields of NLP is usually the corresponding chapter in The Oxford Handbook of Computational Linguistics.

Viruses.
Post #6 · 2019-01-16 21:50

If you're interested in implementing natural language processing and Python is your programming language, this video can be a very informative resource: http://www.youtube.com/watch?v=kKe4M4iSclc

疯言疯语
Post #7 · 2019-01-16 21:51

Try searching for "named entity recognition"--that's the term that's used in the NLP literature for this sort of thing.
