WordNet (Word Sense Annotated) Corpus

Posted 2019-03-30 21:38

做个烂人

Question:

I've been using many different corpora for natural language processing, and I'm looking for one that has been annotated with WordNet word senses.

I understand there probably isn't a large corpus with this information, since it would have to be annotated manually, but there has to be something to start from.

Also, if no such corpus exists, is there at least a sense-annotated n-gram database (giving the percentage of the time a word takes each of its definitions, or a numerical count for each WordNet definition reflecting how common that sense is)?

Answer 1:

Three prominent corpora annotated with WordNet senses:

MASC
WordNet gloss
SemCor
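
For the sense-frequency statistics the question asks about, a sense-annotated corpus can be reduced to per-word counts fairly directly. The following is only a minimal sketch: the function name `sense_distribution`, the `(lemma, sense_id)` input format, and the toy data are assumptions made for illustration, and the sense labels merely imitate WordNet synset names. With a real corpus such as SemCor, the pairs would come from a corpus reader (for instance NLTK's `semcor` reader) instead of a hand-written list.

```python
from collections import Counter, defaultdict


def sense_distribution(tagged_tokens):
    """Map each lemma to {sense_id: relative frequency}.

    tagged_tokens is an iterable of (lemma, sense_id) pairs;
    sense_id is None for tokens that carry no sense annotation.
    """
    counts = defaultdict(Counter)
    for lemma, sense_id in tagged_tokens:
        if sense_id is None:  # skip unannotated tokens
            continue
        counts[lemma][sense_id] += 1
    return {
        lemma: {sense: n / sum(c.values()) for sense, n in c.items()}
        for lemma, c in counts.items()
    }


# Hypothetical toy annotations, not taken from any real corpus.
toy_corpus = [
    ("bank", "bank.n.01"),
    ("bank", "bank.n.02"),
    ("bank", "bank.n.02"),
    ("bank", "bank.n.02"),
    ("the", None),
]

dist = sense_distribution(toy_corpus)
print(dist["bank"])
```

Keeping raw counts instead of normalizing would give the "numerical count per definition" variant mentioned in the question.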

Answer 2:

Some of the SENSEVAL (now SemEval) data is annotated with WordNet senses.

Answer 3:

You can use Senseval-2 and also Senseval-3; these two corpora are used for word sense disambiguation. For Java, there is a SemCor format and the jSemcor API for working with it.