According to the documentation, I can load the information content (IC) files computed from a sense-tagged corpus in NLTK as such:
>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
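For what it's worth, these IC objects are normally passed to the information-content similarity measures rather than inspected directly; a minimal sketch, assuming NLTK 3's method-style API and continuing the session above:

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.01')
>>> cat = wn.synset('cat.n.01')
>>> dog.res_similarity(cat, brown_ic)  # Resnik similarity weighted by Brown IC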
I can also get the definition, POS, offset, and examples as such:
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()
>>> wn.synset('dog.n.01').definition()
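(In NLTK 3.x these accessors are methods, hence the parentheses.) The POS and offset mentioned above are available the same way:

>>> wn.synset('dog.n.01').pos()
>>> wn.synset('dog.n.01').offset()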
But how can I get the frequency of a synset from a corpus? To break down the question:
- first, how do I count how many times a synset occurs in a sense-tagged corpus?
- then, how do I divide that count by the total number of occurrences of all synsets for the particular lemma? (see the sketch after this list)
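A minimal sketch of one way to do this with NLTK's semcor reader (it assumes the semcor corpus has been downloaded via nltk.download('semcor'); the helper name semcor_synset_counts and the max_sents slice are mine, added because iterating over all of SemCor is slow):

from collections import Counter
from nltk import Tree
from nltk.corpus import semcor
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import Lemma

def semcor_synset_counts(max_sents=1000):
    # Count how often each synset is annotated in (a slice of) SemCor.
    counts = Counter()
    for sent in semcor.tagged_sents(tag='sem')[:max_sents]:
        for chunk in sent:
            # Sense-tagged chunks are Trees labelled with a WordNet Lemma;
            # untagged chunks are plain token lists, which we skip.
            if isinstance(chunk, Tree) and isinstance(chunk.label(), Lemma):
                counts[chunk.label().synset()] += 1
    return counts

counts = semcor_synset_counts()
dog = wn.synset('dog.n.01')
# Relative frequency: occurrences of dog.n.01 divided by the occurrences
# of every synset of the lemma 'dog'.
total = sum(counts[s] for s in wn.synsets('dog'))
print(counts[dog] / total if total else 0.0)

Note that NLTK also ships precomputed sense counts derived from the WordNet semantic concordances: wn.lemma('dog.n.01.dog').count() returns the stored frequency for that sense, which avoids iterating over the corpus yourself.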
I managed to do it this way.