I am using the gensim word2vec package in Python. I know how to get the vocabulary from the trained model, but how do I get the word count for each word in the vocabulary?
Answer 1:
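In gensim versions before 4.0, each word in the vocabulary has an associated vocabulary object, which contains an index and a count. (In gensim 4.0 and later the `vocab` attribute was removed; the count is available via `get_vecattr(word, "count")` on the `KeyedVectors` object.)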
```python
# w2v: a gensim KeyedVectors object (gensim < 4.0 API)
vocab_obj = w2v.vocab["word"]
print(vocab_obj.count)
```
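Output for the Google News w2v model: 2998437. Note that the Google News vectors file stores no word frequencies, so when it is loaded with `load_word2vec_format` without a vocabulary file, gensim fills in placeholder counts that simply decrease with each word's position in the file. The value is therefore a rank indicator, not a true frequency.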
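To see where such a number can come from: in gensim versions before 4.0, `KeyedVectors.load_word2vec_format` reads only words and vectors, and when no vocabulary file is passed (the `fvocab` argument) it assigns each word a made-up count of `vocab_size - index`. The sketch below reproduces that rule in plain Python, assuming the Google News vocabulary size of 3,000,000 words; `placeholder_count` and `index_from_placeholder` are hypothetical helper names, not gensim functions.

```python
# Sketch of the placeholder-count rule used by gensim < 4.0 when
# load_word2vec_format() is called without a vocabulary file (fvocab=None).
# The helper names below are hypothetical, not part of gensim.

VOCAB_SIZE = 3_000_000  # number of words in the Google News model


def placeholder_count(index: int, vocab_size: int = VOCAB_SIZE) -> int:
    """Count assigned to the word at position `index` (0 = first in the file)."""
    return vocab_size - index


def index_from_placeholder(count: int, vocab_size: int = VOCAB_SIZE) -> int:
    """Invert the rule: recover a word's position from its placeholder count."""
    return vocab_size - count


print(placeholder_count(0))             # → 3000000 (first word in the file)
print(index_from_placeholder(2998437))  # → 1563 (position implied by 2998437)
```

Under this rule, the count only tells you where a word sits in the file; to get real frequencies you need the counts from training (or a vocabulary file saved alongside the vectors).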
So to get the count for each word, you would iterate over all words and vocab objects in the vocabulary.
```python
for word, vocab_obj in w2v.vocab.items():
    # Do something with vocab_obj.count, e.g. print it
    print(word, vocab_obj.count)
```
Answer 2:
If you want a dictionary mapping each word to its count for easy retrieval later, you can build one as follows:
```python
# model: a trained gensim Word2Vec model (gensim < 4.0 API)
w2c = dict()
for item in model.wv.vocab:
    w2c[item] = model.wv.vocab[item].count
```
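The same loop can be written as a single dict comprehension. Because running it for real needs a trained model, the sketch below substitutes a hypothetical stand-in: a plain dict whose values are `Vocab` namedtuples with `index` and `count` fields, mimicking the entries of `model.wv.vocab` in gensim versions before 4.0. With a real model you would iterate over `model.wv.vocab.items()` instead of `fake_vocab.items()`.

```python
from collections import namedtuple

# Hypothetical stand-in for gensim < 4.0 vocab entries (index + count);
# with a real model, iterate over model.wv.vocab instead of fake_vocab.
Vocab = namedtuple("Vocab", ["index", "count"])

fake_vocab = {
    "the": Vocab(index=0, count=120),
    "cat": Vocab(index=1, count=15),
    "sat": Vocab(index=2, count=7),
}

# Map each word to its count in one expression.
w2c = {word: vocab_obj.count for word, vocab_obj in fake_vocab.items()}
print(w2c)  # → {'the': 120, 'cat': 15, 'sat': 7}
```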
If you want to sort it to see the most frequent words in the model, you can do so like this:
```python
# Sort by count, highest first (dicts keep insertion order in Python 3.7+)
w2cSorted = dict(sorted(w2c.items(), key=lambda x: x[1], reverse=True))
```
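If you only need the top few words, sorting the whole vocabulary is unnecessary: `collections.Counter.most_common(n)` or `heapq.nlargest` returns just the `n` highest-count entries. A minimal sketch, using a small hypothetical `w2c` dict in place of one built from a real model:

```python
import heapq
from collections import Counter

# Hypothetical word -> count mapping; in practice, the w2c dict built from the model.
w2c = {"the": 120, "cat": 15, "sat": 7, "mat": 3, "on": 60}

# Counter.most_common(n) returns the n highest-count (word, count) pairs.
top3 = Counter(w2c).most_common(3)
print(top3)  # → [('the', 120), ('on', 60), ('cat', 15)]

# heapq.nlargest gives the same pairs without building a Counter.
top3_heap = heapq.nlargest(3, w2c.items(), key=lambda item: item[1])
print(top3_heap)  # → [('the', 120), ('on', 60), ('cat', 15)]
```

For a vocabulary of millions of words, selecting the top `n` this way avoids materializing a fully sorted copy.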