I trained a Gensim W2V model on 500K sentences (around 60K words), and I want to calculate its perplexity.
- What is the best way to do so?
- For 60K words, how can I check what a proper amount of data would be?
Thanks
If you want to calculate the perplexity, you first have to retrieve the loss. On the `gensim.models.word2vec.Word2Vec` constructor, pass the `compute_loss=True` parameter; this way, `gensim`
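will store the loss for you while training. Once trained, you can call the `get_latest_training_loss()` method to retrieve the loss. Treating this loss as the cross-entropy of the skip-gram model, the perplexity is the exponential of the average loss per prediction. Note that gensim reports a running total summed since training started, not an average, and computes it with natural logarithms, so the perplexity is `exp(loss / num_predictions)` rather than `2**loss`; base 2 only applies when the loss is measured in bits. With negative sampling (gensim's default), the loss is not the full softmax cross-entropy, so the result is only an approximate perplexity.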