gensim Word2Vec transfer learning (from a non-gensim model)

Posted 2020-04-28 08:30

Question:

I have a set of embeddings trained with a neural network that has nothing to do with gensim's word2vec.

I want to use these embeddings as the initial weights for gensim.models.Word2Vec.

What I have found is that I can call model.load(SOME_MODEL) and then continue training, but that requires a gensim model as input. Likewise, reset_from() seems to accept only another gensim model. In my case, though, I don't have a gensim model to start from, only a text file of embeddings in word2vec format.

So how do I start transfer learning from a word2vec text file into gensim.models.Word2Vec?

Answer 1:

You can load embeddings produced by other models with KeyedVectors, as long as they are stored in the original word2vec format:

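```python
import io

from gensim.models.keyedvectors import KeyedVectors

# The first line holds the vocabulary size and the vector dimensionality;
# each following line is a word followed by its vector components.
model_buf = io.StringIO("""
2 3
word0 -0.000737 -0.002106 0.001851
word1 -0.000878 -0.002106 0.002834
""".lstrip())

# Parse the word2vec text format (here from an in-memory buffer instead of a file).
model = KeyedVectors.load_word2vec_format(model_buf, binary=False)

print(model['word0'])  # the 3-dimensional vector for "word0"
```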