Load gensim Word2Vec computed in Python 2, in Python 3

Posted 2019-04-14 15:53

I have a gensim Word2Vec model computed in Python 2 like this:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

model = Word2Vec(LineSentence('enwiki.txt'), size=100, 
                 window=5, min_count=5, workers=15)
model.save('w2v.model')

However, I need to use it in Python 3. If I try to load it,

import gensim
from gensim.models import Word2Vec
model = Word2Vec.load('w2v.model')

it results in an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf9 in position 0: ordinal not in range(128)

I suppose the problem comes from encoding differences between Python 2 and Python 3. It also seems that gensim uses pickle to save/load models.

Is there a way to set encoding/pickle options so that the model loads properly? Or maybe use some external tool to convert the model file?

Recomputing it in Python 3 is not an option: it takes way too much time.

1 answer
时光不老,我们不散
#2 · 2019-04-14 16:09

This indeed looks like a bug somewhere, as noted by memoselyk, and it can be fixed in the way described in a comment to this answer.

So you have to add encoding='latin1' to the call to _pickle.loads in gensim.utils.unpickle, load the model in Python 3, and then save it again. After that you can revert the change and load the re-saved model with unmodified gensim in Python 3.
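The effect of encoding='latin1' can be seen without gensim at all. The following is only a minimal sketch using the standard pickle module: PY2_PICKLE is a hand-written byte string that mimics what Python 2 would produce for the str '\xf9abc', and load_py2_pickle is a hypothetical helper name, not part of gensim. It should reproduce the same kind of UnicodeDecodeError and then show the load, re-save, reload cycle described above.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xf9abc'
# (an assumption made for illustration; a real model file is much larger).
PY2_PICKLE = b'\x80\x02U\x04\xf9abcq\x00.'


def load_py2_pickle(data):
    """Hypothetical helper: unpickle Python 2 data, mapping each byte
    of an old 8-bit str to the code point with the same value."""
    return pickle.loads(data, encoding='latin1')


# With the default encoding ('ASCII'), Python 3 cannot decode byte 0xf9.
default_error = None
try:
    pickle.loads(PY2_PICKLE)
except UnicodeDecodeError as exc:
    default_error = str(exc)

# With encoding='latin1' the load succeeds.
value = load_py2_pickle(PY2_PICKLE)

# Re-saving from Python 3 produces a pickle that loads with default options.
resaved = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
roundtrip = pickle.loads(resaved)

print(default_error)
print(repr(value), roundtrip == value)
```

latin1 works here because it maps every byte value 0-255 to a code point, so decoding can never fail. Whether the decoded strings are meaningful depends on what the Python 2 code originally stored in them.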
