Load gensim Word2Vec computed in Python 2, in Pyth

Posted 2019-04-14 15:53

I have a gensim Word2Vec model that was trained in Python 2 like this:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

model = Word2Vec(LineSentence('enwiki.txt'), size=100, 
                 window=5, min_count=5, workers=15)
model.save('w2v.model')

However, I need to use it in Python 3. If I try to load it,

import gensim
from gensim.models import Word2Vec
model = Word2Vec.load('w2v.model')

it results in an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf9 in position 0: ordinal not in range(128)

I suppose the problem comes from differences in string encoding between Python 2 and Python 3. It also seems that gensim uses pickle to save and load models.

Is there a way to set encoding/pickle options so that the model loads properly? Or maybe use some external tool to convert the model file?

Recomputing it in Python 3 is not an option: it takes way too much time.

Tags: python python-3.x encoding gensim word2vec

1 answer

时光不老，我们不散

#2 · 2019-04-14 16:09

As memoselyk noted, this does look like a bug somewhere, and it can be worked around in the way described in a comment on this answer.

So you have to add encoding='latin1' to the _pickle.loads call in gensim.utils.unpickle and then load the model in Python 3. Save the model again right after loading it. Once that is done, you can revert the change and load the re-saved model with unmodified gensim in Python 3.
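To see why encoding='latin1' helps, here is a minimal sketch that uses only the standard pickle module, not gensim itself. The byte string PY2_PICKLE is an assumption for illustration: it is what Python 2 writes for pickle.dumps('\xf9', 2), a one-byte str. The helper names load_py2_pickle and resave_for_py3 are hypothetical and are not part of gensim.

```python
import os
import pickle
import tempfile

# Assumed sample data: the protocol-2 pickle Python 2 produces for the
# one-byte str '\xf9' (opcodes: PROTO 2, SHORT_BINSTRING, BINPUT, STOP).
PY2_PICKLE = b"\x80\x02U\x01\xf9q\x00."


def load_py2_pickle(data):
    """Unpickle Python 2 bytes, decoding old 8-bit str objects as latin1."""
    return pickle.loads(data, encoding="latin1")


def resave_for_py3(data, path):
    """Load a Python 2 pickle and write it back as a native Python 3 pickle."""
    obj = load_py2_pickle(data)
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return obj


# 1. With the default encoding (ASCII), loading fails as in the question.
try:
    pickle.loads(PY2_PICKLE)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = str(exc)

# 2. With latin1, every byte maps to exactly one character, so loading works.
value = load_py2_pickle(PY2_PICKLE)

# 3. After re-saving, the new file loads without any encoding argument.
with tempfile.TemporaryDirectory() as tmp:
    new_path = os.path.join(tmp, "converted.pkl")
    resave_for_py3(PY2_PICKLE, new_path)
    with open(new_path, "rb") as fh:
        reloaded = pickle.load(fh)

print(default_error)
print(repr(value), repr(reloaded))
```

This mirrors the load, save, revert sequence from the answer on a toy object. A real gensim model also needs the gensim classes to be importable, and it should be re-saved with model.save so that gensim handles any arrays it stores next to the main file.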
