Failed to load a .bin.gz pre-trained word2vec model

Posted 2019-08-26 23:52

Question:

I'm trying to load the pre-trained word2vec vectors that I found here (https://github.com/mmihaltz/word2vec-GoogleNews-vectors). I used the following command:

model = gensim.models.KeyedVectors.load_word2vec_format('word2vec.bin.gz', binary=False)

And it throws this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/deeplearning/anaconda3/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 193, in load_word2vec_format
    header = utils.to_unicode(fin.readline(), encoding=encoding)
  File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/home/deeplearning/anaconda3/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b've')

Answer 1:

Because the error says "Not a gzipped file", the file may have been inadvertently uncompressed while still carrying the misleading .gz extension. (Try renaming it without .gz and loading that file.)
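
Before renaming anything, it may help to confirm whether the file really is gzip-compressed. A minimal sketch, assuming only that a gzip stream starts with the two bytes 0x1f 0x8b (the same check that Python's gzip module fails on in the traceback); the helper name is_gzipped is made up for illustration:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_gzip.py word2vec.bin.gz
    for name in sys.argv[1:]:
        print(name, "gzip" if is_gzipped(name) else "not gzip")
```

If this reports "not gzip" for word2vec.bin.gz, the extension is misleading, which matches the b've' prefix reported in the error.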

Because the filename includes .bin, it is likely a 'binary' style word2vec format. So the optional parameter may need to be binary=True.
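
As a rough sketch of that suggestion: the helper below guesses the binary flag from the filename and passes it to gensim's KeyedVectors.load_word2vec_format. The names looks_binary and load_vectors are hypothetical, and the ".bin means binary" rule is only a naming convention, not something gensim enforces, so treat the guess as a starting point.

```python
from pathlib import Path


def looks_binary(filename):
    """Heuristic only: treat a '.bin' suffix anywhere in the name as binary format."""
    return ".bin" in Path(filename).suffixes


def load_vectors(path):
    """Load word2vec-format vectors, guessing `binary` from the filename."""
    # Imported here so the filename heuristic can be used without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=looks_binary(path))


if __name__ == "__main__":
    print(looks_binary("word2vec.bin.gz"))  # the asker's filename
```

With the asker's filename, load_vectors('word2vec.bin.gz') would call the loader with binary=True instead of binary=False. It will still fail if the file itself is not a valid gzip archive.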

The filename you're using, word2vec.bin.gz, does not match the filename at the link you provided, GoogleNews-vectors-negative300.bin.gz. This suggests other changes may have occurred that could cause problems.

The error also reports a 'magic number' (indicative prefix) from the file, b've', that looks like a bit of plain text rather than the beginning of a real gzip file. You might want to look at the first few lines of the problem file, via something like head word2vec.bin.gz, to see whether anything else indicates what the file really is, as opposed to what you expect it to be.
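
Where the head command is not available, the same inspection can be sketched in Python. The function name head_bytes and its limits are arbitrary choices for illustration; it reads raw bytes so that nothing is decoded or decompressed on the way. One possibility consistent with a b've' prefix is a small text placeholder such as a Git LFS pointer, which begins with the word "version", but that is only a guess that the output would confirm or rule out.

```python
def head_bytes(path, n_lines=5, max_bytes=400):
    """Return up to `n_lines` raw lines from the first `max_bytes` of the file."""
    with open(path, "rb") as f:
        chunk = f.read(max_bytes)
    return chunk.splitlines()[:n_lines]


if __name__ == "__main__":
    import sys

    # Usage: python head_bytes.py word2vec.bin.gz
    for line in head_bytes(sys.argv[1]):
        print(repr(line))  # repr keeps non-printable bytes visible
```

A real gzip file would show b'\x1f\x8b' at the start of the first line. Readable text there means the download is something other than the compressed vectors.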