'utf-8' codec can't decode byte 0x80

2020-03-13 05:10发布

I'm trying to download BVLC-trained model and I'm stuck with this error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think it's because of the following function (complete code)

  # Closure-d function for checking SHA1.
  def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
      with open(filename, 'r') as f:
          return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?

3条回答
仙女界的扛把子
2楼-- · 2020-03-13 05:36

Since there is not a single hint in the documentation nor src code, I have no clue why, but using the b char (i guess for binary) totally works (tf-version: 1.1.0):

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information, check out: gfile

查看更多
等我变得足够好
3楼-- · 2020-03-13 05:55

You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.

Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require you pass in bytes:

with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the addition of b in the file mode.

See the open() documentation:

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)

and from the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.

查看更多
我命由我不由天
4楼-- · 2020-03-13 05:58

You didn't specify to open the file in binary mode, so f.read() is trying to read the file as a UTF-8-encoded text file, which doesn't seem to be working. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it, and then read it, as a binary file.

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

but

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
查看更多
登录 后发表回答