error UnicodeDecodeError: 'utf-8' codec ca

2020-01-24 03:44发布

站内文章 / Python

174 0

闹够了就滚

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools

An error occurred when compiling "process.py" on the above site.

 python tools/process.py --input_dir data --            operation resize --outp
ut_dir data2/resize
data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):

File "tools/process.py", line 235, in <module>
  main()
File "tools/process.py", line 167, in main
  src = load(src_path)
File "tools/process.py", line 113, in load
  contents = open(path).read()
      File"/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
  (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode     byte 0xff in position 0: invalid start byte

What is the cause of the error? Python's version is 3.5.2.

回答1:

Python tries to convert a byte-array (a bytes which it assumes to be a utf-8-encoded string) to a unicode string (str). This process of course is a decoding according to utf-8 rules. When it tries this, it encounters a byte sequence which is not allowed in utf-8-encoded strings (namely this 0xff at position 0).

Since you did not provide any code we could look at, we only could guess on the rest.

From the stack trace we can assume that the triggering action was the reading from a file (contents = open(path).read()). I propose to recode this in a fashion like this:

with open(path, 'rb') as f:
  contents = f.read()

That b in the mode specifier in the open() states that the file shall be treated as binary, so contents will remain a bytes. No decoding attempt will happen this way.

回答2:

Use this solution it will strip out (ignore) the characters and return the string without them. Only use this if your need is to strip them not convert them.

with open(path, encoding="utf8", errors='ignore') as f:

Using errors='ignore' You'll just lose some characters. but if your don't care about them as they seem to be extra characters originating from a the bad formatting and programming of the clients connecting to my socket server. Then its a easy direct solution. reference

回答3:

Had an issue similar to this, Ended up using UTF-16 to decode. my code is below.

with open(path_to_file,'rb') as f:
    contents = f.read()
contents = contents.rstrip("\n").decode("utf-16")
contents = contents.split("\r\n")

this would take the file contents as an import, but it would return the code in UTF format. from there it would be decoded and seperated by lines.

回答4:

I've come across this thread when suffering the same error, after doing some research I can confirm, this is an error that happens when you try to decode a UTF-16 file with UTF-8.

With UTF-16 the first characther (2 bytes in UTF-16) is a Byte Order Mark (BOM), which is used as a decoding hint and doesn't appear as a character in the decoded string. This means the first byte will be either FE or FF and the second, the other.

Heavily edited after I found out the real answer

回答5:

Use encoding format ISO-8859-1 to solve the issue.

回答6:

use only

base64.b64decode(a)

instead of

base64.b64decode(a).decode('utf-8')

回答7:

If you are on a mac check if you for a hidden file, .DS_Store. After removing the file my program worked.

回答8:

Check the path of the file to be read. My code kept on giving me errors until I changed the path name to present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

回答9:

It simply means that one chose the wrong encoding to read the file.

On Mac, use file -I file.txt to find the correct encoding. On Linux, use file -i file.txt.

回答10:

if you are receiving data from a serial port, make sure you are using the right baudrate (and the other configs ) : decoding using (utf-8) but the wrong config will generate the same error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

to check your serial port config on linux use : stty -F /dev/ttyUSBX -a

回答11:

I had a similar problem.

Solved it by:

import io

with io.open(filename, 'r', encoding='utf-8') as fn:
  lines = fn.readlines()

However, I had another problem. Some html files (in my case) were not utf-8, so I received a similar error. When I excluded those html files, everything worked smoothly.

So, except from fixing the code, check also the files you are reading from, maybe there is an incompatibility there indeed.

回答12:

If possible, open the file in a text editor and try to change the encoding to UTF-8. Otherwise do it programatically at the OS level.

回答13:

I have a similar problem. I try to run an example in tensorflow/models/objective_detection and met the same message. Try to change Python3 to Python2

回答14:

HitHere, you should load the "GoogleNews-vectors-negative300.bin.gz" file at first then extract it by this command in Ubuntu: gunzip -k GoogleNews-vectors-negative300.bin.gz. [ manually extracting is never recommended]. secondly, you should apply these commands in pyrhon 3:

import gensim model = gensim.models.Word2Vec.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True) . I hope it will be useful.