I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:
Traceback (most recent call last):
  File "SCRIPT LOCATION", line NUMBER, in <module>
    text = file.read()
  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to <undefined>
If anyone could give me any help to try and get past this problem I would be most grateful.
The file in question is not encoded as CP1252; it uses some other encoding, and you will have to work out which one yourself. Common candidates are Latin-1 and UTF-8. In Latin-1, 0x90 is only an invisible control character that rarely appears in real text, so UTF-8 (where 0x90 is a continuation byte) is more likely.
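The reasoning above can be checked directly in the interpreter. This is only a small sketch; the sample bytes are made up for illustration and are not taken from the asker's file.

```python
# 0x90 is one of the few bytes that Python's cp1252 codec leaves undefined.
raw = b"\x90"

try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False  # same error as in the question's traceback

# Latin-1 maps every byte to a code point, so decoding never fails;
# 0x90 simply becomes the invisible control character U+0090.
latin1_char = raw.decode("latin-1")

# In UTF-8, bytes of the form 10xxxxxx are continuation bytes.
is_continuation = (raw[0] & 0b11000000) == 0b10000000

# Example: the Cyrillic letter U+0410 is encoded as D0 90 in UTF-8.
sample = "\u0410".encode("utf-8")
```

So a 0x90 byte in the middle of a file is quite plausibly the second byte of a multi-byte UTF-8 character.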
You specify the encoding when you open the file:
file = open(filename, encoding="utf8")
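A slightly fuller sketch of the same idea, using a with block so the file is closed automatically. The temporary file and its contents are hypothetical stand-ins for the real data, and it assumes the data really is UTF-8.

```python
import os
import tempfile

# Hypothetical sample file; it contains the bytes D0 90 (UTF-8 for U+0410).
fd, filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"caf\xc3\xa9 \xd0\x90\n")

# Passing encoding= stops Python from falling back to the platform default
# (cp1252 on many Windows installations).
with open(filename, encoding="utf-8") as f:
    text = f.read()

os.remove(filename)
```

If the file is not actually UTF-8, this still raises UnicodeDecodeError, which is itself a useful hint that another encoding is needed.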
As an extension to @LennartRegebro's answer:
If you can't tell which encoding the file uses, the solution above does not work (the file is not utf8), and you find yourself merely guessing, there are online tools that can help identify the encoding. They aren't perfect, but they usually work well enough. Once you have figured out the encoding, you should be able to use the solution above.
EDIT: (Copied from comment)
The popular text editor Sublime Text has a command that displays the encoding, if one has been set:
- Go to View -> Show Console (or Ctrl+`).
- Type view.encoding() into the field at the bottom and hope for the best (I was unable to get anything but Undefined, but maybe you will have better luck).
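If no such tool is at hand, the guessing can also be done in Python by trying a few candidate encodings in turn. This is a rough sketch only: guess_encoding and its candidate list are hypothetical names chosen for illustration, and a decode that succeeds does not prove the guess is right. Latin-1 in particular accepts any byte sequence, so it belongs last.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot be right, try the next one
        return name
    return None


# Made-up byte strings for illustration.
guess_utf8 = guess_encoding(b"caf\xc3\xa9 \xd0\x90")  # valid UTF-8
guess_fallback = guess_encoding(b"caf\xe9 \x90")      # invalid UTF-8 and cp1252
```

In practice the bytes would come from open(filename, "rb").read(), and the decoded text should still be checked by eye.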
Just to add: in case file = open(filename, encoding="utf8") does not work, try file = open(filename, errors='ignore'). Note that this silently drops any bytes that cannot be decoded.
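To see what that error handler does to the data, here is a small sketch on made-up bytes. bytes.decode takes the same errors values as open(), so it shows the effect without needing a file.

```python
raw = b"abc\x90def"  # made-up bytes; 0x90 is undefined in cp1252

# 'ignore' silently drops bytes that cannot be decoded.
ignored = raw.decode("cp1252", errors="ignore")

# 'replace' substitutes U+FFFD, so the damage stays visible.
replaced = raw.decode("cp1252", errors="replace")
```

Because 'ignore' loses data without any trace, 'replace' may be the safer choice when the result needs to be inspected later.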