I have the following code in Python 3, which is meant to print out each line in a csv file.
import csv
with open('my_file.csv', 'r', newline='') as csvfile:
    lines = csv.reader(csvfile, delimiter=',', quotechar='|')
    for line in lines:
        print(' '.join(line))
But when I run it, it gives me this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
I looked through the csv file, and it turns out that if I take out a single ñ (little n with a tilde on top), every line prints out fine.
My problem is that I've looked through a bunch of different solutions to similar problems, but I still have no idea how to fix this, what to decode/encode, etc. Simply taking out the ñ character in the data is NOT an option.
For others who hit the same error shown in the subject, watch out for the file encoding of your CSV file. It's possible that it is not UTF-8. I just noticed that LibreOffice created a UTF-16 encoded file for me today without prompting me, although I could not reproduce this.
If you try to open a UTF-16 encoded document using open(..., encoding='utf-8'), you will get a UnicodeDecodeError like the one shown in the subject. To fix it, either specify the 'utf-16' encoding or change the encoding of the CSV file.
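One way to check for this case before opening the file is to look for a byte-order mark (BOM) at its start. The following is only a sketch: the helper names guess_encoding_from_bom and sniff_file are made up for illustration, and a file without a BOM falls back to an assumed default of UTF-8, which may still be wrong.

```python
import codecs

# Byte-order marks that editors and spreadsheets often write at the start
# of a file. UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(raw, default="utf-8"):
    """Return a codec name based on the BOM in raw, or default if none is found."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def sniff_file(path, default="utf-8"):
    """Read the first few bytes of a file (hypothetical helper) and guess its encoding."""
    with open(path, "rb") as f:
        return guess_encoding_from_bom(f.read(4), default)
```

The result can then be passed along, for example open('my_file.csv', 'r', newline='', encoding=sniff_file('my_file.csv')). A missing BOM does not prove the file is UTF-8, so this only catches the UTF-16 case described above.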
I also faced this issue with Python 3, and it was resolved by specifying utf-16 as the encoding.
Easy: just open it in Excel or OpenOffice Calc, use text to columns, select "," as the separator, and then save the file as .csv. It took me a day and several hours of searching on Google, but in the end I figured it out.
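We know the file contains the byte b'\x96', since it is mentioned in the error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
Now we can write a little script to find out whether there are any encodings in which b'\x96' decodes to ñ. If it finds some, try changing open('my_file.csv', 'r', newline='') to use one of those encodings by passing it as the encoding argument.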
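A minimal sketch of such a search, not necessarily the script this answer originally used. It assumes that the canonical codec names in the standard library's alias table are a good enough list of candidates; the helper name encodings_decoding_to is made up for illustration.

```python
import encodings.aliases


def encodings_decoding_to(raw, expected):
    """Return the sorted names of codecs that decode raw to expected."""
    # The alias table maps each alias to a canonical codec module name.
    candidates = set(encodings.aliases.aliases.values())
    matches = []
    for name in sorted(candidates):
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except Exception:
            # Not a text codec, not available on this platform,
            # or the byte is invalid in this encoding.
            continue
    return matches


matches = encodings_decoding_to(b"\x96", "ñ")
print(matches)
```

mac_roman is one codec that maps 0x96 to ñ, so a call such as open('my_file.csv', 'r', newline='', encoding='mac_roman') is a reasonable thing to try if the file may have come from a Mac application. Whether that is the file's real encoding still needs to be checked against the rest of its contents.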
A much simpler solution is to open the CSV file in Notepad and select "Save As" in the "File" menu. Set "Save as type" to "All Files (*.*)", select "UTF-8" in the Encoding dropdown list, and add the ".csv" extension to the file name.
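The same kind of conversion can also be done in Python instead of an editor. This is a sketch under one important assumption: you already know the file's current encoding (src_encoding below). If that guess is wrong, the ñ will be converted to some other character. The function name reencode_to_utf8 is made up for illustration.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding):
    """Read a text file in src_encoding and write a copy encoded as UTF-8."""
    # newline="" on both sides keeps the original line endings unchanged,
    # which matters for CSV files with quoted multi-line fields.
    with open(src_path, "r", newline="", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        dst.write(text)
```

For example, reencode_to_utf8('my_file.csv', 'my_file_utf8.csv', 'mac_roman') would write a UTF-8 copy that the code in the question can read without an encoding argument; both the output file name and the choice of mac_roman are assumptions here.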
with open('my_file.csv', 'r', newline='', encoding='ISO-8859-1') as csvfile:
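The file is not encoded in UTF-8, so the byte that represents ñ in it cannot be decoded as UTF-8. To avoid the error, you may use the ISO-8859-1 encoding instead, which accepts every byte value. Check that the ñ still displays correctly afterwards, because byte 0x96 is not ñ in ISO-8859-1. For more details about this encoding, you may refer to the link below: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html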
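A quick check of the byte values involved may help when judging this suggestion. It is only an illustration using Python's built-in codecs; the variable names are arbitrary.

```python
raw = b"\x96"  # the byte reported in the error message

# UTF-8 can represent ñ; it uses two bytes, neither of which is 0x96.
utf8_bytes = "ñ".encode("utf-8")

# ISO-8859-1 stores ñ as the single byte 0xF1, not 0x96.
latin1_bytes = "ñ".encode("iso-8859-1")

# ISO-8859-1 maps every byte to a character, so decoding never fails,
# but 0x96 becomes an invisible control character (U+0096).
latin1_char = raw.decode("iso-8859-1")

# In the Mac Roman encoding, 0x96 is ñ.
mac_char = raw.decode("mac_roman")

print(utf8_bytes, latin1_bytes, repr(latin1_char), mac_char)
```

So ISO-8859-1 makes the exception go away because it never rejects a byte, but for this particular file the ñ would most likely come out as a control character rather than as ñ.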