To clarify: this question is not a duplicate of this one. I have already tried all the hints there and didn't get an answer.
I have a text file containing Unicode data, and I want to read the file into a string.
I tried
a = open('myfile.txt', 'r', encoding='utf-8')
print(a.read())
but there is an error saying:
UnicodeDecodeError: 'charmap' codec can't encode character '\ufeff' at position Y: character maps to undefined
Now my question is: I don't care about the non-ASCII characters at all, so is there any way to tell Python to simply remove or skip such a character whenever it encounters one? Also, to clarify, I have tried the encodings utf-8, utf-8-sig, utf-16, etc.
I also tried this, with no luck:
a = open('myfile.txt', 'r', encoding='utf-8')
try:
    print(a.read())
except:
    pass
I also tried importing codecs and the code below:
import codecs
a = codecs.open('myfile.txt', 'r', encoding='utf-8')
print(a.read())
but the same error still appears.
Correcting my answer for encoding in the print statement: avoid printing to stdout on Windows, because Python assumes that the CMD terminal can only handle Windows-1252 (Microsoft's variant of ISO Latin-1). This is easily sidestepped by always printing to stderr instead. On Linux there should be no issue with printing Unicode correctly.
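The stderr workaround described above could look roughly like this. It is only a sketch for Python 3; the helper name `show` is made up for illustration. It relies on the fact that Python 3 opens `sys.stderr` with the `backslashreplace` error handler, so characters the console encoding cannot represent are escaped rather than raising.

```python
import sys


def show(text):
    """Write text to stderr instead of stdout (hypothetical helper)."""
    # sys.stderr uses errors="backslashreplace" on Python 3, so a character
    # the console cannot encode is written as an escape such as \ufeff
    # instead of triggering UnicodeEncodeError.
    print(text, file=sys.stderr)
```

Calling `show(open('myfile.txt', encoding='utf-8').read())` would then print the file contents without the charmap error, at the cost of showing escapes for unsupported characters.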
P.S.: for Python 2.x:
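A minimal sketch for Python 2.x, assuming the intent is to skip undecodable bytes while reading. `io.open` is the Python 2 counterpart of Python 3's built-in `open` and accepts the same `encoding` and `errors` arguments; the function name is hypothetical.

```python
import io


def read_text_lenient_py2(path, encoding="utf-8"):
    """Read a file as text, dropping bytes that cannot be decoded."""
    # io.open accepts encoding= and errors= on Python 2 as well as Python 3.
    with io.open(path, "r", encoding=encoding, errors="ignore") as f:
        # Returns a unicode object on Python 2 (str on Python 3).
        return f.read()
```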
P.P.S.: Original answer, for Python 3.x:
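A minimal sketch for Python 3.x, assuming the approach is to pass an error handler through the `errors` argument of the built-in `open`. The function name is hypothetical, and `'ignore'` is just one of the available handlers.

```python
def read_text_lenient(path, encoding="utf-8"):
    """Read a file as str, dropping bytes that cannot be decoded."""
    # errors="ignore" silently skips any byte sequence that is invalid
    # in the chosen encoding instead of raising UnicodeDecodeError.
    with open(path, "r", encoding=encoding, errors="ignore") as f:
        return f.read()
```

Passing `encoding="utf-8-sig"` instead also strips a leading byte order mark (`\ufeff`) if the file starts with one.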
See https://docs.python.org/3/library/codecs.html#error-handlers for a detailed list of your options
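To get a feel for how a few of those handlers differ, here is a small illustration. It encodes a string that starts with a byte order mark to cp1252, which is chosen here only as an example of a charmap codec that has no mapping for `\ufeff`.

```python
bom_text = "\ufeffhello"  # text starting with a byte order mark

# The same string encoded under three different error handlers:
dropped = bom_text.encode("cp1252", errors="ignore")             # character is skipped
replaced = bom_text.encode("cp1252", errors="replace")           # character becomes "?"
escaped = bom_text.encode("cp1252", errors="backslashreplace")   # character becomes an escape

print(dropped)   # b'hello'
print(replaced)  # b'?hello'
print(escaped)   # b'\\ufeffhello'
```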