UnicodeDecodeError: 'charmap' codec can

Posted 2019-09-11 08:36

To clarify: this question is not a duplicate of this one; I have already tried all the hints there and did not find an answer.

I have a txt file containing Unicode data, and I want to read the file into a string.

I tried

a = open('myfile.txt', 'r', encoding='utf-8')
print(a.read())

but there is an error saying:

UnicodeDecodeError: 'charmap' codec can't encode character '\ufeff' at position Y: character maps to undefined

Now my question is: I don't care about the non-ASCII (UTF-8) characters at all. Is there any way to handle the exception so that whenever Python encounters such a character, it just removes or skips it? To clarify, I have also tried the encodings utf-8, utf-8-sig, utf-16, etc.

I tried this as well but no luck.

a = open('myfile.txt', 'r', encoding='utf-8')
try:
    print(a.read())
except:
    pass

I also tried importing codecs and the code below:

import codecs
a = codecs.open('myfile.txt', 'r', encoding='utf-8')
print(a.read())

but the same error still occurs.

1 answer
三岁会撩人
#2 · 2019-09-11 08:44

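Correction to my answer, regarding the encoding used by the print statement: avoid printing to stdout on Windows, because Python assumes that the CMD terminal can only handle a legacy code page such as Windows-1252 (Microsoft's variant of ISO Latin-1). Any character outside that code page, such as '\ufeff', then makes print raise the 'charmap' error. This is easily sidestepped by printing to stderr instead, because sys.stderr uses the backslashreplace error handler and escapes such characters rather than raising: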

import sys
print('your text', file=sys.stderr)

On Linux there should be no issue with printing Unicode correctly.
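To see why the stderr trick avoids the error, here is a minimal sketch that reproduces the failure without a Windows console. It assumes 'cp1252' as a stand-in for the console encoding, and the helper name try_encode is made up for illustration:

```python
# Illustrative only: 'cp1252' stands in for a legacy Windows console encoding.
text = '\ufeffhello'  # a string that starts with a byte order mark


def try_encode(s, encoding, errors):
    """Hypothetical helper: return the encoded bytes, or a short failure note."""
    try:
        return s.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return 'failed: ' + exc.reason


# Strict encoding (what stdout does by default) cannot represent '\ufeff'.
strict_result = try_encode(text, 'cp1252', 'strict')

# backslashreplace (what stderr uses) escapes the character instead of raising.
escaped_result = try_encode(text, 'cp1252', 'backslashreplace')

print(strict_result)
print(escaped_result)
```

The strict call fails in the same way as the print in the question, while the backslashreplace call yields plain ASCII bytes that any console can show.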

P.S.: for Python 2.x:

from __future__ import print_function
import sys
print('your text', file=sys.stderr)

P.P.S.: Original answer, for Python 3.x:

a=open('myfile.txt', 'r', encoding='utf-8', errors='ignore') 

See https://docs.python.org/3/library/codecs.html#error-handlers for a detailed list of your options
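As a rough illustration of how these handlers differ, the sketch below decodes a made-up byte string directly instead of reading a file. The sample bytes and variable names are assumptions for the example. Note that errors='ignore' only drops bytes that are not valid UTF-8; valid non-ASCII characters and the BOM survive unless you use utf-8-sig or re-encode to ASCII:

```python
# Sample data (made up): UTF-8 BOM + "café" + space + invalid byte 0xff + " ok"
raw = b'\xef\xbb\xbfcaf\xc3\xa9 \xff ok'

# 'ignore' silently drops the undecodable byte but keeps the BOM character.
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = raw.decode('utf-8', errors='replace')

# 'utf-8-sig' additionally strips a leading BOM.
no_bom = raw.decode('utf-8-sig', errors='ignore')

# To discard every non-ASCII character, round-trip through ASCII.
ascii_only = no_bom.encode('ascii', errors='ignore').decode('ascii')

for name, value in [('ignored', ignored), ('replaced', replaced),
                    ('no_bom', no_bom), ('ascii_only', ascii_only)]:
    print(name, ascii(value))  # ascii() keeps the output console-safe
```

The same errors argument can be passed to open(), so open('myfile.txt', encoding='utf-8-sig', errors='ignore') would combine BOM stripping with skipping invalid bytes.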
