Question:
I have done some research and seen solutions but none have worked for me.
Python - 'ascii' codec can't decode byte
This didn't work for me. I know that 0xe9 is the é character, but I still can't figure out how to get this working. Here is my code:
output_lines = ['<menu>', '<day name="monday">', '<meal name="BREAKFAST">', '<counter name="Entreé">', '<dish>', '<name icon1="Vegan" icon2="Mindful Item">', 'Cream of Wheat (Farina)','</name>', '</dish>', '</counter >', '</meal >', '</day >', '</menu >']
output_string = '\n'.join([line.encode("utf-8") for line in output_lines])
And this gives me the error: 'ascii' codec can't decode byte 0xe9
I have tried decoding, and I have tried replacing the "é", but I can't seem to get that to work either.
Answer 1:
You are trying to encode bytestrings:
>>> '<counter name="Entreé">'.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)
Python is trying to be helpful: you can only encode a Unicode string to bytes, so to encode a bytestring Python first implicitly decodes it, using the default encoding (ASCII in Python 2).
The solution is to not encode data that is already encoded, or first decode using a suitable codec before trying to encode again, if the data was encoded to a different codec than what you needed.
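To make the implicit step visible, here is a minimal sketch that spells out what Python 2 effectively does when `.encode()` is called on a bytestring, followed by the explicit decode that avoids the error. It uses explicit `b''` literals so it also runs on Python 3. The variable names are illustrative, and the sample bytes assume the source was UTF-8 (as the 0xc3 byte in the traceback suggests).

```python
# One line from the question, as UTF-8 bytes (assumption: the source is UTF-8).
raw = b'<counter name="Entre\xc3\xa9">'

# Roughly what Python 2 does for raw.encode('utf-8'):
# an implicit ASCII decode first, then the requested encode.
try:
    raw.decode('ascii').encode('utf-8')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# The fix: decode explicitly with the codec the bytes were written in.
text = raw.decode('utf-8')

# Encoding the resulting text back to UTF-8 reproduces the original bytes.
round_tripped = text.encode('utf-8')
```

The implicit ASCII decode is where the `UnicodeDecodeError` comes from, even though the call in the question was `.encode()`.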
If you have a mix of unicode and bytestring values, decode just the bytestrings or encode just the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode first:
def ensure_unicode(v):
    if isinstance(v, str):
        v = v.decode('utf8')
    return unicode(v)  # convert anything not a string to unicode too
output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
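For code that has to run on both Python 2 and Python 3, a similar helper could look like the sketch below. `ensure_text` and `text_type` are hypothetical names, and UTF-8 is assumed as the encoding of any bytestrings in the list.

```python
import sys

# The text type is `str` on Python 3 and `unicode` on Python 2.
# The name `unicode` is only evaluated when running on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def ensure_text(value, encoding='utf-8'):
    """Return value as text, decoding bytestrings with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Convert anything that is not a bytestring (numbers, text) to text.
    return text_type(value)


# A mixed list: a bytestring, a text string, and a number.
mixed = [b'Entre\xc3\xa9', u'caf\xe9', 42]
joined = u'\n'.join(ensure_text(item) for item in mixed)
```

Once every item is text, the join no longer triggers an implicit decode.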
Answer 2:
A simple example of the problem is:
>>> '\xe9'.encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
\xe9 isn't an ASCII character, which means that your string is already encoded. You need to decode it into Python's unicode and then encode it again in the serialization format you want.
Since I don't know where your string came from, I just peeked at the python codecs, picked something from Western Europe and gave it a go:
>>> '\xe9'.decode('cp1252')
u'\xe9'
>>> u'\xe9'.encode('utf-8')
'\xc3\xa9'
>>>
You'll have the best luck if you know exactly which encoding the file came from.
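As a rough illustration of why the source encoding matters, the sketch below decodes the same single byte with a few codecs. The codecs chosen are only examples, not a claim about where the original data came from.

```python
raw = b'\xe9'  # the single byte from the error message

decoded = {
    # Western European single-byte codecs agree on this byte: é (U+00E9).
    'cp1252': raw.decode('cp1252'),
    'latin-1': raw.decode('latin-1'),
    # The old DOS code page maps the same byte to a different character.
    'cp437': raw.decode('cp437'),
}

# As UTF-8, a lone 0xe9 is the start of an incomplete multi-byte sequence.
try:
    raw.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

A wrong guess either raises an error or silently produces the wrong character, which is why knowing the real encoding beats guessing.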
Answer 3:
encode = turn a unicode string into a bytestring
decode = turn a bytestring into unicode
Since you already have a bytestring, you need decode to make it a unicode instance (assuming that is actually what you are trying to do):
output_string = '\n'.join(output_lines)
print output_string.decode("latin1") #now this returns unicode
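One caveat with `latin1`, sketched below under the assumption that the bytes were actually written as UTF-8: latin1 maps every byte to a character, so the decode never fails, but a two-byte UTF-8 sequence comes out as two wrong characters. The variable names are illustrative.

```python
# "Entreé" encoded as UTF-8: the é becomes the two bytes 0xc3 0xa9.
utf8_bytes = u'Entre\xe9'.encode('utf-8')

# Decoding with the wrong codec succeeds but yields mojibake ("EntreÃ©").
as_latin1 = utf8_bytes.decode('latin-1')

# Decoding with the codec that was used to encode recovers the text.
as_utf8 = utf8_bytes.decode('utf-8')
```

So `latin1` is only the right choice if the data really is latin1; if the output shows `Ã©` where `é` was expected, the bytes were most likely UTF-8.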
Answer 4:
What to do here depends on what you want to do with your lines. If you just want to print to the console, you don't need to encode anything yourself: consoles normally use UTF-8, and your strings are already encoded bytestrings rather than unicode:
>>> output_string = '\n'.join(output_lines)
>>> print output_string
<menu>
<day name="monday">
<meal name="BREAKFAST">
<counter name="Entreé">
<dish>
<name icon1="Vegan" icon2="Mindful Item">
Cream of Wheat (Farina)
</name>
</dish>
</counter >
</meal >
</day >
</menu >
But if you want to write to a file, you can use the codecs module:
import codecs
f = codecs.open('out_file', 'w', encoding='utf8')
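A file opened this way expects unicode text and encodes it on write. The sketch below is one possible way to finish the job, assuming the lines have already been decoded to unicode. The shortened line list and the temporary output path are illustrative choices, not part of the original code.

```python
import codecs
import os
import tempfile

# Assumption: the lines are unicode text, not bytestrings.
output_lines = [u'<menu>', u'<counter name="Entre\xe9">', u'</menu>']
output_string = u'\n'.join(output_lines)

# Write to a throwaway location so the example has no side effects elsewhere.
out_path = os.path.join(tempfile.mkdtemp(), 'out_file')

f = codecs.open(out_path, 'w', encoding='utf8')
try:
    f.write(output_string)  # text is encoded to UTF-8 as it is written
finally:
    f.close()

# Read the raw bytes back to confirm what ended up on disk.
with open(out_path, 'rb') as raw_file:
    written_bytes = raw_file.read()
```

Passing a non-ASCII bytestring to `f.write()` on Python 2 would trigger the same implicit ASCII decode as before, so the decode step still has to happen first.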