Difference between Python 2 and 3 for UTF-8

Posted 2019-07-27 07:19

Why is the output different for the two commands below?

$ python2.7 -c 'print("\303\251")' 
é   # <-- Great

$ python3.6 -c 'print("\303\251")'
Ã©  # <-- WTF?!

What would be the python3 command to output "é" from "\303\251"?

Best regards,

Olivier

Tags: python utf-8
2 answers
Juvenile、少年°
Reply #2 · 2019-07-27 07:56

On Python 2, a plain string literal is a byte string, so you are telling Python to print two bytes. It prints two bytes. Your terminal interprets those two bytes as the encoding of é and displays é. (It looks like your terminal is using UTF-8.)

On Python 3, you are telling Python to print the two characters with Unicode code points 0o303 and 0o251 (in octal), that is, U+00C3 and U+00A9. Those characters are Ã and ©. Python encodes those characters in a system-dependent encoding (probably UTF-8) and writes the resulting bytes to stdout. Your terminal then decodes the bytes and displays Ã©.

If you want Python 3 to print é, give it the Unicode code point (\u00e9), or just tell it to print é:

$ python3.6 -c 'print("é")'
é
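
To make the difference concrete, here is a small sketch for Python 3 that inspects what each literal actually contains. The variable names are only illustrative, and it assumes nothing beyond the standard library.

```python
# Python 3: in a str literal, "\303\251" is two code points, not two bytes.
s = "\303\251"
code_points = [hex(ord(ch)) for ch in s]   # U+00C3 and U+00A9
encoded = s.encode("utf-8")                # each character becomes two bytes

# The same escapes in a bytes literal are two raw bytes,
# which are the UTF-8 encoding of U+00E9 (é).
raw = b"\303\251"
decoded = raw.decode("utf-8")

# Only ASCII reprs are printed, so this works on any terminal encoding.
print(code_points, encoded, raw, decoded == "\u00e9")
```

The str version encodes to four bytes (`b'\xc3\x83\xc2\xa9'`), which is why the terminal shows two characters instead of one.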
祖国的老花朵
Reply #3 · 2019-07-27 08:00

As explained in the first answer by user2357112, this line tells Python 3 to print two characters given by octal escapes (in a str literal, each octal escape is the Unicode code point of one character, not a byte):

$ python3.6 -c 'print("\303\251")'
Ã©

The following line can be used for a behavior similar to Python 2:

$ python3.6 -c 'print(b"\303\251".decode())'
é
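
If the escapes have already ended up in a str, or if you really want to emit the raw bytes the way Python 2 did, something along these lines may help. This is only a sketch: `fix_mojibake` and `write_raw` are hypothetical helper names, and the round trip through Latin-1 assumes every character in the string is in the range U+0000 to U+00FF.

```python
import io
import sys

def fix_mojibake(text):
    """Treat each code point (0-255) as one byte, then decode as UTF-8."""
    return text.encode("latin-1").decode("utf-8")

def write_raw(data, stream=None):
    """Write bytes untouched; defaults to the binary layer under stdout."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(data)

repaired = fix_mojibake("\303\251")   # the str from the question

# Demonstrated against an in-memory buffer; in a real script,
# call write_raw(b"\303\251\n") to send the bytes to the terminal.
buf = io.BytesIO()
write_raw(b"\303\251\n", buf)
```

Writing through `sys.stdout.buffer` bypasses Python's text encoding entirely, so whether é appears depends only on the terminal's encoding, just as in Python 2.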