Difference between Python 2 and 3 for UTF-8

Posted 2019-07-27 07:34

Question:

Why is the output different for the two commands below?

$ python2.7 -c 'print("\303\251")' 
é   # <-- Great

$ python3.6 -c 'print("\303\251")'
Ã©  # <-- WTF?!

What would be the Python 3 command to output "é" from "\303\251"?

Best regards,

Olivier

Answer 1:

On Python 2, "\303\251" is a byte string, so you are telling Python to print two bytes. It prints two bytes. Your terminal interprets those two bytes as the UTF-8 encoding of é and displays é. (It looks like your terminal is using UTF-8.)

On Python 3, "\303\251" is a Unicode string, so you are telling Python to print the two characters with Unicode code points 0o303 and 0o251 (in octal; U+00C3 and U+00A9 in hex). Those characters are Ã and ©. Python encodes those characters in a system-dependent encoding (probably UTF-8) and writes the resulting bytes to stdout. Your terminal then decodes the bytes and displays Ã©.
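The Python 3 side of this can be checked directly. The snippet below is a minimal sketch (the variable names are only illustrative) that inspects the code points of the string and the bytes produced when it is encoded as UTF-8, assuming UTF-8 is the output encoding:

```python
# Python 3: octal escapes in a str literal are code points, not bytes.
s = "\303\251"

# The string holds two characters: U+00C3 and U+00A9.
code_points = [hex(ord(ch)) for ch in s]

# Encoding those two characters as UTF-8 yields four bytes, two per character.
encoded = s.encode("utf-8")

print(len(s))        # 2
print(code_points)   # ['0xc3', '0xa9']
print(encoded)       # b'\xc3\x83\xc2\xa9'
```

So the terminal receives four bytes rather than the two bytes that Python 2 wrote.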

If you want Python 3 to print é, give it the Unicode code point (\u00e9), or just tell it to print é:

$ python3.6 -c 'print("é")'
é
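As a small illustration of the code point route, the sketch below builds the same one-character string in three ways and shows that encoding it as UTF-8 gives exactly the two bytes from the question. The variable names are only for this example:

```python
# Three spellings of the same one-character string in Python 3.
from_escape = "\u00e9"   # Unicode escape for code point U+00E9
from_hex = "\xe9"        # in a str literal, \xe9 is also code point 0xE9
from_chr = chr(0xE9)     # built from the integer code point

print(from_escape == from_hex == from_chr)   # True

# Encoded as UTF-8, the character becomes the bytes \303\251 (octal).
print(from_escape.encode("utf-8"))           # b'\xc3\xa9'
print(from_escape.encode("utf-8") == b"\303\251")   # True
```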


Answer 2:

As explained in the first answer by user2357112, this line tells Python 3 to print two characters given by octal escapes (in a str literal, each octal escape is the Unicode code point of a character, not a byte):

$ python3.6 -c 'print("\303\251")'
Ã©

The following line gives behavior similar to Python 2 by starting from a bytes literal and decoding it (bytes.decode() defaults to UTF-8 in Python 3):

$ python3.6 -c 'print(b"\303\251".decode())'
é
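The same idea in script form, as a hedged sketch. It also covers the case where the data has already arrived as the two-character str rather than as bytes: since Latin-1 maps code points 0 to 255 one-to-one onto byte values, encoding with it recovers the original bytes, which can then be decoded as UTF-8. The variable names are illustrative only:

```python
# Start from bytes and decode them explicitly as UTF-8.
raw = b"\303\251"
text = raw.decode("utf-8")          # one character, U+00E9

# If the value is already a str holding U+00C3 U+00A9, round-trip it:
# Latin-1 turns each code point below 256 back into the byte of the same value.
mis_decoded = "\303\251"
fixed = mis_decoded.encode("latin-1").decode("utf-8")

print(len(text))                    # 1
print(text == fixed == "\u00e9")    # True
```

The round trip only works when every character in the string is below U+0100; anything else raises UnicodeEncodeError.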


Tags: python utf-8