Python lists with Scandinavian letters

Posted 2019-01-18 15:32

Question:

Can anyone explain what causes this, so I can better understand the environment?

Environment: Emacs, Unix

input:

with open("example.txt", "r") as f:
    for files in f:
        print files
        split = files.split()
        print split

output:

Hello world
['Hello', 'world']
Hello wörld
['Hello', 'w\xf6rld']

Answer 1:

When you print a list, Python prints the string representation (the repr) of each element, and here that representation includes a non-printable byte. Non-printable bytes (anything outside the ASCII range, or a control character) are displayed as escape sequences.

The point is that you can copy that representation and paste it into Python code or into the interpreter, producing the exact same value.
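A minimal sketch of that round trip, written in Python 3 rather than the Python 2 of the question. Python 2's plain byte string corresponds to Python 3's `bytes`, so a `bytes` literal stands in for the word read from the file. `ast.literal_eval` is used instead of pasting into the interpreter, as a safe way to evaluate the literal.

```python
import ast

# The raw bytes of "wörld" as read from a Latin-1 encoded file
raw = b"w\xf6rld"

# repr() produces a pasteable literal, with the non-ASCII byte escaped
text = repr(raw)
print(text)  # b'w\xf6rld'

# Parsing that literal back yields exactly the same value
assert ast.literal_eval(text) == raw
```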

The \xf6 escape code represents a byte with hex value F6, which, when interpreted as a Latin-1 byte value, is the ö character.
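A quick Python 3 check of that claim, with a contrast to UTF-8, where ö takes two bytes. The single \xf6 byte in the output therefore suggests that example.txt is saved as Latin-1 or a similar single-byte encoding rather than UTF-8; that is an inference from the output, not something the question states.

```python
# One byte with hex value F6
raw = bytes([0xF6])

# Interpreted as Latin-1, that byte is the character U+00F6, "ö"
char = raw.decode("latin-1")
assert char == "\u00f6"

# Encoded as UTF-8, the same character needs two bytes instead of one
print(char.encode("utf-8"))  # b'\xc3\xb6'
```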

You probably want to decode that value to Unicode to handle the data consistently. If you don't yet know what Unicode really is, or want to know anything else about encodings, see:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

  • The Python Unicode HOWTO

  • Pragmatic Unicode by Ned Batchelder
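A minimal Python 3 sketch of decoding at read time, assuming example.txt is Latin-1 encoded (consistent with the single F6 byte above). `read_split_lines` is a hypothetical helper name, and the temporary sample file exists only for illustration. In Python 2.6 and later, `io.open(path, encoding='latin-1')` behaves similarly and yields `unicode` lines.

```python
import os
import tempfile

def read_split_lines(path, encoding="latin-1"):
    """Read a text file and return each line split into words.

    Decoding happens inside open(), so every word is already a str
    (Unicode text), not raw bytes. Latin-1 as the default is an
    assumption based on the single F6 byte seen in the question.
    """
    result = []
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            result.append(line.split())
    return result

# Create a small Latin-1 encoded sample file, like example.txt in the question
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello world\nHello w\u00f6rld\n".encode("latin-1"))
    sample_path = tmp.name

words = read_split_lines(sample_path)
os.remove(sample_path)

# The second word is now the text "wörld", not an escaped byte
assert words[1][1] == "w\u00f6rld"
```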



Answer 2:

In Python, printing a list does not print its elements as plain text: converting the list to a string calls __repr__ on each element, and in Python 2 the repr of a string escapes every character outside printable ASCII (which is why ö shows up as \xf6, and why decoded strings show up as u'...'). If you print each element by itself, the string's own __str__ is used instead of the list's conversion, and you get what you expect.
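A small Python 3 sketch of that mechanism. `bytes` values stand in for Python 2's plain byte strings so that the escaping is visible; the point being shown is that a list's text form is assembled from the repr() of each element, while a string converted on its own uses str().

```python
# Raw bytes, as Python 2's plain str held them after reading a Latin-1 file
raw_words = [b"Hello", b"w\xf6rld"]

# Converting a list to text calls repr() on every element...
list_text = str(raw_words)
assert list_text == "[" + ", ".join(repr(w) for w in raw_words) + "]"
print(list_text)  # [b'Hello', b'w\xf6rld']

# ...whereas converting a text string on its own uses str(), not repr()
word = "Hello"
print(str(word))   # Hello
print(repr(word))  # 'Hello'
```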

with open("example.txt", "r") as f:
    for inp in f:
        files = inp.decode('latin-1')  # decode explicitly to unicode (assumes the file is Latin-1)
        print files
        split = files.split()
        print split
        print split[0]
        print split[1]

Output:

hello world

[u'hello', u'world']
hello
world
hello wörld
[u'hello', u'w\xf6rld']
hello
wörld


Answer 3:

With python-mode.el, after adapting the print statements for Python 3 and running the buffer with py-execute-buffer-python3, the output prints nicely:

Hello world

['Hello', 'world']

Hello wörld

['Hello', 'wörld']
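A short sketch of why Python 3 shows ö unescaped: Python 3 strings are Unicode text, and repr() escapes only characters that str.isprintable() reports as non-printable. The built-in ascii() reproduces the older Python 2 style of escaping everything outside ASCII.

```python
words = ["Hello", "w\u00f6rld"]  # "wörld" as Unicode text

# In Python 3, repr() leaves printable non-ASCII characters alone
assert "\u00f6".isprintable()
assert repr(words) == "['Hello', 'w\u00f6rld']"

# Non-printable characters, such as NUL, are still escaped
assert repr("a\x00b") == "'a\\x00b'"

# ascii() gives the Python 2 style, escaping everything outside ASCII
print(ascii(words))  # ['Hello', 'w\xf6rld']
```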