Can anyone explain what causes this, so that I can better understand the environment?
emacs, unix
input:
with open("example.txt", "r") as f:
    for files in f:
        print files
        split = files.split()
        print split
output:
Hello world
['Hello', 'world']
Hello wörld
['Hello', 'w\xf6rld']
Python is printing the representation (repr) of the string, which contains a byte outside the printable ASCII range. Such bytes (anything outside the ASCII range, or a control character) are displayed as escape sequences.
The point is that you can copy that representation and paste it into Python code or into the interpreter, producing the exact same value.
The \xf6 escape code represents a byte with hex value F6, which, when interpreted as a Latin-1 byte value, is the ö character.
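A minimal sketch of the two points above: the escaped representation can be pasted back into Python, and byte F6 decodes to ö under Latin-1. It is written for Python 3 (the question uses Python 2), and the literal `raw` is a stand-in for the data read from example.txt, not the asker's actual file.

```python
# Python 3 sketch; `raw` is an assumed stand-in for bytes read from the file.
import ast

raw = b"w\xf6rld"

# repr() produces an ASCII-only form that is also valid Python source.
shown = repr(raw)
print(shown)                      # b'w\xf6rld'

# Evaluating that representation yields the exact same value again.
round_tripped = ast.literal_eval(shown)
print(round_tripped == raw)       # True

# Decoding as Latin-1 maps byte 0xF6 to the character ö (U+00F6).
text = raw.decode("latin-1")
print(hex(ord(text[1])))          # 0xf6
```

The decode step only gives the intended character if the file really is Latin-1; with a different encoding the same byte would mean something else.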
You probably want to decode that value to Unicode to handle the data consistently. If you don't yet know what Unicode really is, or want to know anything else about encodings, see:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
In Python, printing a list does not print the strings themselves. It calls __repr__ on each element, and for a string that produces the quoted form, with non-ASCII characters shown as escape sequences. If you print each element by itself (in which case the string's __str__ method is used, rather than the list's), you get what you expect.
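The difference between printing a list and printing its elements can be sketched as follows. This is a Python 3 illustration, using byte strings to mimic what Python 2 str objects held; the variable names are made up for the example.

```python
# Python 3 sketch; byte strings mimic the Python 2 str values in the question.
words = [b"Hello", b"w\xf6rld"]

# str() of a list is assembled from repr() of each element, so escapes appear.
as_list = str(words)
print(as_list)                    # [b'Hello', b'w\xf6rld']

# The list form is exactly the elements' repr() joined with ", ".
rebuilt = "[" + ", ".join(repr(w) for w in words) + "]"
print(rebuilt == as_list)         # True

# For text strings the quotes in the list output also come from repr();
# printing a single element uses str() and shows the bare text.
text_words = ["Hello", "world"]
print(text_words)                 # ['Hello', 'world']
print(text_words[1])              # world
```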
with open("example.txt", "r") as f:
    for inp in f:
        files = inp.decode('latin-1')  # just to make sure this works on different systems
        print files
        split = files.split()
        print split
        print split[0]
        print split[1]
Output:
hello world
[u'hello', u'world']
hello
world
hello wörld
[u'hello', u'w\xf6rld']
hello
wörld
With python-mode.el, after adapting the print forms for Python 3, py-execute-buffer-python3 prints nicely:
Hello world
['Hello', 'world']
Hello wörld
['Hello', 'wörld']
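For reference, a Python 3 version of the loop might look like the sketch below. In Python 3, the repr of a text string leaves printable non-ASCII characters unescaped, which is why the list shows ö directly. The temporary file stands in for example.txt, and the Latin-1 encoding is an assumption based on the \xf6 byte in the question.

```python
# Python 3 sketch; the temp file and the Latin-1 encoding are assumptions.
import os
import tempfile

# Create a stand-in for example.txt containing the byte 0xF6.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"Hello world\nHello w\xf6rld\n")

results = []
# encoding= makes open() decode every line to str while reading.
with open(path, "r", encoding="latin-1") as f:
    for line in f:
        results.append(line.split())

os.remove(path)

# Each line is now a list of text strings; the second contains U+00F6 (ö).
print(results == [["Hello", "world"], ["Hello", "w\u00f6rld"]])  # True
```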