Python and UTF-8: kind of confusing

Posted 2019-05-28 01:03

Question:

I am on Google App Engine with Python 2.5. My application has to handle multiple languages, so I have to deal with UTF-8.

I have done a lot of googling but haven't found what I want.

1. What is the purpose of # -*- coding: utf-8 -*- ?

2. What is the difference between

s=u'Witaj świecie'
s='Witaj świecie'

'Witaj świecie' is a UTF-8 string.

3. When I save the .py file as UTF-8, do I still need the u before every string?

Answer 1:

u'blah' makes it a different kind of string (type unicode rather than type str): a sequence of Unicode code points. Without the u, it is a sequence of bytes. Only bytes can be written to disk or to a network stream, but you generally want to work in Unicode inside your program (although Python, and some libraries, will do some of the conversion for you). The encoding (UTF-8) is the translation between the two. So yes, you should put the u in front of all your text literals; it will make your life much easier. See Pragmatic Unicode for a better explanation.
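To make the text-versus-bytes distinction concrete, here is a minimal sketch. It writes ś as the escape \u015b so that the source file itself stays pure ASCII; the variable names are only illustrative. Explicit encode/decode calls are used so the behaviour does not depend on which Python version runs it.

```python
# Text: a sequence of Unicode code points (type `unicode` in Python 2).
text = u'Witaj \u015bwiecie'   # \u015b is the letter "s with acute"

# Bytes: what actually goes to disk or over the network.
data = text.encode('utf-8')

# Decoding turns incoming bytes back into text.
roundtrip = data.decode('utf-8')

print(len(text))          # 13 code points
print(len(data))          # 14 bytes: the accented letter takes two bytes in UTF-8
print(roundtrip == text)  # True
```

The usual pattern is to decode bytes as soon as they enter the program, work with Unicode in between, and encode only at the point of output.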

The coding line tells Python what encoding your source file is in, so that Python can understand it. Again, reading from disk gives bytes, but Python wants to see the characters. In Python 2, the default encoding for source code is ASCII, so the coding line is what lets you put characters like ś directly in your .py file in the first place. Other than that, it doesn't change how your code works.
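As a small illustration of what the coding line does and does not do, the sketch below assumes the file is actually saved as UTF-8 and that the declaration is on its first or second line, which is where Python looks for it. The literal containing ś and the escaped literal end up as the same Unicode string.

```python
# -*- coding: utf-8 -*-
# The declaration above only describes how this file is stored on disk.

# With the declaration, a non-ASCII character may appear in the source.
s = u'Witaj świecie'

# The same text written with an escape needs no declaration at all.
escaped = u'Witaj \u015bwiecie'

print(s == escaped)  # True when the file really is saved as UTF-8
```

Note that the declaration does not replace the u prefix: in Python 2, a plain 'Witaj świecie' in the same file would still be a byte string holding the UTF-8 bytes.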