In Python 2.7, how do you convert a Latin-1 string to UTF-8?
For example, I'm trying to convert é to UTF-8.
>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
é
The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9).
The UTF-8 byte encoding for it is: c3a9
The Latin-1 byte encoding is: e9
How do I get the UTF-8 encoded version of a Latin-1 string? Could someone give an example of how to convert the é?
I do this. I am not sure if it is a good approach, but it works every time!
You've got a UTF-8 encoded byte sequence. Don't try to print encoded bytes directly. To print them you need to decode the encoded bytes back into a Unicode string.
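A minimal sketch of that round trip. The bytes are written with a b'' prefix, which in Python 2.7 is just another spelling of a plain str literal, so the snippet also runs unchanged on Python 3. The variable names are only illustrative.

```python
# UTF-8 encoded bytes for U+00E9, i.e. what u"\xe9".encode('utf-8') returns.
utf8_bytes = b'\xc3\xa9'

# Decode the encoded bytes back into a Unicode string before printing
# or otherwise working with the text.
text = utf8_bytes.decode('utf-8')

assert text == u'\xe9'                      # a single code point, U+00E9
assert text.encode('utf-8') == utf8_bytes   # encoding undoes the decoding
```

Printing text rather than utf8_bytes leaves it to Python to encode the string for whatever the output stream expects.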
Notice that encoding and decoding are opposite operations which effectively cancel out. You end up with the original u"é" string back, although Python prints it as the equivalent u'\xe9'.
To decode a byte sequence from Latin-1 to Unicode, use the .decode() method with the 'latin1' codec.
Python uses \xab escapes for Unicode code points up to \u00ff, which is why é is displayed as u'\xe9'.
The above Latin-1 character can then be encoded to UTF-8 by calling .encode('utf-8') on the decoded Unicode string.
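The two steps can be sketched as follows. The helper name latin1_to_utf8 is hypothetical, not part of any library, and the b'' prefix is used only so that the same code runs on both Python 2.7 and Python 3.

```python
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Step 1: Latin-1 bytes -> Unicode string.
    # Step 2: Unicode string -> UTF-8 bytes.
    return data.decode('latin-1').encode('utf-8')


latin1_bytes = b'\xe9'                          # Latin-1 encoding of U+00E9
unicode_text = latin1_bytes.decode('latin-1')   # intermediate Unicode string
utf8_bytes = latin1_to_utf8(latin1_bytes)

assert unicode_text == u'\xe9'
assert utf8_bytes == b'\xc3\xa9'
```

Going through Unicode as the intermediate form is the point: bytes in one encoding are never converted to another directly, they are decoded first and then encoded again.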