Python converting latin1 to UTF8

In Python 2.7, how do you convert a latin1 string to UTF-8?

For example, I'm trying to convert é to UTF-8.

>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
Ã©

The letter é, LATIN SMALL LETTER E WITH ACUTE (U+00E9), has the UTF-8 byte encoding: c3a9
Its latin1 byte encoding is: e9

How do I get the UTF-8 encoded version of a latin1 string? Could someone give an example of how to convert é?

Answer 1:

To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:

>>> '\xe9'.decode('latin1')
u'\xe9'

Python uses \xab escapes for Unicode code points up to \u00ff.

>>> '\xe9'.decode('latin1') == u'\u00e9'
True

The above Latin-1 character can be encoded to UTF-8 as:

>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
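
The decode-then-encode step above could be wrapped in a small helper. This is only a minimal sketch: the name latin1_to_utf8 is hypothetical and not part of the original answer, and it uses b'' literals so that it should behave the same on Python 2.7 and Python 3.

```python
def latin1_to_utf8(data):
    """Re-encode a Latin-1 byte string as UTF-8 bytes."""
    # Latin-1 maps every byte 0x00-0xFF to the code point with the same
    # value, so this decode step cannot fail on any input bytes.
    text = data.decode('latin-1')
    # Encode the resulting Unicode string as UTF-8 bytes.
    return text.encode('utf-8')


converted = latin1_to_utf8(b'\xe9')   # the two bytes 0xC3 0xA9
print(repr(converted))
```

The helper expects bytes (a plain str in Python 2.7); passing an already-decoded unicode string would be a different case.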

>>> u"é".encode('utf-8')
'\xc3\xa9'

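You already have a UTF-8 encoded byte sequence. Don't try to print the encoded bytes directly. To print them, you need to decode the encoded bytes back into a Unicode string.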

>>> u"é".encode('utf-8').decode('utf-8')
u'\xe9'
>>> print u"é".encode('utf-8').decode('utf-8')
é

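Note that encoding and decoding are opposite operations that effectively cancel each other out. You end up with the original u"é" string back, although Python prints it as the equivalent u'\xe9'.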

>>> u"é" == u'\xe9'
True
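
As for the Ã© printed in the question: that is most likely the two UTF-8 bytes being displayed by a terminal that interprets them as Latin-1 (or a similar single-byte encoding). The sketch below reproduces that misreading in code, under that assumption about the terminal; the variable names are illustrative only.

```python
utf8_bytes = u'\xe9'.encode('utf-8')   # two bytes: 0xC3 0xA9

# Misreading those bytes as Latin-1 yields two characters:
# A-tilde (U+00C3) followed by the copyright sign (U+00A9).
misread = utf8_bytes.decode('latin-1')

# Decoding with the matching codec gives back the single
# character e-acute (U+00E9).
correct = utf8_bytes.decode('utf-8')

print(len(misread), len(correct))
```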

concept = concept.encode('ascii', 'ignore')
concept = MySQLdb.escape_string(concept.decode('latin1').encode('utf8').rstrip())

I do this. I'm not sure whether it's a good approach, but it works every time!
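
One thing worth checking before relying on this: if concept is a unicode string, encode('ascii', 'ignore') drops every non-ASCII character, so there is nothing left for the latin1-to-UTF-8 step to convert. A small sketch of that behaviour, leaving out MySQLdb.escape_string and using a made-up sample value:

```python
concept = u'caf\xe9 '   # hypothetical sample value, not from the post

# 'ignore' silently discards characters that ASCII cannot represent,
# so the e-acute disappears here.
ascii_only = concept.encode('ascii', 'ignore')

# The remaining bytes are pure ASCII, so decoding as latin1 and
# re-encoding as utf8 leaves them unchanged; rstrip() trims the space.
result = ascii_only.decode('latin1').encode('utf8').rstrip()

print(repr(result))
```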

Source: Python converting latin1 to UTF8