Python converting latin1 to UTF8

In Python 2.7, how do you convert a Latin-1 string to UTF-8?

For example, I'm trying to convert é to utf-8.

>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
Ã©

The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9). The UTF-8 byte encoding for it is: c3a9
The Latin-1 byte encoding is: e9

How do I get the UTF-8 encoded version of a Latin-1 string? Could someone give an example of how to convert the é?

Tags: python encoding utf-8 python-2.7 latin1

3 answers

Anthone

Reply #2 · 2019-04-06 10:17

concept = concept.encode('ascii', 'ignore')
concept = MySQLdb.escape_string(concept.decode('latin1').encode('utf8').rstrip())

This is what I do. I am not sure it is a good approach, but it works every time!
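One thing worth checking about the first statement above: encoding to ASCII with 'ignore' discards every non-ASCII character, so nothing is left for the Latin-1 to UTF-8 step to convert. The sketch below assumes the value is a unicode string; the sample value and the names ascii_only and utf8_kept are made up for illustration, and the MySQLdb call is left out because it is not needed to show the effect.

```python
# Assumption: the value is a unicode string (a u'...' literal in Python 2.7).
concept = u'caf\xe9'

# Encoding to ASCII with 'ignore' silently drops every non-ASCII character.
ascii_only = concept.encode('ascii', 'ignore')

# Encoding straight to UTF-8 keeps the character as the bytes C3 A9.
utf8_kept = concept.encode('utf-8')
```

So the approach never raises an error, but the é is lost rather than converted.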


The star\"

Reply #3 · 2019-04-06 10:23

>>> u"é".encode('utf-8')
'\xc3\xa9'

You've got a UTF-8 encoded byte sequence. Don't try to print encoded bytes directly. To print them you need to decode the encoded bytes back into a Unicode string.

>>> u"é".encode('utf-8').decode('utf-8')
u'\xe9'
>>> print u"é".encode('utf-8').decode('utf-8')
é

Notice that encoding and decoding are opposite operations which effectively cancel out. You end up with the original u"é" string back, although Python prints it as the equivalent u'\xe9'.

>>> u"é" == u'\xe9'
True
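To see where the Ã© in the question most likely comes from, here is a small sketch. It assumes the terminal was interpreting output as Latin-1, which the output suggests but the question does not state. The variable names are only illustrative, and the b'' and u'' literals are used so that it should run on both Python 2.7 and Python 3.

```python
# The UTF-8 encoding of U+00E9 is the two bytes 0xC3 0xA9.
utf8_bytes = u'\xe9'.encode('utf-8')

# Read as Latin-1, each byte becomes its own character:
# 0xC3 is U+00C3 (Ã) and 0xA9 is U+00A9 (©).
misread = utf8_bytes.decode('latin-1')

# Decoding with the codec that produced the bytes restores the original.
roundtrip = utf8_bytes.decode('utf-8')
```

The bytes are correct UTF-8 in both cases; only the codec used to interpret them differs.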


再贱就再见

Reply #4 · 2019-04-06 10:26

To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:

>>> '\xe9'.decode('latin1')
u'\xe9'

In the repr, Python uses \xNN escapes (rather than \uNNNN) for Unicode code points up to \u00ff that need escaping.

>>> '\xe9'.decode('latin1') == u'\u00e9'
True

The above Latin-1 character can be encoded to UTF-8 as:

>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
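The same decode-then-encode step can be wrapped in a small function. This is only a sketch: latin1_to_utf8 is a hypothetical helper name, not something from the standard library, and it assumes the input really is Latin-1 encoded bytes. The b'' literal keeps it runnable on both Python 2.7 and Python 3.

```python
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Step 1: bytes -> unicode text, interpreting each byte as Latin-1.
    text = data.decode('latin-1')
    # Step 2: unicode text -> bytes, this time using UTF-8.
    return text.encode('utf-8')


converted = latin1_to_utf8(b'\xe9')
```

Pure ASCII input comes back unchanged, because ASCII characters have the same byte values in Latin-1 and UTF-8.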

