Python can encode to UTF-8 but can't decode

The code below can encode a string to UTF-8:

#!/usr/bin/python
# -*- coding: utf-8 -*-

str = 'ورود'
print(str.encode('utf-8'))

That prints:

b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'

But I can't decode this string with this code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

str = '\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
print(str.decode('utf-8'))

The error is:

Traceback (most recent call last):
  File "C:\test.py", line 5, in <module>
    print(str.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

Please help me...

Edit

Following the answers, I switched to a byte string:

#!/usr/bin/python
# -*- coding: utf-8 -*-

str = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
print(str.decode('utf-8'))

Now the error is:

Traceback (most recent call last):
  File "C:\test.py", line 5, in <module>
    print(str.decode('utf-8'))
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>

Tags: python python-3.x utf-8 decode

3 answers

霸刀☆藐视天下

Post #2 · 2019-04-02 01:32

It looks like you're using Python 3.x. You .encode() Unicode strings (u'xxx' or 'xxx') and .decode() byte strings (b'xxxx').

#!/usr/bin/python
# -*- coding: utf-8 -*-

s = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
#   ^
#   Need a 'b'
#
print(s.decode('utf-8'))
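A minimal round-trip sketch of that rule, assuming Python 3: encoding a str yields bytes, decoding bytes yields a str, and a str has no .decode() method at all, which is exactly the AttributeError from the question. The Arabic text is written with \u escapes here only so the snippet does not depend on the source file's encoding.

```python
# Round trip between text (str) and UTF-8 bytes in Python 3.
text = '\u0648\u0631\u0648\u062f'        # the same four letters as 'ورود'
data = text.encode('utf-8')              # str -> bytes
assert isinstance(data, bytes)
assert data == b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'

decoded = data.decode('utf-8')           # bytes -> str
assert decoded == text

# A str has no decode() method in Python 3; this is the question's AttributeError.
error_message = None
try:
    text.decode('utf-8')
except AttributeError as exc:
    error_message = exc.args[0]
    print('AttributeError:', error_message)
```

Each of these Arabic letters takes two bytes in UTF-8, which is why four characters become eight bytes.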

Note that your terminal may not be able to display the Unicode string. My Windows console can't:

Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
>>> #   ^
... #   Need a 'b'
... #
... print(s.decode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "D:\dev\Python33x64\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>

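But the decode itself succeeds; the error comes from print() encoding the result with the console's cp437 code page, as the traceback shows. '\uxxxx' represents a Unicode code point.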

>>> s.decode('utf-8')
'\u0648\u0631\u0648\u062f'
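To confirm that the UnicodeEncodeError comes from printing rather than decoding, the console's encoding step can be reproduced without a console. This sketch assumes the console uses cp437, as both tracebacks show.

```python
s = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
text = s.decode('utf-8')                 # succeeds: four Arabic code points
print([hex(ord(ch)) for ch in text])     # ['0x648', '0x631', '0x648', '0x62f']

# print() has to encode the str for the console; cp437 has no Arabic letters.
try:
    text.encode('cp437')
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc.reason, 'from', exc.start, 'to', exc.end)
```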

My PythonWin IDE supports UTF-8 and can display the characters:

>>> s = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
>>> print(s.decode('utf-8'))
ورود
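If you are stuck with a console that can't show these characters, one workaround is to print escape sequences instead of crashing. The helper name console_safe and its cp437 default are illustrative assumptions; errors='backslashreplace' is a standard error handler for str.encode(). (Python 3.6 and later write to the Windows console through its Unicode API (PEP 528), so this mostly affects older versions like the 3.3/3.4 in the tracebacks.)

```python
def console_safe(text, encoding='cp437'):
    """Hypothetical helper: replace characters the target encoding can't
    represent with backslash escapes, so print() won't raise on a console
    that uses that encoding."""
    return text.encode(encoding, errors='backslashreplace').decode(encoding)

s = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
print(console_safe(s.decode('utf-8')))   # prints \u0648\u0631\u0648\u062f
```

Characters the encoding can represent, such as plain ASCII, pass through unchanged.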

You can also write the data to a file and open it in an editor that supports UTF-8, such as Notepad. Since your original data is already UTF-8-encoded bytes, just write them to the file directly: 'wb' opens the file in binary mode, and the bytes are written as is:

>>> with open('out.txt','wb') as f:
...     f.write(s)

If you have a Unicode string, you can write it as UTF-8 with:

>>> with open('out.txt','w',encoding='utf8') as f:
...     f.write(u)  # assuming "u" is already a decoded Unicode string.
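A self-contained sketch of both file approaches. It uses a temporary directory (an assumption, so nothing is left in the working directory) and checks that writing the raw bytes and writing the decoded string with encoding='utf8' produce identical files.

```python
import os
import tempfile

s = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'   # UTF-8 bytes from the question
u = s.decode('utf-8')                     # the decoded Unicode string

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, 'raw.txt')
    text_path = os.path.join(tmp, 'text.txt')

    # Binary mode: the bytes are written exactly as given.
    with open(raw_path, 'wb') as f:
        f.write(s)

    # Text mode: Python encodes the str to UTF-8 as it writes.
    with open(text_path, 'w', encoding='utf8') as f:
        f.write(u)

    # Compare the two files byte for byte.
    with open(raw_path, 'rb') as f1, open(text_path, 'rb') as f2:
        same = f1.read() == f2.read()

    # Reading back in text mode with the same encoding recovers the str.
    with open(text_path, encoding='utf8') as f:
        round_trip = f.read()

print(same)  # True
```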

P.S. str is a built-in type. Don't use it for variable names.
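A short illustration of why this matters, written as a sketch: once the name str is bound to a bytes object, any later str() call in the same scope tries to call that bytes object. Deleting the name makes the built-in visible again; picking another name such as s or data avoids the problem entirely.

```python
import builtins

str = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'   # shadows the built-in str type
message = None
try:
    str(42)                                  # calls the bytes object, not the type
except TypeError as exc:
    message = exc.args[0]                    # can't use str(exc) here: str is shadowed
    print('TypeError:', message)

del str                                      # remove the shadowing name
assert str is builtins.str                   # the built-in str is visible again
```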

Python 2.x works differently. 'xxxx' is a byte string and u'xxxx' is a Unicode string, but you still .encode() the Unicode string and .decode() the byte string.


迷人小祖宗

Post #3 · 2019-04-02 01:38

Use the following code:

str = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
print(str.decode('utf-8'))


不美不萌又怎样

Post #4 · 2019-04-02 01:44

Python 2 has a first-class unicode type that you can use in place of the plain byte-string str type. It's easy, once you accept the need to explicitly convert between a byte string and a Unicode string:

>>> persian_enter = unicode('\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf', 'utf8')
>>> print persian_enter
ورود

Python 2 had two global functions to coerce objects into strings: unicode() to coerce them into Unicode strings, and str() to coerce them into non-Unicode strings. Python 3 has only one string type, Unicode strings, so the str() function is all you need. (The unicode() function no longer exists.)
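For comparison, a Python 3 sketch of the same conversion: str() with an encoding argument decodes bytes the way unicode() did in Python 2, and gives the same result as bytes.decode(). Without the encoding argument, str() returns the repr of the bytes object instead, which is a common surprise.

```python
data = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'

persian_enter = str(data, 'utf8')          # Python 3 counterpart of unicode(data, 'utf8')
assert persian_enter == data.decode('utf8')

# Without an encoding, str() gives the repr of the bytes, not decoded text.
as_repr = str(data)
print(as_repr)   # b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
```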

Read more about reading and writing Unicode data.

