The code below encodes a string to UTF-8:
#!/usr/bin/python
# -*- coding: utf-8 -*-
str = 'ورود'
print(str.encode('utf-8'))
That prints:
b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
But I can't decode this string with this code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
str = '\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
print(str.decode('utf-8'))
The error is:
Traceback (most recent call last):
  File "C:\test.py", line 5, in <module>
    print(str.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
Please help me.
Edit
Following the answers, I switched to a byte string:
#!/usr/bin/python
# -*- coding: utf-8 -*-
str = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
print(str.decode('utf-8'))
Now the error is:
Traceback (most recent call last):
  File "C:\test.py", line 5, in <module>
    print(str.decode('utf-8'))
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>
It looks like you're using Python 3.x. You .encode() Unicode strings (u'xxx' or 'xxx'), and you .decode() byte strings (b'xxxx').
Note that your terminal may not be able to display the Unicode string. My Windows console can't, but the decode itself still works.
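As a minimal sketch of that behavior (the names data, text and escaped are illustrative, not from the original answer), the decode succeeds regardless of what the console can display, and the built-in ascii() returns the result using only escapes, so it can be printed on any terminal:

```python
# The bytes are the ones from the question; the variable names are illustrative.
data = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'

# bytes -> str: this step never touches the console, so it cannot
# fail because of the terminal's code page.
text = data.decode('utf-8')

# ascii() returns a repr in which every non-ASCII character is escaped.
escaped = ascii(text)
print(escaped)                                         # '\u0648\u0631\u0648\u062f'
print(len(data), 'bytes ->', len(text), 'characters')  # 8 bytes -> 4 characters
```

Only the print of the decoded text itself depends on the console encoding; the conversion does not.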
In escaped output, '\uxxxx' represents a Unicode code point. My PythonWin IDE supports UTF-8 and can display the characters.
You can also write the data to a file and display it in an editor that supports UTF-8, like Notepad. Since your original string is already UTF-8, just write it to a file directly as bytes. 'wb' opens the file in binary mode and the bytes are written as is. If you have a Unicode string instead, you can write it as UTF-8 by opening the file in text mode with an explicit UTF-8 encoding.
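The two ways of writing the file can be sketched as follows. The file names and the temporary directory are assumptions made for illustration; the point is that both routes put the same bytes on disk:

```python
import os
import tempfile

data = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'   # already UTF-8 encoded
text = data.decode('utf-8')                   # the same content as a str

# Hypothetical output locations in a temporary directory.
out_dir = tempfile.mkdtemp()
bytes_path = os.path.join(out_dir, 'from_bytes.txt')
text_path = os.path.join(out_dir, 'from_text.txt')

# Binary mode: the bytes are written exactly as they are.
with open(bytes_path, 'wb') as f:
    f.write(data)

# Text mode: Python encodes the str to UTF-8 while writing.
with open(text_path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read both files back as raw bytes to compare what landed on disk.
with open(bytes_path, 'rb') as f:
    raw_from_bytes = f.read()
with open(text_path, 'rb') as f:
    raw_from_text = f.read()

print(raw_from_bytes == raw_from_text)   # True
```

Either file can then be opened in a UTF-8 aware editor to check the characters.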
P.S. str is a built-in type. Don't use it for variable names.
Python 2.x works differently. 'xxxx' is a byte string and u'xxxx' is a Unicode string, but you still .encode() the Unicode string and .decode() the byte string.
Use the following code:
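A minimal sketch of a script that decodes the bytes and prints the result even on a console whose code page lacks the characters, such as the cp437 console in the traceback. The helper console_safe and the variable names are hypothetical, chosen for this example; it relies on the standard 'backslashreplace' error handler and avoids using str as a variable name:

```python
import sys


def console_safe(s, encoding=None):
    """Return s with characters the target encoding lacks replaced by escapes."""
    # Fall back to ASCII if the stream does not report an encoding.
    encoding = encoding or sys.stdout.encoding or 'ascii'
    # Unencodable characters become \uXXXX escapes instead of raising
    # UnicodeEncodeError; the result is decoded back so print() gets a str.
    return s.encode(encoding, errors='backslashreplace').decode(encoding)


data = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'
text = data.decode('utf-8')

# Uses the real console encoding: shows the characters where possible,
# escapes where not.
print(console_safe(text))

# Simulates the cp437 console from the question.
print(console_safe(text, 'cp437'))
```

On a UTF-8 capable terminal the first print shows the Arabic text; with cp437 the characters are shown as escapes rather than raising an error.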
Python 2 has a first-class unicode type that you can use in place of the plain bytestring str type. It's easy once you accept the need to convert explicitly between a bytestring and a Unicode string.
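The same explicit conversion can be sketched in Python 3 terms, where the two types are bytes and str. This is an illustration written for Python 3, not the quoted author's code, and the variable names are assumptions:

```python
# '\u0648\u0631\u0648\u062f' is the word from the question, written with
# escapes so the source file itself stays pure ASCII.
text = '\u0648\u0631\u0648\u062f'

data = text.encode('utf-8')        # str -> bytes
round_trip = data.decode('utf-8')  # bytes -> str

# Python 3 refuses to mix the two types implicitly.
try:
    data + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(round_trip == text)   # True
print(mixing_allowed)       # False
```

Because the two types never combine silently, every boundary between text and bytes has to name its encoding.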
Python 2 had two global functions to coerce objects into strings: unicode() to coerce them into Unicode strings, and str() to coerce them into non-Unicode strings. Python 3 has only one string type, Unicode strings, so the str() function is all you need. (The unicode() function no longer exists.)
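A short sketch of how str() behaves in Python 3 (variable names are illustrative). Note that calling str() on bytes without an encoding does not decode them; it returns their printable representation:

```python
import builtins

data = b'\xd9\x88\xd8\xb1\xd9\x88\xd8\xaf'

# With an encoding, str() decodes: same result as data.decode('utf-8').
as_text = str(data, 'utf-8')

# Without an encoding, str() returns the repr of the bytes object.
as_repr = str(data)

# Other objects are converted to text as usual.
number = str(42)

# The Python 2 unicode() function is gone in Python 3.
has_unicode = hasattr(builtins, 'unicode')

print(as_text == data.decode('utf-8'))   # True
print(as_repr == repr(data))             # True
print(number, has_unicode)               # 42 False
```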
Read more about reading and writing Unicode data.