I simplified my code to make it easier to understand. Here is the problem:
case 1:
# -*- coding: utf-8 -*-
text = "چرا کار نمیکنی؟" # also using u"...." results the same
print(text)
output:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
case 2:
text = "چرا کار نمیکنی؟".encode("utf-8")
print(text)
There is no output.
case 3:
import sys
text = "چرا کار نمیکنی؟".encode("utf-8")
sys.stdout.buffer.write(text)
output:
چرا کار نمیکنی؟
I know that case 3 works somehow, but I want to use other functions such as print(), write(str()), etc.
I have also read the Python 3 documentation on Unicode here,
as well as dozens of Q&As on Stack Overflow.
There is also a long article explaining the problem and its solution for Python 2.x.
The simple question is:
how can I print non-ASCII characters such as Farsi or Arabic using Python's print() function?
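Case 3 works because it does the UTF-8 encoding by hand and then writes raw bytes. Below is a minimal sketch of getting print() to do the same thing, assuming Python 3. It wraps the binary buffer behind sys.stdout in a text layer whose encoding is UTF-8. The helper name utf8_text_stream is hypothetical, and the console still has to be able to display UTF-8 bytes.

```python
import io
import sys

def utf8_text_stream(binary_stream):
    """Wrap a binary stream in a text layer that encodes str as UTF-8.

    Case 3 does this by hand with .encode("utf-8") and
    sys.stdout.buffer.write(); the wrapper lets print() do the encoding.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", write_through=True)

if __name__ == "__main__":
    out = utf8_text_stream(sys.stdout.buffer)
    try:
        print("چرا کار نمیکنی؟", file=out)
        out.flush()
    finally:
        # Detach instead of close, so sys.stdout's own buffer stays open.
        out.detach()
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") changes the existing stream in place instead of wrapping it.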
Update 1: since many people suggested that the problem lies with the terminal, I tested this case:
case 4:
text = "چرا کار نمیکنی؟".encode("utf-8")  # also using u"...." results the same
print(text)
terminal:
python persian_encoding.py > test.txt
test.txt:
b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
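Case 2 and case 4 print a bytes object, not a str. The sketch below shows what happens. print() converts its argument with str(), and for bytes that produces the b'...' repr, which is exactly what ended up in test.txt. Decoding turns the bytes back into text.

```python
TEXT = "چرا کار نمیکنی؟"

data = TEXT.encode("utf-8")  # bytes, as in case 2 and case 4

# print(data) writes str(data), which for a bytes object is its repr:
# the b'...' literal seen in test.txt, not the decoded characters.
shown = str(data)
print(shown)

# Decoding gives back the original str, which is what print() expects.
assert data.decode("utf-8") == TEXT
```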
Very important update:
After playing around with this issue for a while, I finally found another workaround that makes cmd.exe do the job (without needing third-party software such as ConEmu or ...):
A little explanation first:
Our main problem does not concern Python. It is a problem with the character set (code page) of the Windows Command Prompt (for a complete explanation, check out Arman's answer). If you change the Command Prompt's code page to UTF-8 instead of the default OEM code page (for example, 437 on English Windows), the Command Prompt will be able to handle UTF-8 characters such as Farsi or Arabic. This solution does not guarantee that the characters will be displayed properly (they may be printed out as little squares), but it's a good solution if you want to do file I/O in Python with UTF-8 characters.
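The 'charmap' error in case 1 occurs when print() has to encode the text with the console's code page. The sketch below reproduces it without a console. It assumes cp437, a common default on English Windows; the actual code page on the asker's machine may differ. Python's cp437 codec is a charmap codec, so the error has the same form as in case 1, while UTF-8 (code page 65001) can encode every character. The helper try_encode is hypothetical.

```python
TEXT = "چرا کار نمیکنی؟"

def try_encode(text, codec):
    """Encode text with a codec; return the bytes, or the UnicodeEncodeError raised."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc

if __name__ == "__main__":
    # cp437: assumed here as a typical default OEM code page of an English Windows console
    err = try_encode(TEXT, "cp437")
    print(err)  # 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    # UTF-8 (code page 65001) can represent every character
    print(try_encode(TEXT, "utf-8"))
    # Lossy fallback: each unencodable character becomes '?'
    print(TEXT.encode("cp437", errors="replace"))  # b'??? ??? ???????'
```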
Steps:
Before starting Python from the command line, type:
chcp 65001
Now run your Python code as usual:
python testcode.py
Result in case 1:
?????? ??? ??????
It runs without errors.
For more information on how to set 65001 as the default code page, check this out.
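To see what Python actually picked up from the console (for example, after chcp 65001), the small diagnostic sketch below prints the relevant encodings. The function name describe_encodings is hypothetical; the calls are from the standard library. On Windows with Python 3.5 and earlier, the stdout entry for an interactive console typically shows the console code page (cp437, cp65001, ...). From Python 3.6 on (PEP 528), the console uses UTF-8.

```python
import locale
import sys

def describe_encodings():
    """Report the encodings this Python process uses for text I/O."""
    return {
        # Encoding print() uses for sys.stdout (None if stdout lacks one)
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale/ANSI encoding used by open() when no encoding= is given
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names and command-line arguments
        "filesystem": sys.getfilesystemencoding(),
        # False when output is redirected to a file or pipe
        "stdout_isatty": bool(getattr(sys.stdout, "isatty", lambda: False)()),
    }

if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```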
I can't reproduce the problem. Here is my script, p.py:
And here is the result of running python3 p.py:
Are you sure you're using Python 3? With python2 p.py:
Your code is correct, as it works on my computer with both Python 2 and 3 (I'm on OS X):
The problem is that your terminal cannot output Unicode characters. You can verify this by redirecting the output to a file, for example:
python3 my_file.py > test.txt
and then opening the file in an editor. If you are on Windows, you could use a terminal such as Console2 or ConEmu, which renders Unicode better than the Windows command prompt.
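Redirecting with > still depends on the output encoding Python chooses for stdout. A related check removes the terminal from the picture entirely: have print() write straight into a file opened with an explicit UTF-8 encoding, then open that file in an editor. Below is a minimal sketch; the file name persian_test.txt and the helper names are hypothetical.

```python
import os
import tempfile

TEXT = "چرا کار نمیکنی؟"

def write_utf8(path, text):
    """print() text into a file opened explicitly as UTF-8.

    With encoding="utf-8" the result no longer depends on the console's
    code page or the locale's default encoding.
    """
    with open(path, "w", encoding="utf-8") as f:
        print(text, file=f)

def read_utf8(path):
    """Read the file back as UTF-8 text."""
    with open(path, encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    # Hypothetical output location; any writable path works.
    path = os.path.join(tempfile.gettempdir(), "persian_test.txt")
    write_utf8(path, TEXT)
    # ascii() escapes non-ASCII characters, so this line prints safely on any console.
    print(ascii(read_utf8(path)))
```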
You may encounter errors with these terminals too, because of incorrect Windows code pages/encodings. There is a small Python package that fixes them (sets them correctly):
1- Install it:
pip install win-unicode-console
2- Put this at the top of your Python file:
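Presumably, what goes at the top of the file is the package's enable() call. Below is a minimal sketch that assumes the import name win_unicode_console and its enable() function, as described in the package's README. The wrapper function enable_windows_unicode_console is hypothetical. The call is guarded so the same file still runs on Linux/OS X or when the package is missing. On Python 3.6 and later, the Windows console already uses UTF-8 (PEP 528), so this is mainly useful for older versions.

```python
import sys

def enable_windows_unicode_console():
    """Enable the win_unicode_console shim; return True if it was enabled."""
    if sys.platform != "win32":
        return False  # not needed outside Windows
    try:
        import win_unicode_console  # third-party: pip install win-unicode-console
    except ImportError:
        return False  # package not installed; keep the normal console streams
    win_unicode_console.enable()
    return True

# Call this before any print() in the script.
enable_windows_unicode_console()
```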
If you get errors when redirecting to a file, you can fix them by setting the I/O encoding:
On the Windows command line:
set PYTHONIOENCODING=UTF-8
On a Linux/OS X terminal:
export PYTHONIOENCODING=UTF-8
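The PYTHONIOENCODING environment variable is read when the interpreter starts, and it overrides the encoding of stdin/stdout/stderr, including when output is redirected to a file or pipe. The sketch below checks this by starting a child interpreter with the variable set and capturing its raw stdout bytes. The helper name run_with_io_encoding is hypothetical.

```python
import os
import subprocess
import sys

# The child program is written with ASCII escapes so the command line stays ASCII;
# it prints the Persian word "چرا" (U+0686 U+0631 U+0627).
CHILD_CODE = 'print("\\u0686\\u0631\\u0627")'

def run_with_io_encoding(encoding):
    """Run CHILD_CODE in a new interpreter with PYTHONIOENCODING set; return its raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    raw = run_with_io_encoding("utf-8")
    print(raw)  # bytes repr, e.g. b'\xda\x86\xd8\xb1\xd8\xa7\n' on Linux/OS X
```

If the variable is set to an encoding that cannot represent the text, such as ascii, the child fails with a UnicodeEncodeError instead.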
Some points:
You don't need the u"aaa" syntax in Python 3; string literals are Unicode by default.
The coding declaration (# -*- coding: utf-8 -*-) is not needed either, since Python 3 source files default to UTF-8.
And if you do the text.encode("utf-8") part, it will show as b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f' (on my machine).
EDIT: Sorry for the edit, but I can't comment (not enough reputation).
Even on Python 2.7, print(text) does work. Check out this link, which I just generated.
The output basically depends on which platform and terminal you run your code on. Let's examine the snippet below in different Windows terminals, running with either 2.x or 3.x:
Results (Python 2.x and Python 3.x, each in ConEmu v151205 and in the Windows Command Prompt):
As you can see, only the Sublime Text 3 terminal (case 3) worked properly. The other terminals didn't support Persian. The main point here is that it depends on which terminal and platform you're using.
Solution (ConEmu specific)
Modern terminals like ConEmu allow you to work with UTF-8 encoding, as explained here, so let's try:
Then run the script again with Python 2.x and Python 3.x:
As you can see, the output is now successful with Python 3, case 1 (print). So... the moral of the story: learn more about your tools and how to configure them properly for your use cases ;-)