I'm working on a Python application that prints text in multiple languages to the console on multiple platforms. The program works well on all UNIX platforms, but on Windows there are errors printing Unicode strings on the command line.
There's already a relevant thread about this (Windows cmd encoding change causes Python crash), but I couldn't find an answer to my specific problem there.
For example, for the following Asian text, in Linux, I can run:
>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
引起的或
But on Windows I get:
>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
σ╝ץΦ╡╖τתהµטצ
I managed to display the correct text in a message box by doing something like this:
>>> import os
>>> file("bla.vbs", "w").write(u'MsgBox "\u5f15\u8d77\u7684\u6216", 4, "MyTitle"'.encode("utf-16"))
>>> os.system("cscript //U //NoLogo bla.vbs")
But I want to be able to do it in the Windows console, preferably without requiring too much configuration outside my Python code (because my application will be distributed to many hosts).
Is this possible?
Edit: If it's not possible, I would be happy to accept other suggestions for writing a console application on Windows that displays Unicode, e.g. a Python implementation of an alternative Windows console.
Can you try using the program iconv on Windows, and piping your Python output through it? It'd go something like this:
You might have to do a little work to get iconv on Windows: it's part of Cygwin, but you may be able to build it separately somehow if needed.
There's a WriteConsoleW solution that provides a Unicode argv and stdout (print) but not stdin: Windows cmd encoding change causes Python crash
The only thing I modified is sys.argv, to keep it Unicode. The original version UTF-8 encoded it for some reason.
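For readers who have not seen the WriteConsoleW approach, here is a minimal sketch of the idea using ctypes. It is not the code from the linked thread: the function name write_console is hypothetical, and the sketch only covers output, not argv or stdin. It assumes the text stays within the Basic Multilingual Plane, so that len(text) equals the number of UTF-16 code units.

```python
import sys
import ctypes

STD_OUTPUT_HANDLE = -11  # Win32 constant identifying the standard output handle


def write_console(text):
    """Write a unicode string straight to the Windows console.

    Returns True on success, False on other platforms or when stdout
    is not a real console (for example, when output is redirected).
    """
    if sys.platform != "win32":
        return False  # WriteConsoleW only exists on Windows

    from ctypes import wintypes

    kernel32 = ctypes.windll.kernel32
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,                 # console output handle
        wintypes.LPCWSTR,                # UTF-16 text buffer
        wintypes.DWORD,                  # number of characters to write
        ctypes.POINTER(wintypes.DWORD),  # receives number of characters written
        wintypes.LPVOID,                 # reserved, must be NULL
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    # Assumption: text is BMP-only, so len(text) matches the UTF-16 length.
    ok = kernel32.WriteConsoleW(handle, text, len(text), ctypes.byref(written), None)
    return bool(ok)
```

Something like write_console(u"\u5f15\u8d77\u7684\u6216\n") bypasses the console code page entirely, but the glyphs still only show up if the console font contains them. As far as I know, Python 3.6 and later write to the console through this same API internally.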
The question is answered in the PrintFails article.
For Russia this means CP866; other countries use their own code pages too. This means that to read Python output in the Windows console correctly, Windows must be configured with a native code page that can display the printed characters.
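To make the code page mismatch concrete, here is a small sketch that reproduces the garbling from the question. The choice of cp862 is my assumption: it is the code page whose output appears to match the garbled line the asker posted. The helper name mojibake is made up for illustration.

```python
# -*- coding: utf-8 -*-
text = u"\u5f15\u8d77\u7684\u6216"  # the string from the question


def mojibake(text, console_codepage):
    """Return what a console shows when it reads UTF-8 bytes as console_codepage."""
    return text.encode("utf-8").decode(console_codepage, "replace")


# Assumption: the asker's console was using code page 862.
garbled = mojibake(text, "cp862")

# Undoing both steps recovers the original text, so no data was lost;
# the console simply interpreted the bytes with the wrong table.
recovered = garbled.encode("cp862").decode("utf-8")
```

Each of the four characters takes three bytes in UTF-8, and a single-byte code page renders every byte as its own character, which is why four characters turn into twelve.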
I suggest you always print Unicode text without encoding it yourself, to ensure maximum compatibility across platforms.
If you try to print a character the console cannot represent, you will get a UnicodeEncodeError or see distorted text.
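If an exception is worse for you than a placeholder character, one option is to encode with the "replace" error handler before writing. This is only a sketch: the helper name encode_for_console is hypothetical, and falling back to ASCII when no encoding is reported is my own choice.

```python
import sys


def encode_for_console(text, encoding=None):
    """Encode text for a console, substituting '?' for unsupported characters."""
    # sys.stdout.encoding can be None when output is piped (notably on Python 2),
    # so fall back to ASCII in that case.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")
```

With a CP866 console, Cyrillic text survives unchanged while the Chinese characters from the question degrade to question marks instead of raising UnicodeEncodeError.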
In some cases, if Python fails to determine the output encoding correctly, you might try setting the PYTHONIOENCODING environment variable. Note, however, that this probably won't work for your example, as your console is unable to display Asian text in its current configuration.
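PYTHONIOENCODING has to be set before the interpreter starts, so it cannot be changed from inside a running script. A rough sketch of how to check its effect is to launch a child interpreter with the variable set; the function name child_stdout_encoding is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and
    return the stdout encoding that the child reports."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()
```

Calling child_stdout_encoding("utf-8") shows that the child picks up the requested encoding. That only changes the bytes Python emits; the console still has to interpret and render them.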
To reconfigure the console, use Control Panel -> Language and Regional settings -> Advanced (tab) -> Non-Unicode programs language (section). Note that I translated the menu names from Russian.
See also answers for the very similar question.
It merely comes from the fact that the cmd and PowerShell consoles do not support variable-width fonts. Fixed-width fonts do not include Chinese script. Cygwin is in the same situation.
PuTTY is more advanced, supporting variable-width fonts with Cyrillic, Vietnamese, and Arabic scripts, but no Chinese so far.
HTH
Use a different console program. The following works in mintty, the default terminal emulator in Cygwin.
There are other console alternatives available for Windows but I have not assessed their Unicode support.