Oftentimes when I'm working interactively in IDLE, I'd like to paste a Unicode string into the IDLE window. It appears to paste properly but generates an error immediately. It has no trouble displaying the same character on output.
>>> c = u'ĉ'
Unsupported characters in input
>>> print u'\u0109'
ĉ
I suspect that the input window, like most Windows programs, uses UTF-16 internally and has no trouble dealing with the full Unicode set; the problem is that IDLE insists on coercing all input to the default mbcs code page, and anything not in that page gets rejected.
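To illustrate the suspected failure mode outside IDLE, here is a minimal sketch. It assumes the Windows default code page is cp1252 (yours may differ), and `can_encode` is a hypothetical helper name, not part of IDLE:

```python
# Minimal sketch: check whether a Unicode string survives encoding to a
# given code page. 'cp1252' is an assumed stand-in for the Windows default
# mbcs code page; can_encode is a hypothetical helper, not an IDLE function.

def can_encode(text, encoding):
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeError:
        return False
    return True

char = u'\u0109'  # LATIN SMALL LETTER C WITH CIRCUMFLEX

fits_code_page = can_encode(char, 'cp1252')
fits_utf8 = can_encode(char, 'utf-8')
```

If the suspicion is right, the first check fails for the same reason IDLE reports "Unsupported characters in input", while UTF-8 accepts the character.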
Is there any way to configure or cajole IDLE into accepting the full Unicode character set as input?
Python 3.2 handles this much better and has no trouble with anything I throw at it.
I know that I can simply save the code to a file in UTF-8 and import it, but I want to be able to work with Unicode characters in the interactive window.
I finally figured out a way. Since the sources to IDLE are part of the distribution, you can make a couple of quick edits to enable the capability. The files will typically be found in C:\Python27\Lib\idlelib.
The first step is to prevent IDLE from trying to encode all those nice Unicode characters into a character set that can't handle them. This is controlled by IOBinding.py. Edit the file, find the section after if sys.platform == 'win32': and comment out this line:
#encoding = locale.getdefaultlocale()[1]
Now add this line after it:
encoding = 'utf-8'
I was hoping that there would be a way to override this with an environment variable or something, but getdefaultlocale calls directly into a Win32 function that gets the globally configured Windows mbcs encoding.
That is half the battle. Now we need to get the command line interpreter to recognize that the input bytes are UTF-8 encoded. There didn't appear to be a way to pass an encoding into the interpreter, so I came up with the mother of all hacks. Maybe someone with a little more patience can come up with a better way, but this works for now. The input is processed in PyShell.py, in the runsource function. Change the following:
if isinstance(source, types.UnicodeType):
    from idlelib import IOBinding
    try:
        source = source.encode(IOBinding.encoding)
    except UnicodeError:
        self.tkconsole.resetoutput()
        self.write("Unsupported characters in input\n")
        return
To:
from idlelib import IOBinding  # line moved
if isinstance(source, types.UnicodeType):
    try:
        source = source.encode(IOBinding.encoding)
    except UnicodeError:
        self.tkconsole.resetoutput()
        self.write("Unsupported characters in input\n")
        return
source = "#coding=%s\n%s" % (IOBinding.encoding, source)  # line added
We're taking advantage of PEP 263 to specify the encoding for each line of input provided to the interpreter.
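The effect of the added line can be tried outside IDLE. The following is a standalone sketch, not IDLE's own code: `add_coding_line` is a hypothetical helper that mimics the prefixing step, and the encoding is hard-coded to utf-8 rather than read from IOBinding.

```python
# Standalone sketch of the PEP 263 trick: prepend a coding declaration to
# encoded source bytes so the compiler knows how to decode them.
# add_coding_line is a hypothetical helper, not part of IDLE.

def add_coding_line(source, encoding):
    """Encode Unicode source text and prefix it with a PEP 263 coding line."""
    encoded = source.encode(encoding)
    header = ("#coding=%s\n" % encoding).encode('ascii')
    return header + encoded

# One statement of interactive input containing a non-ASCII character.
statement = u"c = u'\u0109'"
prefixed = add_coding_line(statement, 'utf-8')

# Compile and run the prefixed bytes, then inspect the resulting variable.
namespace = {}
exec(compile(prefixed, '<input>', 'exec'), namespace)
result = namespace['c']
```

One side effect to be aware of: because a line is prepended, the compiler sees the user's first line as line 2, so any line numbers it reports for that input are shifted by one.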
Update: In Python 2.7.10 it is no longer necessary to make the change in PyShell.py; it already works properly if the encoding is set to utf-8. Unfortunately, I haven't found a way to bypass the change in IOBinding.py.