Why some characters can not be typed in Python's IDLE

Posted 2019-01-19 20:30

Question:

I don't know how to explain this; in fact, an explanation is what I'm looking for. So I'll just list the steps to reproduce the issue. Hopefully someone will be able to understand and elaborate:

  1. Python 3.5.0 on Windows 8.1. (However, this should be reproducible regardless of the Python and Windows versions.)
  2. The Persian standard keyboard layout installed. (It can be downloaded from here. I'm sure the problem is not limited to this specific keyboard and that some characters in other languages have the same problem; I name this one just for the sake of reproducibility.)
  3. Open IDLE, set the keyboard layout to Persian, and type some characters.
  4. Some characters, like 'آ' (Shift+h), are typed perfectly fine.
  5. Some other characters, like 'ی' (d), are converted to a similar character, in this case 'ي' (notice the small dots under the glyph).
  6. Some characters can't be typed at all. For example, '﷼' (Shift+4) is typed as '?' in IDLE.
  7. All of the above characters can be typed in almost any other program I have installed, one of the simplest being notepad.exe.
  8. The same characters can be typed in another program, e.g. notepad.exe, and then copied and pasted into IDLE. This shows that IDLE supports Unicode characters; it just can't accept them from the keyboard.
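
Because 'ی' and 'ي' look almost identical on screen, it helps to identify the characters from steps 4 to 6 by code point rather than by glyph. Here is a minimal sketch using the standard unicodedata module; the helper name describe is only illustrative:

```python
import unicodedata


def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))


# The four characters from the steps above, written as escapes so that
# this snippet does not itself depend on how the editor handles input.
for ch in ["\u0622", "\u06cc", "\u064a", "\ufdfc"]:
    print(describe(ch))
```

Pasting a character typed in IDLE into describe() shows whether it arrived as the Persian letter (U+06CC) or as its Arabic look-alike (U+064A).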

I'm a fan of IDLE. It's a lightweight IDE that ships with the standard Python installation, and I don't want to switch to another IDE just because of this. But this is the most annoying thing about IDLE for me: whenever I need to write a program with some Persian characters in it, I can't trust IDLE to type them correctly, so I have to open some other program and copy and paste from there.

What I'm looking for is:

  • Why does this happen? Where is the problem?
  • Are there any workarounds?
  • Any documentation or bug reports directly related to this issue.

Also this information may be helpful:

>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'cp1256')
>>> locale.getpreferredencoding()
'cp1256'
>>> locale.getlocale()
('English_United States', '1252')
>>> 
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

Thanks.

Update:

Please see the first three comments below. It seems that this issue is caused by the use of WindowsBestFit mappings while typing in tkinter apps.

To test whether the cause is some bad configuration in the Python/tkinter bindings or Tcl/Tk itself, I downloaded and installed Tkabber, an application written in Tcl/Tk. The exact same problem exists there, i.e. I can't type the above characters but can copy and paste them. So my conclusion is that the root of the problem lies in Tcl/Tk itself and not in IDLE, Python, or tkinter.

My questions still hold.

Answer 1:

After some searching I found this ticket on Tk's bug tracker. It pretty much explains what's happening behind the scenes: Tcl/Tk internally uses code pages to translate keyboard input to UTF-8.
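
The three behaviours from the question line up with what the cp1256 code page (the preferred encoding reported in the question) can represent. The sketch below is only an illustration and rests on some assumptions: Python's cp1256 codec is strict and does not apply Windows best-fit substitutions, so it can only show which characters have no exact slot in the code page, not what Tk actually does with them. The helper name fits_code_page is made up for this example.

```python
def fits_code_page(ch, code_page="cp1256"):
    """Return True if ch has an exact encoding in the given code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


samples = [
    ("alef with madda", "\u0622"),  # typed fine in IDLE
    ("farsi yeh", "\u06cc"),        # arrives as arabic yeh instead
    ("arabic yeh", "\u064a"),       # the look-alike that does fit
    ("rial sign", "\ufdfc"),        # arrives as '?'
]
for label, ch in samples:
    print(label, fits_code_page(ch))

# With no exact slot and no best-fit candidate, a lossy conversion
# falls back to a question mark, as errors="replace" does here.
print("\ufdfc".encode("cp1256", errors="replace"))
```

The character that types correctly has a slot in cp1256, the one that gets swapped does not but has a close relative that does, and the one that turns into '?' has neither. That is consistent with keyboard input passing through the code page before it reaches UTF-8.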

Unfortunately, there has been no activity on this bug since 2014-09-18, which is sad. The bug has a huge impact on many languages, both those that have a Windows code page (listed here) and, even more so, the many that have no code page associated with them (like Bengali).

IMO, this should have been one of the highest priorities of the Tcl/Tk development team. In its current state, users should not rely on Tcl/Tk for applications that require Unicode input support on Windows.
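
As for workarounds, besides the copy-and-paste approach from the question, one option that avoids keyboard input altogether is to write the affected characters as escapes in the source. This is just a sketch of that idea, not a fix for the underlying bug, and the variable names are only examples:

```python
# Spell out the problem characters with escapes, so the source file
# contains only characters that can be typed reliably.
farsi_yeh = "\u06cc"          # by code point
rial_sign = "\N{RIAL SIGN}"   # by Unicode character name

price = "1000 " + rial_sign
print(ascii(price))  # ascii() keeps the output readable on any console
```

This is clumsy for long passages of Persian text, but it works for the occasional character that IDLE mangles.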