My end goal is to enable people that know the language Urdu and not English to be able to program in a Python environment.
Urdu is written left to right. I imagine having Urdu versions of all python keywords and using Urdu characters to define variable/function/class names.
Is this goal possible? If yes, then what all will need to be done?
I can already see one issue where the standard library functions would still be in English.
Update:
Whether this should be done or not is certainly a very interesting debate topic, which I'd love to talk about. However, is it actually possible and how?
I love Python and I know a lot of intelligent people that might never be taught English properly, only Urdu, hence the thought.
Chinese Python used to allow programmers to write source code entirely in Chinese, including translated module names, function names, and all keywords. Here's a code example from their website:
載入 系統 # import sys
文件名 = 系統.參數[1:] # filenames = sys.argv[1:]
定義 修正行尾(文件): # def fixline(file):
內文 = 打開(文件).讀入() # text = open(file).read()
內文 = 內文.替換('\n\r','\n') # text = text.replace('\n\r', '\n')
傳回 內文 # return text
取 文件 自 文件名: # for file in filenames:
寫 修正行尾(文件) # print fixline(file)
Sadly, it is no longer an active project, but you can get the source and find out what they changed to get an idea for your own implementation. (Learning why they failed may also help you understand what challenges you will have in making such a system successful).
Yes, it is possible. But it requires you to download Python from source, and change the source code and compile it. If I remember correctly from another discussion on the same topic, changing the grammar and regenerating some files is all it takes, except for when you are doing advanced stuff like meta-programming and things.
WHAT HAVE YOU TRIED? Python 3 allows you to use Unicode for all your function, class and variable names. Check it out, you just need to make sure you're using utf-8 for your script-- it's primarily a matter for your editor. Dealing gracefully with the mixed RTL/LTR issues is also your editor's problem. (The feature is discussed in PEP 3131)
The python language does not have "dialects", i.e., alternative sets of keywords. If you are not satisfied with Urdu identifiers and you are determined to have a completely Urdu-language experience, you could write a preprocessor that maps Urdu "keywords" to the corresponding (English) python keyword.
It shouldn't be too hard to wrap this kind of preprocessor around the interactive console and import modules, so that it works transparently to the user (and without recompiling python from source). If you can program and want to try this on for size, check out python's code
module. It's designed to read python source and send it on to be compiled and executed. You just step in and add a preprocessing step.
I'd use a preprocessor for keywords and built-ins. But providing localized wrappers for your choice of common modules is even easier, actually. Let's demonstrate with a wrapper module RE.py
that contains all exported identifiers of regular re
, but renamed to upper case:
import re as _re
for name in _re.__all__:
locals()[name.upper()] = _re.__dict__[name]
That was it! You can now import this module and use it:
import RE
docs = RE.SUB(r"\bEnglish\b", r"Urdu", docs)
Use Urdu words instead of uppercase (of course you need to specify each one individually) and you're on your way. As you say, whether all this is a good idea is a different question.