We have a project in Team Foundation Server (TFS) that has a non-English character (š) in its name. When trying to script a few build-related tasks we stumbled upon a problem: we can't pass the letter š to the command-line tools. The command prompt, or something else along the way, mangles it, and the tf.exe utility can't find the specified project.
I've tried different encodings for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is inherently Unicode), but no luck. How do I execute a program and pass it a Unicode command line?
Changing the code page to 1252 works for me. My problem was that the symbol § (which I call a "double dollar") was being converted to another symbol by DOS on Windows Server 2008.
I used CHCP 1252 and put a caret before the symbol in my BCP statement: ^§.
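Why the code page matters for § can be shown with a short Python sketch. It assumes the script was saved as Windows-1252 and that the console was using OEM code page 437 before CHCP 1252 was run. The answer does not say which code pages were involved, so both are illustrative choices.

```python
# Illustration only: how one byte is interpreted under different code pages.
# Assumption: the file was saved as Windows-1252 and the console used OEM 437.
section_sign = "\u00a7"  # the § character

ansi_bytes = section_sign.encode("cp1252")    # how an ANSI editor stores §
shown_by_oem = ansi_bytes.decode("cp437")     # how an OEM 437 console reads that byte
shown_by_1252 = ansi_bytes.decode("cp1252")   # how it reads once chcp 1252 is active

# ascii() keeps the output printable regardless of the terminal encoding.
print(ansi_bytes, ascii(shown_by_oem), ascii(shown_by_1252))
```

The same byte maps to a different character until the console code page matches the encoding of the file.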
My background: I have used Unicode input/output in a console for years (and do it a lot daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts/limitations:
CMD and “console” are unrelated factors. CMD.exe is just one of the programs which are ready to “work inside” a console (“console applications”).
CMD has perfect support for Unicode; you can enter/output all Unicode chars when any codepage is active.
chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows’ API (or uses a C runtime library which has these workarounds), it would not work reliably. Win8 fixes ½ of these problems with cp65001, but the rest is still applicable to Win10.
[…] cp1252. As I already said: To input/output Unicode in a console, one does not need to set the codepage.
The details
[…] File-I/O API, but Console-I/O API. (For an example, see how Python does it.)
[…] U+10000). Only simple text rendering is supported (so European, and some East Asian, languages should work fine, as far as one uses precomposed forms). [There is a minor fine print here for East Asian and for characters U+0000, U+0001, U+30FB.]
Practical considerations
The defaults on Windows are not very helpful. For best experience, one should tune up 3 pieces of configuration:
One more gotcha with “Pasting” into a console application (very technical):
[…] KeyUp of Alt; all the other ways to deliver a character happen on KeyDown; so many applications are not ready to see a character on KeyUp. (Only applicable to applications using Console-I/O API.)
[…] Ctrl-Alt-AltGr-Kana-Shift-Gray*) then it is delivered on an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via Console’s UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)
One should also keep in mind that the “alternative, ‘more capable’ consoles” for Windows are not consoles at all. They do not support Console-I/O APIs, so the programs which rely on these APIs to work would not function. (The programs which use only “File-I/O APIs to the console filehandles” would work fine, though.)
One example of such non-console is a part of Microsoft’s PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.
(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they “attempt” to intercept Console-I/O APIs to make “true console applications” work too. This definitely works for toy example programs; in real life, this may or may not solve your particular problems. Experiment.)
Summary
set font, keyboard layout (and optionally, allow HEX input).
use only programs which go through Console-I/O APIs, and accept Unicode command-line arguments. For example, any cygwin-compiled program should be fine. As I already said, CMD is fine too.
UPD: Initially, for a bug in cp65001, I was mixing up Kernel and CRTL layers (UPD²: and Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about “better console” application, and added a reference to how Python does it.
Actually, the trick is that the command prompt does understand these non-English characters; it just can't display them correctly.
When I enter a path in the command prompt that contains some non-English characters, it is displayed as "?? ?????? ?????". When I submit the command (cd "??? ?????? ?????" in my case), everything works as expected.
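One way to check that an argument really arrives intact, whatever the console manages to display, is to have the receiving program report code points in plain ASCII. The sketch below assumes Python 3 and uses a small child Python process as a stand-in for the real tool. The name "Projektš" is made up for illustration.

```python
import subprocess
import sys

# Child program: reports the code points of its first argument in plain ASCII,
# so the result does not depend on what the console font or code page can show.
CHILD = "import sys; print(' '.join('U+%04X' % ord(c) for c in sys.argv[1]))"


def code_points_seen_by_child(argument):
    """Run a child Python process and return the code points it received."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, argument],  # list form: no shell, no .bat file
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    # "Projekt\u0161" is a hypothetical name containing the letter š (U+0161).
    print(code_points_seen_by_child("Projekt\u0161"))
```

If the reported code points match what was typed, the problem is one of display only, not of argument passing.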
A quick fix for .bat files, if your computer displays the path/file name correctly when you type it in a DOS window:
This way you create a .txt file, temp.txt. Open it in Notepad, copy the text (don't worry, it will look unreadable) and paste it into your .bat file. Executing the .bat created this way in a DOS window worked for me (Cyrillic, Bulgarian).
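The copy-and-paste trick works because the pasted bytes end up in the console's own code page, which is what cmd.exe uses when it reads a batch file. A minimal Python sketch of the same idea follows. It assumes the active OEM code page is 866 (a common Cyrillic one), and the helper name `write_batch_file` and the folder name are hypothetical.

```python
import os
import tempfile


def write_batch_file(path, command, console_codepage="cp866"):
    """Write a one-command .bat file encoded in the given console code page."""
    data = ("@echo off\r\n" + command + "\r\n").encode(console_codepage)
    with open(path, "wb") as handle:  # binary mode: write exactly these bytes
        handle.write(data)
    return data


if __name__ == "__main__":
    bat_path = os.path.join(tempfile.mkdtemp(), "go.bat")
    # Hypothetical Cyrillic folder name; cp866 is an assumed OEM code page.
    write_batch_file(bat_path, 'cd "C:\\\u041f\u0440\u043e\u0435\u043a\u0442\u0438"')
    print(bat_path)
```

Running chcp without arguments shows the active code page, which is the value to pass as `console_codepage` (for example "cp852" for code page 852).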
I had a similar problem (showing UTF-8 characters from MySQL on a command prompt) and solved it like this:
I changed the font of the command prompt to Lucida Console. (This step is probably irrelevant for your situation. It only affects what you see on the screen, not what the character really is.)
I changed the codepage to Windows-1253. You do this on the command prompt by running "chcp 1253". It worked for my case where I wanted to see UTF-8.
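The sketch below does not reproduce the MySQL setup. It only illustrates, in Python, why the bytes sent to the console have to match the active code page: UTF-8 bytes read as Windows-1253 come out garbled, while bytes produced for Windows-1253 read back correctly. The sample text is an arbitrary choice.

```python
text = "\u03b1\u03b2\u03b3"  # Greek letters alpha, beta, gamma (sample data)

utf8_bytes = text.encode("utf-8")      # two bytes per letter
cp1253_bytes = text.encode("cp1253")   # one byte per letter

# UTF-8 bytes interpreted as cp1253: every byte becomes its own (wrong) character.
garbled = utf8_bytes.decode("cp1253", errors="replace")
# Bytes produced for cp1253 decode back to the original text.
readable = cp1253_bytes.decode("cp1253")

print(len(text), len(garbled), readable == text)
```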
As I haven't seen any full answers for Python 2.7, I'll outline the two important steps and an optional step that is quite useful.
[…] Defaults option. This also gives access to colours. Note that you can also change settings for command windows invoked in certain ways (e.g., open here, Visual Studio) by choosing Properties instead.
[…] cp65001, which appears to be Microsoft's attempt to offer UTF-8 support to command prompt. Do this by running chcp 65001 in command prompt. Once set, it remains this way until the window is closed. You'll need to redo this every time you launch cmd.exe.
For a more permanent solution, refer to this answer on Super User. In short, create a REG_SZ (String) entry using regedit at HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor and name it AutoRun. Change the value of it to chcp 65001. If you don't want to see the output message from the command, use @chcp 65001>nul instead.
Some programs have trouble interacting with this encoding, MinGW being a notable one that fails while compiling with a nonsensical error message. Nonetheless, this works very well and doesn't cause bugs with the majority of programs.
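To check whether the AutoRun entry is in place without opening regedit, a read-only sketch in Python 3 could look like the following. It uses the standard winreg module on Windows and simply returns None elsewhere. The function name `read_autorun` is made up for this example, and the code only reads the value, it never writes it.

```python
import sys

KEY_PATH = r"Software\Microsoft\Command Processor"


def read_autorun():
    """Return the machine-wide cmd.exe AutoRun command, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, "AutoRun")
            return value
    except OSError:
        return None  # key or value not present


if __name__ == "__main__":
    print(read_autorun())
```

If the entry was created as described, the function should return the stored command string, for example the chcp line.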