It is possible to write Unicode characters to the Windows console using the WriteConsoleW function. On my Windows 7 machine, it looks like the console does not support characters outside the Basic Multilingual Plane. Also, combining characters are displayed after the base character, not actually combined.
Are these limitations also present in later versions of Windows? Are there other limitations on Unicode in the Windows console?
Windows console is limited to Basic Multilingual Plane
Your link to WriteConsole function says nothing about usable console characters:
- lpBuffer [in] A pointer to a buffer that contains characters to be written to the console screen buffer.
But what is that buffer? A simple Google search for "writeconsole lpbuffer structure" gives an (indirect) link to the CHAR_INFO structure:
Syntax (C++)
typedef struct _CHAR_INFO {
  union {
    WCHAR UnicodeChar;
    CHAR  AsciiChar;
  } Char;
  WORD Attributes;
} CHAR_INFO, *PCHAR_INFO;
But what is WCHAR UnicodeChar? Again, a simple Google search for "windows wchar" gives a link to Windows Data Types:
WCHAR
A 16-bit Unicode character. For more information, see Character Sets Used By Fonts. This type is declared in WinNT.h as follows: typedef wchar_t WCHAR;
And finally, the Character Sets Used By Fonts link above leads to the final conclusion: the Windows console is limited to the Basic Multilingual Plane, i.e. the 16-bit Unicode subset:
Unicode Character Set
… To address the problem of multiple coding schemes, the Unicode
standard for data representation was developed. A 16-bit character
coding scheme, Unicode can represent 65,536 (2^16) characters, which
is enough to include all languages in computer commerce today, as well
as punctuation marks, mathematical symbols, and room for expansion.
Unicode establishes a unique code for every character to ensure that
character translation is always accurate.
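To make the 16-bit argument concrete, here is a minimal, portable sketch (it does not call any Windows API). The `Cell` struct is only a hypothetical stand-in for CHAR_INFO, and `units_needed` is a name invented for this illustration. The sketch assumes that each console cell stores exactly one 16-bit unit, as the structure above suggests. Unicode today extends up to U+10FFFF, so some code points cannot fit into one such unit.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical, portable stand-in for CHAR_INFO: one 16-bit character
// unit plus one 16-bit attribute word per console cell.
struct Cell {
    std::uint16_t unicode_char;  // plays the role of WCHAR UnicodeChar
    std::uint16_t attributes;    // plays the role of WORD Attributes
};

// The character slot of a cell is exactly 16 bits wide.
static_assert(sizeof(Cell::unicode_char) * CHAR_BIT == 16,
              "one cell holds one 16-bit unit");

// Number of 16-bit units needed to store one Unicode code point:
// 1 inside the Basic Multilingual Plane, 2 outside of it.
std::size_t units_needed(char32_t cp) {
    assert(cp <= 0x10FFFFu);  // upper bound of the Unicode code space
    return cp < 0x10000u ? 1 : 2;
}
```

For example, `units_needed(U'A')` is 1, while `units_needed(0x1F600)` is 2, so under the stated assumption that character cannot occupy a single cell.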
I wrote a partial answer in my answer to a different question; here is a good place for a full disclosure. My background: I maintain what is in all probability the most extensive console font which fully supports Windows (it is a very deep rewrite of Unifont with elements of DejaVu added).
I start with the limitations already mentioned in other answers:
Every cell contains 16 bits of character data. In other words: only UCS-2 codepoints are shown. (In particular, for a character outside the BMP, its “decomposition into UCS-2” is shown instead, using surrogate characters.)
Only simple text rendering is supported. Even if one uses TTF fonts, no advanced “features” of the font are considered by the console. Neither advanced typography (ligatures etc.), nor even composing glyphs for combining characters, nor right-to-left scripts¹⁾ (in an LtR environment) work as expected.
¹⁾ It is the application which should rearrange the characters for a correct bidi-rendering.
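The “decomposition” mentioned above is the standard UTF-16 surrogate-pair encoding. The following is a minimal sketch of it in portable C++; the function names `to_utf16` and `cells_for` are made up for this illustration, and the "one 16-bit unit per cell" counting is an assumption based on the description above, not something read from the console itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// BMP code points yield one unit; others yield a surrogate pair.
std::vector<std::uint16_t> to_utf16(char32_t cp) {
    // Must be a valid scalar value: in range and not itself a surrogate.
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    if (cp < 0x10000u) {
        return { static_cast<std::uint16_t>(cp) };
    }
    const std::uint32_t v = cp - 0x10000u;  // 20 significant bits remain
    return { static_cast<std::uint16_t>(0xD800u + (v >> 10)),      // high surrogate
             static_cast<std::uint16_t>(0xDC00u + (v & 0x3FFu)) }; // low surrogate
}

// Number of cells a text would occupy if every 16-bit unit gets its own
// cell (assumed here), with no composition of combining characters.
std::size_t cells_for(const std::u32string& text) {
    std::size_t cells = 0;
    for (char32_t cp : text) {
        cells += to_utf16(cp).size();
    }
    return cells;
}
```

With this sketch, U+1F600 becomes the two units 0xD83D 0xDE00, and the sequence "e" followed by U+0301 (combining acute accent) counts as two cells, which matches the behaviour described in the question.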
Font filtering
Other limitations are due to font filtering by the console. A font must be quite special to be accepted by the console (that is, to be shown in the font selection dialogue, and for this selection “to work”¹⁾).
¹⁾ I do not recall whether a font may be shown but not be selectable (I have a vague memory of this happening, but cannot trust this memory).
The font must be marked as monospaced. Due to expectations of applications,²⁾ such fonts must have all the glyphs of the same width.
²⁾ The latter condition is relevant only if one wants to use the font outside of the console. In principle, the console does not check the widths of the glyphs. However, every glyph is shown as if it had the “default width”. In many (all?) situations only the part of the glyph inside the “default bounding box” is shown. I could not find any trick to circumvent this limitation.
On non-EastAsian releases of Windows, the font cannot claim that it supports any one of 4 East Asian codepages.³⁾
³⁾ Note that this is only a limitation on what the font header claims: it is just 4 bits present in the header. The font may have glyphs for these languages present, and they would show fine, as long as the font does not claim the support. The codepages in question (in the OS/2⫽Charsets section of the header) are 932, 936, 949, 950 (JIS, Simplified Chinese, Korean Wansung, Traditional Chinese).
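As an illustration of those 4 bits: in the OpenType specification, the code page claims live in the `ulCodePageRange1` field of the OS/2 table, where bits 17 to 20 stand for codepages 932, 936, 949 and 950. The sketch below checks and clears them on an already extracted 32-bit value. The function names are hypothetical, and reading the field from a font file (and fixing up the table checksums after a change) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions in OS/2 ulCodePageRange1, per the OpenType specification.
constexpr std::uint32_t kJis932                = 1u << 17;
constexpr std::uint32_t kChineseSimplified936  = 1u << 18;
constexpr std::uint32_t kKoreanWansung949      = 1u << 19;
constexpr std::uint32_t kChineseTraditional950 = 1u << 20;

constexpr std::uint32_t kEastAsianMask =
    kJis932 | kChineseSimplified936 | kKoreanWansung949 | kChineseTraditional950;

// True if the header claims support for any of the four codepages.
bool claims_east_asian_codepage(std::uint32_t code_page_range1) {
    return (code_page_range1 & kEastAsianMask) != 0;
}

// Return the field value with the four claims removed; the glyphs
// themselves are untouched, only the header claim changes.
std::uint32_t clear_east_asian_claims(std::uint32_t code_page_range1) {
    const std::uint32_t cleared = code_page_range1 & ~kEastAsianMask;
    assert(!claims_east_asian_codepage(cleared));
    return cleared;
}
```

For instance, a value of `0x00020001` (Latin 1 plus JIS) claims an East Asian codepage, and clearing it leaves `0x00000001`.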
Bugs in font rendering
Although Windows’ console does not support the Underline attribute (except for DBCS codepages), the “Underline position” field of the font header is taken into account when the size of the on-screen character bbox is calculated. This may lead to an unexpected aspect ratio of the font, and/or to gaps between glyphs which are expected to “join together”.
The console is very picky about the replacement glyph for “unsupported characters”. I could not find a way to make such a glyph coexist with glyphs for U+0000 and/or U+0001. (If the console finds one of the latter two glyphs in a font, it ignores the replacement glyph.)
(This is a very obscure bug; it requires a very technical discussion.) Another problem with the replacement glyph is the character U+30FB ・ (WHY?!). If this character is present in the font, the glyph for this character is used as a replacement glyph, but only for missing characters in the PUA!
Essentially, this is it! I did not find any other limitation.