It is my understanding that by default, Character
is Latin_1, Wide_Character
is UCS-2, and Wide_Wide_Character
is UCS-4, but that GNAT can have specified pragma Wide_Character_Encoding(UTF8);
or -gnatW8
and that those characters and their strings will be UTF-8 encoded instead.
At least on Linux and FreeBSD, the results fit with my expectations. But on Windows the results are odd.
For either Wide or Wide_Wide variants, once a character moves beyond the ASCII set, I get a garbled mess. I beleive this is called emojibake by some. So I figured it was a codepage issue. After all, the default codepage in Windows, and therefore what the Console Host would load with, is 437 which isn't the UTF-8 codepage. chcp 65001
and now instead of the mess of extra characters, there's an immediate exception raised ADA.IO_EXCEPTIONS.DEVICE_ERROR : a-ztexio.adb:1295
. Looking at where the exception occurred, it seems to be in the putc
binding of fputc()
. But this is Standard_Output, shouldn't an EOF never happen?
Is there some kind of special consideration Windows needs? How can I get UTF-8 output?
edit:
I tried piping the output into a text file. The supposed UTF-8 encoded program still generates emojibake in the file. Not sure why this would immediately throw an exception in the console though.
So then I tried directly opening and writing to a file instead of the console/pipe. Oddly this works exactly as it should. The text is completely correct.
I've never seen this kind of behavior with any other language, so it should still be possible to get proper UTF-8 at the console, right?