The following code shows unexpected behaviour on my machine (tested with Visual C++ 2008 SP1 on Windows XP and VS 2012 on Windows 7):
#include <iostream>
#include "Windows.h"
int main() {
SetConsoleOutputCP( CP_UTF8 );
std::cout << "\xc3\xbc";
int fail = std::cout.fail() ? '1': '0';
fputc( fail, stdout );
fputs( "\xc3\xbc", stdout );
}
I simply compiled with cl /EHsc test.cpp
.
Windows XP: Output in a console window is
ü0ü
(translated to Codepage 1252, originally shows some line drawing
charachters in the default Codepage, perhaps 437). When I change the settings
of the console window to use the "Lucida Console" character set and run my
test.exe again, output is changed to 1ü
, which means
- the character
ü
can be written usingfputs
and its UTF-8 encodingC3 BC
std::cout
does not work for whatever reason- the streams
failbit
is setting after trying to write the character
Windows 7: Output using Consolas is ��0ü
. Even more interesting. The correct bytes are written, probably (at least when redirecting the output to a file) and the stream state is ok, but the two bytes are written as separate characters).
I tried to raise this issue on "Microsoft Connect" (see here), but MS has not been very helpful. You might as well look here as something similar has been asked before.
Can you reproduce this problem?
What am I doing wrong? Shouldn't the std::cout
and the fputs
have the same
effect?
SOLVED: (sort of) Following mike.dld's idea I implemented a std::stringbuf
doing the conversion from UTF-8 to Windows-1252 in sync()
and replaced the streambuf of std::cout
with this converter (see my comment on mike.dld's answer).
I understand the question is quite old, but if someone would still be interested, below is my solution. I've implemented a quite simple std::streambuf descendant and then passed it to each of standard streams on the very beginning of program execution.
This allows you to use UTF-8 everywhere in your program. On input, data is taken from console in Unicode and then converted and returned to you in UTF-8. On output the opposite is done, taking data from you in UTF-8, converting it to Unicode and sending to console. No issues found so far.
Also note, that this solution doesn't require any codepage modification, with either
SetConsoleCP
,SetConsoleOutputCP
orchcp
, or something else.That's the stream buffer:
The usage then is as follows:
Left out
wstring_to_utf8
andutf8_to_wstring
could easily be implemented withWideCharToMultiByte
andMultiByteToWideChar
WinAPI functions.Oi. Congratulations on finding a way to change the code page of the console from inside your program. I didn't know about that call, I always had to use chcp.
I'm guessing the C++ default locale is getting involved. By default it will use the code page provide by GetThreadLocale() to determine the text encoding of non-wstring stuff. This generally defaults to CP1252. You could try using SetThreadLocale() to get to UTF-8 (if it even does that, can't recall), with the hope that std::locale defaults to something that can handle your UTF-8 encoding.
It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.
My current knowledge is: Windows XP console in UTF-8 codepage does not work with C++ iostreams.
Windows XP is getting out of fashion now and so does VS 2008. I'd be interested to hear if the problem still exists on newer Windows systems.
On Windows 7 the effect is probably due to the way the C++ streams output characters. As seen in an answer to Properly print utf8 characters in windows console, UTF-8 output fails with C stdio when printing one byte after after another like
putc('\xc3'); putc('\xbc');
as well. Perhaps this is what C++ streams do here.