I am currently writing an application which requires me to call GetWindowText on arbitrary windows and store that data to a file for later processing. Long story short, I noticed that my tool was failing on Battlefield 3, and I narrowed the problem down to the following character in its window title:
U+2122 TRADE MARK SIGN (™): http://www.fileformat.info/info/unicode/char/2122/index.htm
So I created a little test app which just does the following:
std::wcout << L"\u2122";
Lo and behold, that breaks output to the console window for the remainder of the program.
Why is the MSVC STL choking on this character (and I assume others) when APIs like MessageBoxW etc display it just fine?
How can I get those characters printed to my file?
Tested on both VC10 and VC11 under Windows 7 x64.
EDIT:
Minimal test case
#include <fstream>
#include <iostream>
int main()
{
    {
        std::wofstream test_file("test.txt");
        test_file << L"\u2122";
    }
    std::wcout << L"\u2122";
}
Expected result: '™' character printed to console and file.
Observed result: File is created but is empty. No output to console.
I have confirmed that the font I'm using for my console is capable of displaying the character in question, and the file is definitely empty (0 bytes in size).
EDIT:
Further debugging shows that the 'failbit' and 'badbit' are set in the stream(s).
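Those flags explain why the console goes silent: once failbit or badbit is set, every later insertion on that stream is a no-op until the state is cleared. A minimal sketch of a hypothetical helper, `write_checked`, that reports a failed write and resets the stream so one bad character does not swallow everything after it. Assumption: this only helps where the failed character is not left sitting in the stream's own buffer (e.g. the default stdio-synced `std::wcout`); a `std::wofstream` may keep the unconvertible character buffered, so there the real fix is a suitable locale.

```cpp
#include <ios>
#include <ostream>
#include <sstream>   // std::wostringstream, handy for trying the helper without a console
#include <string>

// Hypothetical helper: writes `text`, reports whether the stream accepted it,
// and clears failbit/badbit afterwards so later insertions are not silently dropped.
bool write_checked(std::wostream& os, const std::wstring& text)
{
    os << text << std::flush;    // flush so a conversion error surfaces now
    const bool ok = !os.fail();  // fail() is true if failbit or badbit is set
    if (!ok) {
        os.clear();              // back to goodbit; the text that failed is still lost
    }
    return ok;
}
```

Usage: `if (!write_checked(std::wcout, L"\u2122")) { /* log it, fall back to another encoding, ... */ }`. Clearing the state does not make the ™ appear; it only stops the failure from silencing the rest of the program.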
EDIT:
I have also tried using Boost.Locale and I am having the same issue even with the new locale imbued globally and explicitly to all standard streams.
To write to a file, you have to imbue the stream with a suitable locale. For example, to write the characters as UTF-8, add:
const std::locale utf8_locale
= std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
test_file.imbue(utf8_locale);
This requires the following two headers:
#include <codecvt>
#include <locale>
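The same steps can be wrapped in a small helper. This is a sketch with a hypothetical name, `open_utf8`, assuming a standard library that still ships `<codecvt>` (deprecated since C++17, though still provided by the major implementations). On Windows, where `wchar_t` is 16-bit UTF-16, `std::codecvt_utf8<wchar_t>` treats each `wchar_t` as a single UCS-2 unit; `std::codecvt_utf8_utf16<wchar_t>` is the facet that also handles surrogate pairs.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: returns a wide file stream whose wchar_t -> byte
// conversion produces UTF-8 instead of the default "C" locale encoding.
std::wofstream open_utf8(const std::string& path)
{
    // The locale takes ownership of the facet and deletes it when no longer used.
    const std::locale utf8_locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    std::wofstream out;
    out.imbue(utf8_locale);  // imbue before opening and before any output
    out.open(path);
    return out;              // std::wofstream is movable since C++11
}
```

With this, `auto f = open_utf8("test.txt"); f << L"\u2122";` should leave the three bytes `E2 84 A2` (UTF-8 for ™) in the file.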
To write to the console, you have to switch stdout into a Unicode translation mode (this is Windows-specific). For UTF-8, add:
_setmode(_fileno(stdout), _O_U8TEXT);
This requires these two headers:
#include <fcntl.h>
#include <io.h>
Furthermore, you have to make sure that you are using a font that supports Unicode (for example, Lucida Console). You can change the font in the properties of your console window.
The complete program now looks like this:
#include <fstream>
#include <iostream>
#include <codecvt>
#include <locale>
#include <fcntl.h>
#include <io.h>
int main()
{
    const std::locale utf8_locale = std::locale(std::locale(),
        new std::codecvt_utf8<wchar_t>());
    {
        std::wofstream test_file("c:\\temp\\test.txt");
        test_file.imbue(utf8_locale);
        test_file << L"\u2122";
    }
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::wcout << L"\u2122";
}
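Because `<codecvt>` has been deprecated since C++17, another option for the file part is to do the UTF-8 encoding yourself and write plain bytes through a narrow `std::ofstream` opened in binary mode. A sketch with hypothetical helpers `to_utf8` and `write_utf8_file`, assuming `wchar_t` holds either UTF-16 (16-bit, as on Windows) or whole code points (32-bit, as on most other platforms). On Windows, `WideCharToMultiByte` with `CP_UTF8` performs the same conversion.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8. Combines UTF-16 surrogate
// pairs (16-bit wchar_t) and passes whole code points through (32-bit wchar_t).
// Anything that is not a valid Unicode scalar value becomes U+FFFD.
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {  // valid high/low surrogate pair
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;  // unpaired surrogate or out-of-range value
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: writes `text` to `path` as UTF-8 bytes; returns false on I/O error.
bool write_utf8_file(const std::string& path, const std::wstring& text)
{
    std::ofstream out(path, std::ios::binary);  // binary: no newline translation on Windows
    const std::string bytes = to_utf8(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

For example, `to_utf8(L"\u2122")` yields the three bytes `E2 84 A2`. Keeping the stream narrow means no locale or codecvt facet is involved, so nothing can set badbit because of an unconvertible character.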
Are you always using std::wcout, or are you sometimes using std::cout? Mixing these won't work. Of course, the error description "choking" doesn't say at all what problem you are observing. I'd suspect that this is a different problem from the one with files, however.
As there is no real description of the problem, it takes somewhat of a crystal ball followed by a shot in the dark to hit the problem... Since you want to get Unicode characters into your file, make sure that the file stream you are using uses a std::locale whose std::codecvt<...> facet actually converts to a suitable Unicode encoding.
I just tested GCC (versions 4.4 through 4.7) and MSVC 10, which all exhibit this problem. Equally broken is wprintf, which does as little as the C++ stream API.
I also tested the raw Win32 API to make sure nothing else was causing the failure, and this works:
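#include <windows.h>
int main()
{
    // Named hOut rather than stdout, which would collide with the C stdout macro.
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    // WriteConsoleW hands UTF-16 straight to the console, bypassing the CRT's
    // wide-to-narrow conversion; it fails if stdout is redirected to a file or pipe.
    WriteConsoleW(hOut, L"\u03B2", 1, &written, NULL);
}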
This writes β to the console (if you set cmd's font to something like Lucida Console).
Conclusion: wchar_t output is horribly broken in both major C++ Standard Library implementations.
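Although the wide character streams take Unicode as input, that's not what they produce as output: the characters go through a conversion to the narrow encoding of the locale in effect (the classic "C" locale unless the program changes it). If a character can't be represented in the encoding it is being converted to, the output fails and the stream's error state is set.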
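The failing step can be observed directly by asking a locale's `std::codecvt<wchar_t, char, std::mbstate_t>` facet, the one a wide stream's buffer uses for conversion, to convert ™. A sketch with a hypothetical helper, `can_encode`, assuming the classic "C" locale, which on common implementations (glibc, the MSVC CRT) has no narrow mapping for U+2122.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: asks `loc`'s wchar_t -> char conversion facet whether
// every character of `text` can be represented in that locale's narrow encoding.
bool can_encode(const std::wstring& text, const std::locale& loc)
{
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& cvt = std::use_facet<Facet>(loc);

    std::mbstate_t state{};
    std::string buffer(text.size() * cvt.max_length() + 1, '\0');  // worst-case output size
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;

    const auto result = cvt.out(state,
                                text.data(), text.data() + text.size(), from_next,
                                &buffer[0], &buffer[0] + buffer.size(), to_next);
    // ok: everything converted; error: some character has no mapping in this
    // encoding, which is the condition that makes a wide stream set badbit.
    return result == Facet::ok;
}
```

`can_encode(L"\u2122", std::locale::classic())` is expected to return false, while a locale built with a UTF-8 facet, as in the first answer, should report true.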