c++, cout and UTF-8

Posted 2020-02-12 04:15

Question:

Hopefully a simple question: cout seems to die when handling strings that end with a multibyte UTF-8 character. Am I doing something wrong? This is with GCC (MinGW) on Win7 x64.

**Edit:** Sorry if I wasn't clear enough. I'm not concerned about the missing glyphs or how the bytes are interpreted, merely that nothing at all is shown right after the call to `cout << s4` (the final BAR is missing). Any further couts after that first one display no text whatsoever!

```cpp
#include <cstdio>
#include <iostream>
#include <string>

int main() {
    std::string s1("abc");
    std::string s2("…");  // … = 0xE2 80 A6
    std::string s3("…abc");
    std::string s4("abc…");

    //In C
    fwrite(s1.c_str(), s1.size(), 1, stdout);
    printf(" FOO ");
    fwrite(s2.c_str(), s2.size(), 1, stdout);
    printf(" BAR ");
    fwrite(s3.c_str(), s3.size(), 1, stdout);
    printf(" FOO ");
    fwrite(s4.c_str(), s4.size(), 1, stdout);
    printf(" BAR\n\n");

    //C++
    std::cout << s1 << " FOO " << s2 << " BAR " << s3 << " FOO " << s4 << " BAR ";
}

// results:

// abc FOO ��� BAR ���abc FOO abc… BAR

// abc FOO ��� BAR ���abc FOO abc…
```

Answer 1:

This is really no surprise. Unless your terminal is set to UTF-8 encoding, how would it know that the three bytes of s2 (0xE2 0x80 0xA6) aren't supposed to be "(Latin small letter a with circumflex)(Euro sign)(Broken bar)", i.e. "â€¦", which is what they mean if your terminal uses Windows-1252 (see the table at http://www.ascii-code.com/)?

By the way, cout is not "dying", as it clearly continues to produce output after your test string.



Answer 2:

If you want your program to use your current locale, call `setlocale(LC_ALL, "")` as the first thing in your program. Otherwise the program runs in the "C" locale, and what it will do with non-ASCII characters is not knowable by us mere humans.



Answer 3:

By default, the Windows console can only display characters from the active (local) code page.

You'll need to make sure a Unicode-capable font is set in the console window, and that the code page is set to UTF-8 by running `chcp 65001`. Even then, success is not guaranteed. Note that `wcout` changes nothing if the console can't show the fancy characters because its font lacks them.

Most modern Linux distros set the console to UTF-8 by default, so this should work out of the box there.



Answer 4:

As others have pointed out, std::cout is agnostic about this, at least in the "C" locale (the default): it passes the bytes through unchanged. Your console window, on the other hand, must be set up to display UTF-8, i.e. code page 65001. Try running `chcp 65001` before executing your program. (This has worked for me in the past.)



Tags: c++ utf-8 cout