iostreams - Print `wchar_t` or `charXX_t` value as

Posted 2019-06-18 05:37

Question:

If you feed a wchar_t, char16_t, or char32_t value to a narrow ostream, it will print the numeric value of the code point.

#include <iostream>
using std::cout;
int main()
{
    cout << 'x' << L'x' << u'x' << U'x' << '\n';
}

prints x120120120. This is because basic_ostream has an operator<< for its own charT (and for plain char), but there are no analogous operators for the other character types, so those values get silently promoted to an integer type and printed as numbers. Similarly, non-narrow string literals (L"x", u"x", U"x") will be silently converted to const void* and printed as the pointer value, and non-narrow string objects (wstring, u16string, u32string) won't even compile.

So, the question: What is the least awful way to print a wchar_t, char16_t, or char32_t value on a narrow ostream, as the character, rather than as the numeric value of the codepoint? It should correctly convert all codepoints that are representable in the encoding of the ostream, to that encoding, and should report an error when the codepoint is not representable. (For instance, given u'…' and a UTF-8 ostream, the three-byte sequence 0xE2 0x80 0xA6 should be written to the stream; but given u'â' and a KOI8-R ostream, an error should be reported.)

Similarly, how can one print a non-narrow C-string or string object on a narrow ostream, converting to the output encoding?

If this can't be done within ISO C++11, I'll take platform-specific answers.

(Inspired by this question.)

Answer 1:

As you noted, there is no operator<<(std::ostream&, wchar_t) for a narrow ostream. If you want to keep the << syntax, however, you can teach ostream how to deal with wchar_t values, so that your routine is picked as a better overload than the one requiring a conversion to an integer first.

If you're feeling adventurous:

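#include <ostream>
#include <stdexcept>

// Caution: adding overloads to namespace std is formally undefined behaviour.
namespace std {
  ostream& operator<< (ostream& os, wchar_t wc) {
    if(unsigned(wc) < 256) // or another upper bound
      return os << (unsigned char)wc;
    else
      throw std::range_error("wide character not representable"); // or handle the error in some other way
  }
}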

Otherwise, make a simple struct that wraps a wchar_t and has a custom friend operator<<, and convert your wide characters to that type before outputting them.

Edit: To convert on the fly to the encoding of the current locale, you can use the functions from <cwchar>, like:

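#include <cstdlib>  // MB_CUR_MAX
#include <cwchar>   // std::wcrtomb, std::mbstate_t
#include <ostream>
#include <string>

std::ostream& operator<< (std::ostream& os, wchar_t wc) {
    std::mbstate_t state{};
    std::string mb(MB_CUR_MAX, '\0');
    std::size_t ret = std::wcrtomb(&mb[0], wc, &state);
    if(ret == static_cast<std::size_t>(-1)) {
        os.setstate(std::ios_base::failbit); // not representable; or throw instead
        return os;
    }
    mb.resize(ret); // keep only the bytes actually written, not the NUL padding
    return os << mb;
}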

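Don't forget to set your locale to the system default first. wcrtomb converts according to the current C locale, which std::locale::global also updates when it is given a named locale: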

std::locale::global(std::locale(""));
std::cout << L'ŭ';