Unicode support in C++0x

Posted 2019-02-11 21:16

I'm trying to use the new Unicode character types in C++0x, so I wrote some sample code:

#include <fstream>
#include <string>
int main()
{
    std::u32string str = U"Hello World";

    std::basic_ofstream<char32_t> fout("output.txt");

    fout<<str;  
    return 0;
}

But after running this program I get an empty output.txt file. Why doesn't it print Hello World?

Also, is there something like cout and cin already defined for these types, or do stdin and stdout not support Unicode?

Edit: I'm using g++ and Linux.

EDIT: ATTENTION. I have discovered that the standards committee removed Unicode streams from C++0x, so the previously accepted answer is no longer correct. For more information, see my answer!

3 answers
做个烂人
#2 · 2019-02-11 22:00

Support for Unicode string literals began in GCC 4.5. Maybe that's the problem.

[edit]

After some digging I found that streams for these new Unicode literals are described in N2035, which was included in a draft of the standard. According to that document you need u32ofstream to output your string, but this class is absent from the GCC 4.5 C++0x library.

As a workaround you can use an ordinary ofstream:

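// Write the raw UTF-32 code units; the byte order follows the host machine.
std::ofstream fout2("output2.txt", std::ios::out | std::ios::binary);
fout2.write(reinterpret_cast<const char*>(str.data()), str.size() * sizeof(char32_t));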

This way I wrote your string as UTF-32LE on my Intel machine (which is little-endian).
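The workaround above produces UTF-32LE only because the host is little-endian. If the byte order of the file should not depend on the machine, one option is to serialize each code unit byte by byte. The following is a minimal sketch under that assumption; the helper names to_utf32le and write_utf32le are hypothetical and not part of any library.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: serialize UTF-32 code units as little-endian bytes,
// independent of the host byte order.
std::string to_utf32le(const std::u32string& text)
{
    std::string bytes;
    bytes.reserve(text.size() * 4);
    for (char32_t unit : text) {
        // Emit the least significant byte first, then the next three.
        for (int shift = 0; shift < 32; shift += 8) {
            bytes.push_back(static_cast<char>((unit >> shift) & 0xFF));
        }
    }
    return bytes;
}

// Hypothetical helper: write the serialized bytes through an ordinary
// narrow stream opened in binary mode. Returns false if the write failed.
bool write_utf32le(const char* path, const std::u32string& text)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    const std::string bytes = to_utf32le(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

With these helpers, write_utf32le("output2.txt", str) would replace the two lines of the workaround. No byte order mark is written, so a reader has to know the file is UTF-32LE.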

[edit]

I was a little bit wrong about the status of u32ofstream: according to the latest draft on the C++ Standards Committee's web site, you have to use std::basic_ofstream<char32_t>, as you did. This class would use the codecvt<char32_t,char,typename traits::state_type> class (see the end of §27.9.1.1), which has to be implemented in the standard library (search for codecvt<char32_t in the document), but it's not available in GCC 4.5.

兄弟一词,经得起流年.
#3 · 2019-02-11 22:06

There will be no Unicode streams in the new C++ standard.

As @ssmir mentioned, the standards committee was going to add stream support for Unicode in C++0x. However, in later drafts the committee decided to remove it. For more information see this link.

It seems the only way to output a Unicode string is to convert it to a narrow (char) string, for example UTF-8, with codecvt.
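To illustrate what such a conversion to a narrow string involves, here is a minimal sketch that encodes a std::u32string as UTF-8. It uses a hand-written encoder in place of codecvt, so that it does not depend on which facets a particular standard library provides. The name to_utf8 is hypothetical, and replacing invalid code points with U+FFFD is an assumption of this sketch, not something required by the answer.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 text as UTF-8.
// Surrogates and values above U+10FFFF are replaced with U+FFFD.
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The result is an ordinary std::string, so it can be written with a plain std::ofstream or std::cout, for example fout << to_utf8(str).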

可以哭但决不认输i
#4 · 2019-02-11 22:06

When the stream is created, it tries to obtain a codecvt facet from the global locale, but fails because the only standard codecvt specializations are for char and wchar_t. As a result, the stream object's _M_codecvt member is NULL. Later, during the attempt to output, your code throws an exception (not visible to the user) in the facet-checking function in basic_ios.h, because the facet is initialized from _M_codecvt.

To fix this, add a facet to the locale associated with the stream that converts from char32_t to the desired output encoding. In other words, imbue the stream with a locale containing a codecvt of the right type.
