I'm building an API that lets me fetch strings in various encodings, including UTF-8, UTF-16, UTF-32 and wchar_t (which may be UTF-32 or UTF-16 depending on the OS).
The new C++ standard introduces the types char16_t and char32_t, which do not have this sizeof ambiguity and should be used in the future, so I would like to support them as well. The question is: would they interfere with the normal uint16_t, uint32_t and wchar_t types, preventing overloading because they may refer to the same type?
#include <cstdint>
#include <string>

class some_class {
public:
    void set(std::string);                  // UTF-8 string
    void set(std::wstring);                 // wchar_t string: UTF-16 or UTF-32,
                                            // according to sizeof(wchar_t)
    void set(std::basic_string<uint16_t>);  // wchar_t-independent UTF-16 string
    void set(std::basic_string<uint32_t>);  // wchar_t-independent UTF-32 string
#ifdef HAVE_NEW_UNICODE_CHARACTERS
    void set(std::basic_string<char16_t>);  // new standard UTF-16 string
    void set(std::basic_string<char32_t>);  // new standard UTF-32 string
#endif
};
So I can just write:
foo.set(U"Some utf32 String");
foo.set(u"Some utf16 string");
What are the typedefs for std::basic_string<char16_t> and std::basic_string<char32_t>, analogous to today's typedef basic_string<wchar_t> wstring? I can't find any reference.
Edit: according to the headers of gcc-4.4, which introduced these new types:
typedef basic_string<char16_t> u16string;
typedef basic_string<char32_t> u32string;
I just want to make sure that this is an actual requirement of the standard and not a GCC-ism.
1) char16_t and char32_t will be distinct new types, so overloading on them will be possible.
Quote from ISO/IEC JTC1 SC22 WG21 N2018:
Define char16_t to be a typedef to a distinct new type, with the name _Char16_t that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a typedef to a distinct new type, with the name _Char32_t that has the same size and representation as uint_least32_t.
Further explanation (from a devx.com article "Prepare Yourself for the Unicode Revolution"):
You're probably wondering why the _Char16_t and _Char32_t types and keywords are needed in the first place when the typedefs uint_least16_t and uint_least32_t are already available. The main problem that the new types solve is overloading. It's now possible to overload functions that take _Char16_t and _Char32_t arguments, and create specializations such as std::basic_string<_Char16_t> that are distinct from std::basic_string<wchar_t>.
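To make the overloading point concrete, here is a minimal sketch, assuming a C++11 (or later) compiler. The describe() overloads are hypothetical names made up for this illustration; they are not part of any library. The static_asserts check at compile time that the new character types are distinct from both the integer typedef and wchar_t, which is what lets the overloads coexist.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// The new character types are distinct types, not aliases of integer types
// or of wchar_t, so none of these assertions should fire.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t must not be the same type as uint_least16_t");
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t must not be the same type as wchar_t");
static_assert(!std::is_same<char32_t, wchar_t>::value,
              "char32_t must not be the same type as wchar_t");
// They only share size and representation with the integer types.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t),
              "char16_t has the size of uint_least16_t");

// Hypothetical overload set (illustration only): one overload per
// code-unit type. Because all five types are distinct, this compiles
// and each call below resolves to exactly one overload.
inline std::string describe(const char*)                { return "char"; }
inline std::string describe(const wchar_t*)             { return "wchar_t"; }
inline std::string describe(const char16_t*)            { return "char16_t"; }
inline std::string describe(const char32_t*)            { return "char32_t"; }
inline std::string describe(const std::uint_least16_t*) { return "uint_least16_t"; }
```

With these in place, describe(u"text") selects the char16_t overload and describe(U"text") the char32_t one, while an array of uint_least16_t still goes to its own overload. An overload for uint_least32_t is left out on purpose, since the two integer typedefs are not guaranteed to name different types on every platform.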
2) u16string and u32string are indeed part of C++0x and not just GCC-isms, as they are mentioned in various standard draft papers. They will be included in the new <string> header. Quote from the same article:
The Standard Library will also provide _Char16_t and _Char32_t typedefs, in analogy to the typedefs wstring, wcout, etc., for the following standard classes: filebuf, streambuf, streampos, streamoff, ios, istream, ostream, fstream, ifstream, ofstream, stringstream, istringstream, ostringstream, string.
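For the string typedefs specifically, a quick way to confirm that a given toolchain follows the standard naming is a compile-time check. The sketch below assumes a C++11 (or later) compiler; code_units() is a hypothetical helper invented for this example, not a library function.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// std::u16string and std::u32string are declared in <string> as typedefs
// of the corresponding basic_string specializations.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string should be basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string should be basic_string<char32_t>");

// Hypothetical helper (illustration only): overloads on the two typedefs,
// returning the number of code units stored in the string.
inline std::size_t code_units(const std::u16string& s) { return s.size(); }
inline std::size_t code_units(const std::u32string& s) { return s.size(); }
```

Called with literals, code_units(u"a\U0001F600") yields 3, because the character outside the Basic Multilingual Plane is stored as a surrogate pair in UTF-16, while code_units(U"a\U0001F600") yields 2. The u and U prefixes pick the overload without any cast.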