From C++2003 2.13:
A wide string literal has type “array of n const wchar_t” and has static storage duration, where n is the size of the string as defined below
The size of a wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating L’\0’.
From C++0x 2.14.5:
A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below
The size of a char32_t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U’\0’ or L’\0’.
The size of a char16_t string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u’\0’.
The wording in C++2003 is quite vague. But in C++0x, when counting the size of the string, a wide (wchar_t) string literal is treated the same as a char32_t string literal and differently from a char16_t string literal: every escape sequence, universal-character-name, or other character counts as one element, with no extra element added for characters that would need a surrogate pair.
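To illustrate the counting rules just quoted, here is a minimal sketch (the sizes follow directly from the C++0x wording above, assuming a conforming compiler):

    // U+E0005 lies outside the Basic Multilingual Plane (BMP).
    char32_t s32[] = U"\U000E0005"; // 1 for the universal-character-name + 1 for U'\0'          => 2
    char16_t s16[] = u"\U000E0005"; // 1 + 1 extra for the surrogate pair + 1 for u'\0'          => 3

    static_assert(sizeof s32 / sizeof(char32_t) == 2, "char32_t literal: one element per character");
    static_assert(sizeof s16 / sizeof(char16_t) == 3, "char16_t literal: surrogate pair counts twice");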
There's a post that explains how Windows implements wchar_t: https://stackoverflow.com/questions/402283?tab=votes%23tab-top
In short, wchar_t on Windows is 16 bits wide and wide strings are encoded in UTF-16. The standard's wording therefore appears to conflict with the Windows implementation.
For example:

    wchar_t kk[] = L"\U000E0005";
This character lies outside the BMP, so UTF-16 needs two 16-bit code units to encode it (a surrogate pair; for U+E0005 the pair is 0xDB40 0xDC05).
However, according to the standard, kk is an array of 2 wchar_t (1 for the universal-character-name \U000E0005 and 1 for the terminating \0).
But internally, Windows needs 3 16-bit wchar_t objects to store it: 2 for the surrogate pair and 1 for the terminating \0. So, by the definition of an array, kk is actually an array of 3 wchar_t.
These two results apparently contradict each other.
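Here is a small test program that makes the conflict concrete (a sketch; per the quoted wording the printed value should be 2, while on Windows/MSVC, with a 16-bit, UTF-16-encoded wchar_t, one would expect 3):

    #include <iostream>

    int main() {
        wchar_t kk[] = L"\U000E0005"; // standard says 2 elements; UTF-16 needs 3 code units

        // sizeof reports the storage the compiler actually allocated for the array,
        // so this prints the real element count, terminator included.
        std::wcout << sizeof kk / sizeof(wchar_t) << std::endl;
        return 0;
    }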
I think the simplest solution for Windows would be to "ban" anything that requires a surrogate pair in a wide string literal, i.e. to "ban" any Unicode character outside the BMP.
Is there anything wrong with my understanding?
Thanks.