I saw that C++0x will add support for UTF-8, UTF-16 and UTF-32 literals. But what about conversions between the three representations?
I plan to use std::wstring everywhere in my code, but I also need to manipulate UTF-8 encoded data when dealing with files and the network. Will C++0x also provide support for these operations?
In C++0x, char16_t and char32_t will be used to store UTF-16 and UTF-32, and not wchar_t. From the draft n2798:
The thing about wchar_t is that it does not give you any guarantees about the encoding used. It is a type that can hold a wide character. Period. If you are going to write software now, you have to live with this compromise. C++0x compliant compilers are still a long way off. You can always give the VC2010 CTP and g++ compilers a try, for what it is worth. Moreover, wchar_t has different sizes on different platforms, which is another thing to watch out for (2 bytes on VS/Windows, 4 bytes on GCC/Mac and so on). There are also options like -fshort-wchar for GCC that further complicate the issue.

The best solution therefore is to use an existing library. Chasing Unicode bugs around isn't the best possible use of effort/time. I'd suggest you take a look at:
More on C++0x Unicode string literals here
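To make the points above about the new character types and the varying size of wchar_t concrete, here is a minimal sketch. It assumes a compiler with C++11 (C++0x) Unicode literal support; the function names wide_unit_size, utf16_sample and utf32_sample are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <string>

// Size in bytes of one wchar_t. This is platform-dependent
// (commonly 2 on Windows, 4 on Linux/Mac), so nothing should rely on it.
constexpr std::size_t wide_unit_size() { return sizeof(wchar_t); }

// u"..." yields a UTF-16 string of char16_t code units.
// U+00E9 fits in one unit; U+1F600 needs a surrogate pair (two units).
inline std::u16string utf16_sample() { return u"\u00E9\U0001F600"; }

// U"..." yields a UTF-32 string of char32_t: one unit per code point.
inline std::u32string utf32_sample() { return U"\u00E9\U0001F600"; }
```

The same two code points occupy three code units in the char16_t string and two in the char32_t string, whereas the number of wchar_t units needed for them depends on the platform.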
Thank you dirkgently. I'm not yet registered, so I can't upvote or respond directly as a comment.
I've learned something with codecvt. I knew about the libraries you suggest and the following resource may also be useful http://www.unicode.org/Public/PROGRAMS/CVTUTF/.
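As a rough idea of what this kind of conversion code involves, here is a minimal hand-written sketch that encodes UTF-32 code points as UTF-8 without any external dependency. It is not taken from the resource above or from codecvt; the names append_utf8 and utf32_to_utf8 are hypothetical, and rejecting invalid input by throwing is just one possible design choice.

```cpp
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of a single code point to `out`.
// Throws std::invalid_argument for surrogates and values above U+10FFFF.
inline void append_utf8(std::string& out, char32_t cp) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        throw std::invalid_argument("invalid Unicode code point");
    }
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a whole UTF-32 string to UTF-8.
inline std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        append_utf8(out, cp);
    }
    return out;
}
```

For example, calling utf32_to_utf8 on a std::u32string read from a U"..." literal gives a std::string of UTF-8 bytes that can be written to a file or socket as-is. Decoding in the other direction needs more validation (overlong forms, truncated sequences), which is one reason an existing library is attractive.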
The project is a library that should be open source, so I would prefer to minimize dependencies on external libraries. I already depend on libgc and boost, though for the latter I only use threads. I would really prefer to stick to the C++ standard, and I'm a bit disappointed that GC support has been somehow dropped.
Apparently VC++ Express 2008 is said to support most of the C++0x standard, as is icc. Since I currently develop with VC++ and it will still take some time until the library is released, I'd like to try using codecvt and char32_t strings.
Does anyone know how to do this? Should I post another question?