Recently I have been coming across conversions between UTF-8 encoding and strings, and vice versa. I understand that UTF-8 encoding can hold almost all the characters in the world, while char, the built-in data type used for strings, can only store ASCII values. A character in UTF-8 encoding may require anywhere from one to four bytes of memory, but a char is usually one byte.
My question is: what happens in a conversion from wstring to string, or from wchar_t to char? Are the characters that require more than one byte skipped? It seems to depend on the implementation, but I want to know the correct way of doing it.
Also, is wchar_t required to store Unicode characters? As far as I understand, Unicode characters can be stored in a normal string as well, so why should we use wstring or wchar_t?
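For instance, a minimal illustration of the variable byte width (assuming the string really holds UTF-8 bytes):

#include <iostream>
#include <string>

int main()
{
    std::string s = "\xC3\xA9";     // the two UTF-8 bytes of "é", a single character
    std::cout << s.size() << "\n";  // prints 2: std::string counts bytes, not characters
}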
Make your source files UTF-8 encoded and set the character encoding to Unicode in your IDE. Use std::string everywhere and widen it only for Windows API calls:
std::string somestring = "こんにちは"; WindowsApiW(widen(somestring).c_str());
I know it sounds kind of hacky, but a more profound explanation of this issue can be found at utf8everywhere.org.
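widen is not a standard function, just a helper you write yourself; a minimal sketch of one (this version uses std::wstring_convert, which works but was deprecated in C++17, and assumes wchar_t is UTF-16 as on Windows):

#include <codecvt>
#include <locale>
#include <string>

// Sketch: convert a UTF-8 std::string into a std::wstring (UTF-16 on Windows).
std::wstring widen(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}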
Depends how you convert them.
You need to specify the source encoding type and the target encoding type.
wstring is not a format; it just defines a data type. Now, usually when one says "Unicode", one means UTF-16, which is what Microsoft Windows uses, and that is usually what a wstring contains. So, the right way to convert from UTF-8 to UTF-16:
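A sketch of that conversion using the Win32 function MultiByteToWideChar (error handling omitted for brevity):

#include <string>
#include <windows.h>

std::wstring utf8_to_utf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First call computes the required length in wchar_t units.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), nullptr, 0);
    std::wstring utf16(len, L'\0');
    // Second call performs the actual conversion.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &utf16[0], len);
    return utf16;
}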
And the other way around:
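A matching sketch with WideCharToMultiByte (the last two parameters must be null when the target code page is CP_UTF8):

#include <string>
#include <windows.h>

std::string utf16_to_utf8(const std::wstring& utf16)
{
    if (utf16.empty()) return std::string();
    // First call computes the required length in bytes.
    int len = WideCharToMultiByte(CP_UTF8, 0, utf16.data(),
                                  static_cast<int>(utf16.size()),
                                  nullptr, 0, nullptr, nullptr);
    std::string utf8(len, '\0');
    // Second call performs the actual conversion.
    WideCharToMultiByte(CP_UTF8, 0, utf16.data(),
                        static_cast<int>(utf16.size()),
                        &utf8[0], len, nullptr, nullptr);
    return utf8;
}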
And to add to the confusion: when you use std::string on a Windows platform (like when you use a multibyte compilation), it's NOT UTF-8. It's ANSI; more specifically, the default code page your Windows is using.
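If you ever need to widen one of those ANSI strings, it is the same two-step pattern as above, only with CP_ACP (the active code page) instead of CP_UTF8; a sketch:

#include <string>
#include <windows.h>

// Sketch: ANSI (current code page) -> UTF-16; error handling omitted.
std::wstring ansi_to_utf16(const std::string& ansi)
{
    if (ansi.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_ACP, 0, ansi.data(),
                                  static_cast<int>(ansi.size()), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, ansi.data(),
                        static_cast<int>(ansi.size()), &wide[0], len);
    return wide;
}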
When compiling in Unicode, the Windows API commands expect these formats:
CommandA - multibyte - ANSI
CommandW - Unicode - UTF-16
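For example, MessageBox is one such A/W pair (a minimal sketch; assumes the source file is saved as UTF-8 so the wide literal converts correctly):

#include <windows.h>

int main()
{
    MessageBoxA(nullptr, "Hello", "ANSI", MB_OK);           // A variant: ANSI (code page) string
    MessageBoxW(nullptr, L"こんにちは", L"Unicode", MB_OK);  // W variant: UTF-16 string
    return 0;
}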