C++11 has tools to convert wide-character strings (std::wstring) to and from their UTF-8 representation: std::codecvt, std::codecvt_utf8, std::codecvt_utf8_utf16, etc.
Which one can a Windows app use to convert regular Windows wide-character strings (std::wstring) to UTF-8 (std::string)? Does it always work without configuring locales?
It depends on how you convert them.
You need to specify the source encoding and the target encoding. wstring is not a format, it just defines a data type.
Now, usually when one says "Unicode", one means UTF16, which is what Microsoft Windows uses, and that is usually what wstring contains.
So, the right way to convert from UTF8 to UTF16:
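A minimal sketch of this direction, assuming std::wstring_convert with std::codecvt_utf8_utf16<wchar_t>. The helper name utf8_to_utf16 is hypothetical and not taken from the original answer.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into UTF-16 code units
// stored in a std::wstring. The converter owns its facet, so no global
// locale has to be configured. Throws std::range_error on malformed input.
std::wstring utf8_to_utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

Note that <codecvt> and std::wstring_convert were deprecated in C++17, so newer compilers may warn about this code.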
And the other way around:
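Again only a sketch under the same assumptions (std::wstring_convert with std::codecvt_utf8_utf16<wchar_t>). The name utf16_to_utf8 is hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encodes the UTF-16 code units held in a std::wstring
// as a UTF-8 byte string. Throws std::range_error if the input cannot be
// converted (for example, an unpaired surrogate).
std::string utf16_to_utf8(const std::wstring& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(utf16);
}
```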
And to add to the confusion:
When you use std::string on a Windows platform (like when you use a multibyte compilation), it's NOT UTF8. It uses ANSI, or more specifically the default code page your Windows installation is configured to use.
Also, note that wstring is not exactly the same as UTF-16.
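One way to see the difference: std::wstring is just a sequence of wchar_t, which is 16 bits wide on Windows but typically 32 bits on Linux and macOS, while UTF-16 is a variable-length encoding in which code points above U+FFFF take two code units (a surrogate pair). The sketch below uses char16_t so that the count does not depend on the platform's wchar_t. The name utf16_length is hypothetical.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: returns how many UTF-16 code units are needed to
// represent the given UTF-8 string. Code points above U+FFFF count twice
// because they are encoded as a surrogate pair.
std::size_t utf16_length(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    return converter.from_bytes(utf8).size();
}
```

For instance, the single code point U+1F600 (UTF-8 bytes F0 9F 98 80) yields a length of 2, so one wchar_t element on Windows does not always correspond to one character.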
When compiling in Unicode, the Windows API commands expect these formats:
CommandA - multibyte - ANSI
CommandW - Unicode - UTF16
It seems that std::codecvt_utf8 works well for the conversion std::wstring -> utf8. It passed all my tests. (Windows app, Visual Studio 2015, Windows 8 with EN locale)
I needed a way to convert filenames to UTF8. Therefore my test is about filenames.
In my app I use boost::filesystem::path 1.60.0 to deal with file paths. It works well, but it is not able to convert filenames to UTF8 properly. Internally, the Windows version of boost::filesystem::path uses std::wstring to store the file path. Unfortunately, the built-in conversion to std::string works badly.
Test case:
There is a file named c:\test\皀皁皂皃的 (some random Asian symbols).
Scan the directory with boost::filesystem::directory_iterator and get the boost::filesystem::path for the file.
Convert it to std::string via the built-in conversion filenamePath.string().
The result is c:\test\?????. The Asian symbols are converted to '?'. Not good.
boost::filesystem uses std::codecvt internally. It doesn't work for the conversion std::wstring -> std::string.
Instead of the built-in boost::filesystem::path conversion, you can define a conversion function like this (original snippet):
Then you can convert a file path to UTF8 easily: wstring_to_utf8(filenamePath.wstring()). It works perfectly.
It works for any file path. I tested ASCII strings (c:\test\test_file), Asian strings (c:\test\皀皁皂皃的), Russian strings (c:\test\абвгд), and mixed strings (c:\test\test_皀皁皂皃的, c:\test\test_абвгд, c:\test\test_皀皁皂皃的_абвгд). For every string I receive a valid UTF8 representation.
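The conversion function described above could be sketched as follows, assuming std::wstring_convert with std::codecvt_utf8<wchar_t>, the facet named in this answer. The name wstring_to_utf8 is an assumption, and this is not necessarily the original snippet.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: converts a wide string (for example the result of
// boost::filesystem::path::wstring()) to a UTF-8 encoded std::string.
// Throws std::range_error if a character cannot be converted.
std::string wstring_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

On Windows, where wchar_t is 16 bits, std::codecvt_utf8<wchar_t> treats the input as UCS-2. None of the test strings above contain characters outside the Basic Multilingual Plane; file names that do would likely need std::codecvt_utf8_utf16<wchar_t> instead.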