I use ReadDirectoryChangesW for reading changes in current folder. And I get path to this file: L"TEST Ӡ⬨☐.ipt":
Next, I want to convert this to utf8 and back:
std::string wstringToUtf8(const std::wstring& source) {
const int size = WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0, NULL, NULL);
std::vector<char> buffer8(size);
WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer8.data(), size, NULL, NULL);
}
std::wstring utf8ToWstring(const std::string& source) {
const int size = MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0);
std::vector<wchar_t> buffer16(size);
MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer16.data(), size);
}
int main() {
// Some code with ReadDirectoryChangesW and
// ...
// std::wstring fileName = "L"TEST Ӡ⬨☐.ipt""
// ...
std::string filenameUTF8 = wstringToUtf8(fileName);
std::wstring filename2 = utf8ToWstring(filenameUTF8);
assert(filenameUTF8 == filename2); // FAIL!
return 0;
}
But I catch assert. filename2:
Different bits: [29]
Why?
57216 seems to fall in to surrogate pair range, used in UTF-16 to encode non-BMP code points. They need to be given in pairs, or decoding won't give you correct codepoint.
65533 is a special error character which decoder gives because other surrogate is missing.
To put it another way: Your original string is not valid UTF-16 string.
More info on Wikipedia.