String to LPCWSTR in C++

Posted 2019-05-05 19:40

Question:

I'm trying to convert a std::string to LPCWSTR (my project uses the multi-byte character set).

1) For example:

LPCWSTR ToLPCWSTR(string text)
{
    LPCWSTR sw = (LPCWSTR)text.c_str();
    return sw;
}

2) This returns Chinese characters:

LPCWSTR ToLPCWSTR(string text)
{
    std::wstring stemp = std::wstring(text.begin(), text.end());
    LPCWSTR sw = (LPCWSTR)stemp.c_str();
    return sw;
}

However, both always show squares:

[screenshot omitted: the output shows square/box characters]

EDIT: My code, updated with the suggestion from Barmak Shemirani's answer:

std::wstring get_utf16(const std::string &str, int codepage)
{
    if (str.empty()) return std::wstring();
    int sz = MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

string HttpsWebRequest(string domain, string url)
{
    LPCWSTR sdomain = get_utf16(domain, CP_UTF8).c_str();
    LPCWSTR surl = get_utf16(url, CP_UTF8).c_str();
    //(Some stuff...)
}

The result: https://i.gyazo.com/ea4cd50765bfcbe12c763ea299e7b508.png

EDIT: I also tried another function that converts UTF-8 to UTF-16; still the same result.

std::wstring utf8_to_utf16(const std::string& utf8)
{
    std::vector<unsigned long> unicode;
    size_t i = 0;
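    // Decode the UTF-8 byte sequence into Unicode code points.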
    while (i < utf8.size())
    {
        unsigned long uni;
        size_t todo;
        bool error = false;
        unsigned char ch = utf8[i++];
        if (ch <= 0x7F)
        {
            uni = ch;
            todo = 0;
        }
        else if (ch <= 0xBF)
        {
            throw std::logic_error("not a UTF-8 string");
        }
        else if (ch <= 0xDF)
        {
            uni = ch & 0x1F;
            todo = 1;
        }
        else if (ch <= 0xEF)
        {
            uni = ch & 0x0F;
            todo = 2;
        }
        else if (ch <= 0xF7)
        {
            uni = ch & 0x07;
            todo = 3;
        }
        else
        {
            throw std::logic_error("not a UTF-8 string");
        }
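        // Consume the expected continuation bytes (10xxxxxx) and accumulate the code point.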
        for (size_t j = 0; j < todo; ++j)
        {
            if (i == utf8.size())
                throw std::logic_error("not a UTF-8 string");
            unsigned char ch = utf8[i++];
            if (ch < 0x80 || ch > 0xBF)
                throw std::logic_error("not a UTF-8 string");
            uni <<= 6;
            uni += ch & 0x3F;
        }
        if (uni >= 0xD800 && uni <= 0xDFFF)
            throw std::logic_error("not a UTF-8 string");
        if (uni > 0x10FFFF)
            throw std::logic_error("not a UTF-8 string");
        unicode.push_back(uni);
    }
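    // Re-encode the code points as UTF-16, emitting surrogate pairs for values above 0xFFFF.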
    std::wstring utf16;
    for (size_t i = 0; i < unicode.size(); ++i)
    {
        unsigned long uni = unicode[i];
        if (uni <= 0xFFFF)
        {
            utf16 += (wchar_t)uni;
        }
        else
        {
            uni -= 0x10000;
            utf16 += (wchar_t)((uni >> 10) + 0xD800);
            utf16 += (wchar_t)((uni & 0x3FF) + 0xDC00);
        }
    }
    return utf16;
}

Answer 1:

If the std::string source is English or certain Latin-based text, the conversion to std::wstring can be done with a simple copy (as shown in Miles Budnek's answer). But in general you have to use MultiByteToWideChar:

std::wstring get_utf16(const std::string &str, int codepage)
{
    if (str.empty()) return std::wstring();
    // First call: pass a null output buffer to ask for the required
    // length in wide characters.
    int sz = MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    // Second call: perform the actual conversion into the wstring's buffer.
    MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

You have to know the codepage that was used to create the source string. You can use GetACP() to find the default ANSI codepage of the user's computer. If the source string is UTF-8, use CP_UTF8 as the codepage.
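For example, a minimal usage sketch (assuming <windows.h> and <string> are included; the show_utf8_message wrapper and the MessageBoxW call are only illustrative). The key point is to keep the converted std::wstring alive for as long as the LPCWSTR is in use:

void show_utf8_message(const std::string &utf8_text)
{
    // Convert once and keep the std::wstring in a named variable; the
    // pointer returned by c_str() is only valid while wtext is alive.
    std::wstring wtext = get_utf16(utf8_text, CP_UTF8);
    LPCWSTR psz = wtext.c_str();
    MessageBoxW(nullptr, psz, L"Example", MB_OK);
}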



Answer 2:

You have two problems.

  1. LPCWSTR is a pointer to const wchar_t, and std::string::c_str() returns a const char*. Those two pointer types are different and incompatible, so casting from const char* to LPCWSTR won't work.
  2. The memory pointed to by the pointer returned by std::basic_string::c_str is owned by the string object, and is freed when the string goes out of scope.

You will need to allocate memory and make a copy of the string.

The easiest way to allocate memory for a new wide string would be to just return a std::wstring. You can then pass the pointer returned by c_str() to whatever API function takes LPCWSTR:

std::wstring string_to_wstring(const std::string& text) {
    return std::wstring(text.begin(), text.end());
}
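
A rough usage sketch (SetWindowTextW is just a stand-in for any function that takes an LPCWSTR): keep the std::wstring in a named variable so that its buffer outlives the pointer you pass in. Note that the HttpsWebRequest edit in the question does the opposite, taking c_str() from a temporary, which leaves the pointers dangling.

void set_window_title(HWND hwnd, const std::string &title)
{
    // wtitle owns the wide-character buffer, so the pointer from c_str()
    // remains valid for the duration of the SetWindowTextW call.
    std::wstring wtitle = string_to_wstring(title);
    SetWindowTextW(hwnd, wtitle.c_str());

    // By contrast, this would dangle: the temporary std::wstring is
    // destroyed at the end of the statement, leaving p pointing at
    // freed memory.
    // LPCWSTR p = string_to_wstring(title).c_str();
}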