Question:

Am I using std::wstring correctly with MultiByteToWideChar?

std::wstring widen(const std::string &in)
{
    int len = MultiByteToWideChar(CP_UTF8, 0, &in[0], -1, NULL, 0);
    std::wstring out(len, 0);
    MultiByteToWideChar(CP_UTF8, 0, &in[0], -1, &out[0], len);
    return out;
}

Answer 1:

If you're asking whether it will work: probably. Is it correct?

- You should use in.c_str() instead of &in[0].
- You should check the return value of MultiByteToWideChar, at least on the first call.
- When MultiByteToWideChar is called with a length of -1, a successful call counts the zero terminator in its result (so it always returns at least 1 on success). The std::wstring length constructor does not need room for that terminator: std::wstring(5, 0) already reserves space for six wide characters, five plus the terminator. So technically you're allocating one wide character too many.

From the MultiByteToWideChar documentation on setting cbMultiByte to -1:

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.
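The size mismatch can be reproduced without any Windows API. The portable sketch below uses hypothetical helper names; reported_length() stands in for what MultiByteToWideChar returns with a length of -1, and the plain "one unit per byte plus one" count matches it only for ASCII input. It shows the question's sizing next to the usual repair of trimming the converted terminator.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0):
// for ASCII input it reports one wide character per byte plus one for the
// terminating NUL. (Real UTF-8 input can produce a different count.)
int reported_length(const char* s) {
    return static_cast<int>(std::strlen(s)) + 1;
}

// Mimics the question's code: size the string with the reported length and
// copy every unit, terminator included, into it (ASCII only).
std::wstring widen_keeping_terminator(const std::string& in) {
    const int len = reported_length(in.c_str());
    std::wstring out(len, L'\0');
    for (int i = 0; i < len; ++i) {
        out[i] = static_cast<wchar_t>(static_cast<unsigned char>(in.c_str()[i]));
    }
    return out;  // out.size() is one larger than the text it holds
}

// The repair: drop the converted terminator, because std::wstring already
// maintains its own terminator one past size().
std::wstring widen_trimmed(const std::string& in) {
    std::wstring out = widen_keeping_terminator(in);
    out.resize(out.size() - 1);  // reported_length() is always >= 1
    return out;
}
```

For "hello", the first function yields a string of size 6 whose last element is L'\0', so it does not compare equal to L"hello"; the trimmed version does.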

Answer 2:

There is a problem with your first call to MultiByteToWideChar: with a length of -1 it reads until it finds a zero terminator, but before C++11 the character sequence at &in[0] was not guaranteed to be zero-terminated (although in practice it usually is). Change that line to

int len = MultiByteToWideChar(CP_UTF8, 0, in.c_str(), -1, NULL, 0);

and you should be safe. Even if MultiByteToWideChar fails and returns 0, that case is covered: len is passed as the final argument of the second call, and a cchWideChar of 0 makes the function only report the required size without writing to the buffer.

That said, the code is safe in the sense that it doesn't crash or corrupt memory. There is, however, one more issue: unless the input string causes MultiByteToWideChar to fail, the returned string's size() is one character larger than it should be. I would recommend changing the code as follows:

#include <windows.h>
#include <stdexcept>
#include <string>

std::wstring widen(std::string const &in)
{
    std::wstring out{};

    if (in.length() > 0)
    {
        // Calculate target buffer size (not including the zero terminator).
        int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      in.c_str(), static_cast<int>(in.size()), NULL, 0);
        if (len == 0)
        {
            throw std::runtime_error("Invalid character sequence.");
        }

        out.resize(len);
        // No error checking: the first call already validated the input.
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            in.c_str(), static_cast<int>(in.size()),
                            &out[0], static_cast<int>(out.size()));
                            // Use out.data() in place of &out[0] for C++17
    }

    return out;
}

This implementation addresses the following issues:

- It reports errors when the input sequence is not valid UTF-8, by passing the MB_ERR_INVALID_CHARS flag.
- Errors are reported by throwing exceptions. That makes it possible to distinguish a conversion error from a successful call that returns an empty string. (Note: the std::wstring constructor already throws exceptions on failure, so it would feel unnatural not to throw for other errors.)
- It properly handles input containing embedded NUL characters. This is rarely needed, but when it is (e.g. when composing the lpstrFilter member of an OPENFILENAME structure), the conversion won't silently stop at the first NUL.
- It doesn't over-allocate the return value's storage. When the cbMultiByte argument of MultiByteToWideChar is -1, the returned length includes space for the zero terminator. That terminator, however, is owned by the std::string implementation and is not part of the character sequence to be converted.
- Related to the previous point, this implementation doesn't convert the zero terminator. The original code did, so the returned string ends with two NUL characters when viewed through c_str(): the converted one and the one std::wstring maintains itself.
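To see why the explicit length matters for embedded NULs, the portable sketch below counts how many input bytes a converter would consume under each convention. The helper names are hypothetical; std::strlen on c_str() models the -1 convention, which stops at the first NUL. The sample input is shaped like an lpstrFilter value: NUL-separated description/pattern pairs ending in a double NUL.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Bytes a converter sees when told to stop at the first NUL, which is what
// passing -1 as the input length means.
std::size_t bytes_seen_with_minus_one(const std::string& in) {
    return std::strlen(in.c_str());
}

// Bytes a converter sees when given the explicit length, as in
// MultiByteToWideChar(..., in.c_str(), in.size(), ...).
std::size_t bytes_seen_with_explicit_length(const std::string& in) {
    return in.size();
}

// Example filter: "description\0pattern\0". std::string's own terminator
// supplies the final NUL of the double-NUL ending.
std::string make_filter() {
    return std::string("Text Files\0*.txt\0", 17);
}
```

With -1 only "Text Files" (10 bytes) would be converted; the explicit length covers all 17 bytes, including the pattern and both separators.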

Answer 3:

No, since std::wstring is not guaranteed to store its data in a contiguous block of memory (though it most likely does in your implementation). Use a std::vector<wchar_t> instead. (This applies to C++03; C++11 and later do require std::basic_string storage to be contiguous.)
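A minimal sketch of the std::vector<wchar_t> approach, assuming a C-style API that writes into a caller-supplied buffer. fill_ascii_widened is a hypothetical stand-in for MultiByteToWideChar that widens ASCII bytes only; the vector supplies the writable buffer and the result is copied into a std::wstring afterwards.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API that fills a caller-supplied buffer.
// It widens ASCII bytes only and returns the number of wide chars written.
int fill_ascii_widened(const std::string& in, wchar_t* dest, int capacity) {
    int n = 0;
    for (char c : in) {
        if (n == capacity) break;
        dest[n++] = static_cast<wchar_t>(static_cast<unsigned char>(c));
    }
    return n;
}

// std::vector has guaranteed contiguous storage since C++03, so its buffer
// can be handed to a C API; the filled part is then copied into a wstring.
std::wstring widen_via_vector(const std::string& in) {
    std::vector<wchar_t> buffer(in.size());
    const int written = fill_ascii_widened(in, buffer.data(),
                                           static_cast<int>(buffer.size()));
    return std::wstring(buffer.data(), buffer.data() + written);
}
```

The trade-off is one extra copy from the vector into the string; with C++11 or later, writing into a resized std::wstring directly avoids it.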

Answer 4:

The other answers are good but I want to add some extra information for future visitors based on my own research into the same issue.

Microsoft developer Larry Osterman has a good blog post describing such a function, with a very good point about return-code checking and NRVO (Named Return Value Optimization). Read the post for the full discussion if it's still available. I'm including his final code in case the post goes missing.


std::wstring UnicodeStringFromAnsiString(_In_ const std::string &ansiString)
{
    std::wstring returnValue;
    auto wideCharSize = MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, ansiString.c_str(), -1, nullptr, 0);
    if (wideCharSize == 0)
    {
        return returnValue;
    }
    returnValue.resize(wideCharSize);
    wideCharSize = MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, ansiString.c_str(), -1, &returnValue[0], wideCharSize);
    if (wideCharSize == 0)
    {
        returnValue.resize(0);
        return returnValue;
    }
    returnValue.resize(wideCharSize-1);
    return returnValue;
}
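The function above returns the same named object, returnValue, on every path, which lets the compiler apply NRVO. NRVO itself is optional, but since C++11 a returned named local is at worst moved, never copied. The sketch below checks this with a hypothetical Tracked type that counts copies and moves; build mirrors the single-named-result structure without calling any Windows API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    std::wstring text;

    Tracked() = default;
    Tracked(const Tracked& other) : text(other.text) { ++copies; }
    Tracked(Tracked&& other) noexcept : text(std::move(other.text)) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as UnicodeStringFromAnsiString: one named result object that is
// returned on the early-exit path and on the success path alike.
Tracked build(bool fail_early) {
    Tracked result;
    if (fail_early) {
        return result;  // error path: empty result, same object
    }
    result.text = L"converted";
    return result;      // success path: same object again
}
```

Whether moves is zero depends on the compiler applying NRVO, so only the copy count is a portable guarantee.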

In my own usage, I applied the optimization suggested in the blog comments and passed the ANSI string's explicit length instead of -1.

C++17 (Section 21.3.1.7.1) adds a non-const data() overload, which should be used instead of &out[0] to get a mutable pointer to the destination string's buffer.
```
charT* data() noexcept;
```
The standard library owns the trailing \0 that c_str() exposes, so be careful how you manipulate the string's size.
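A small illustration of writing through the non-const data() overload (requires C++17). fill_with is a hypothetical helper that writes into a pre-sized std::wstring the way a C API writes into its destination buffer, touching only indices [0, size()) and leaving the terminator at data()[size()] to the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Writes 'count' copies of 'ch' through the mutable pointer returned by the
// C++17 non-const data() overload. Only [0, size()) is written; the
// terminator at data()[size()] stays under the string's control.
std::wstring fill_with(wchar_t ch, std::size_t count) {
    std::wstring out;
    out.resize(count);
    wchar_t* dest = out.data();  // wchar_t*, not const wchar_t*, since C++17
    for (std::size_t i = 0; i < count; ++i) {
        dest[i] = ch;
    }
    return out;
}
```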