Converting ASCII strings to UTF-16 before passing them to the Windows API

Posted 2019-05-28 05:01

In my current project I've been using wide chars (UTF-16). But since my only input from the user is going to be a URL, which has to end up as ASCII anyway, and one other string, I'm thinking about just switching the whole program to ASCII.

My question is, is there any benefit to converting the strings to utf16 before I pass them to a Windows API function?

After doing some research online, it seems like a lot of people recommend this if you're not working with UTF-16 on Windows.

2 Answers
唯我独甜
#2 · 2019-05-28 05:28

The main point is that on Windows UTF-16 is the native encoding, and all API functions that end in A are just wrappers around the W ones. The A functions are only kept around for compatibility with programs that were written for Windows 9x/ME, and no new program should ever use them (in my opinion).

Unless you're doing heavy processing of billions of large strings, I doubt there is any benefit to storing them in another (possibly more space-saving) encoding at all. Besides, even a URI can contain Unicode, if you think about IDN. So don't be too sure up front about what data your users will pass to the program.

爷、活的狠高调
#3 · 2019-05-28 05:30

In the Windows API, if you call a function like

int SomeFunctionA(const char*);

then it will automatically convert the string to UTF-16 and call the real, Unicode version of the function:

int SomeFunctionW(const wchar_t*);

The catch is, it converts the string to UTF-16 from the ANSI code page. That works OK if you actually have strings encoded in the ANSI code page. It doesn't work if you have strings encoded in UTF-8, which is increasingly common these days (e.g., nearly 70% of Web pages), and isn't supported as an ANSI code page.
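
To make the catch concrete, here is a minimal, portable C++ sketch of what goes wrong. It is not the actual Windows conversion: `widen_bytes` is a hypothetical helper that maps every byte to the code unit with the same value (Latin-1), which I'm assuming as a stand-in for a single-byte ANSI code page such as Windows-1252 (the two agree outside the 0x80-0x9F range). Pure ASCII survives unchanged, but a UTF-8 multi-byte sequence is split into separate characters.

```cpp
#include <string>

// Hypothetical stand-in for a single-byte code page conversion:
// every input byte becomes one UTF-16 code unit with the same value.
// Assumption: Latin-1 mapping. The real A functions use the system's
// ANSI code page, which this sketch does not query.
std::u16string widen_bytes(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 do not sign-extend.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

With this helper, `widen_bytes("http://example.com")` equals `u"http://example.com"`, while `widen_bytes("\xC3\xA9")` (the UTF-8 bytes for "é") yields the two code units U+00C3 and U+00A9, which display as "Ã©".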

Also, if you use the A API, you'll run into limitations like not (easily) being able to open files that have non-ANSI characters in their names (file names can be arbitrary UTF-16 strings). You also won't have access to some of Windows' newer features.

That's why I always call the W functions, even though this means annoying explicit conversions (from the UTF-8 strings used in the non-Windows-specific parts of our software).
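
An explicit UTF-8 to UTF-16 conversion of this kind might look roughly like the sketch below. It's written in portable C++ so it can be tried anywhere; `utf8_to_utf16` is a hypothetical helper name, not part of the Windows API. In a real Windows program you would more likely let `MultiByteToWideChar` with `CP_UTF8` do the work and pass the resulting wide string to the W function.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const std::uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that must follow
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF are encoded as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

For example, `utf8_to_utf16("\xC3\xA9")` gives the single code unit U+00E9, and a code point above U+FFFF comes back as two code units (a surrogate pair).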
