How do I print UTF-8 from c++ console application

2019-01-03 10:26发布

For a C++ console application compiled with Visual Studio 2008 on English Windows (XP,Vista or 7). Is it possible to print out to the console and correctly display UTF-8 encoded Japanese using cout or wcout?

7条回答
狗以群分
2楼-- · 2019-01-03 10:40

Just for additional information:

'ANSI' refers to windows-125x, used for win32 applications while 'OEM' refers to the code page used by console/MS-DOS applications.
Current active code-pages can be retrieved with functions GetOEMCP() and GetACP().

In order to output something correctly to the console, you should:

  1. ensure the current OEM code page supports the characters you want to output
    (if necessary, use SetConsoleOutputCP to set it properly)

  2. convert the string from current ANSI code (win32) to the console OEM code page

Here are some utilities for doing so:

// Convert a UTF-16 string (16-bit) to an OEM string (8-bit) 
#define UNICODEtoOEM(str)   WCHARtoCHAR(str, CP_OEMCP)

// Convert an OEM string (8-bit) to a UTF-16 string (16-bit) 
#define OEMtoUNICODE(str)   CHARtoWCHAR(str, CP_OEMCP)

// Convert an ANSI string (8-bit) to a UTF-16 string (16-bit) 
#define ANSItoUNICODE(str)  CHARtoWCHAR(str, CP_ACP)

// Convert a UTF-16 string (16-bit) to an ANSI string (8-bit)
#define UNICODEtoANSI(str)  WCHARtoCHAR(str, CP_ACP)


/* Convert a single/multi-byte string to a UTF-16 string (16-bit).
 We take advantage of the MultiByteToWideChar function that allows to specify the charset of the input string.
*/
LPWSTR CHARtoWCHAR(LPSTR str, UINT codePage) {
    size_t len = strlen(str) + 1;
    int size_needed = MultiByteToWideChar(codePage, 0, str, len, NULL, 0);
    LPWSTR wstr = (LPWSTR) LocalAlloc(LPTR, sizeof(WCHAR) * size_needed);
    MultiByteToWideChar(codePage, 0, str, len, wstr, size_needed);
    return wstr;
}


/* Convert a UTF-16 string (16-bit) to a single/multi-byte string.
 We take advantage of the WideCharToMultiByte function that allows to specify the charset of the output string.
*/
LPSTR WCHARtoCHAR(LPWSTR wstr, UINT codePage) {
    size_t len = wcslen(wstr) + 1;
    int size_needed = WideCharToMultiByte(codePage, 0, wstr, len, NULL, 0, NULL, NULL);
    LPSTR str = (LPSTR) LocalAlloc(LPTR, sizeof(CHAR) * size_needed );
    WideCharToMultiByte(codePage, 0, wstr, len, str, size_needed, NULL, NULL);
    return str;
}
查看更多
可以哭但决不认输i
3楼-- · 2019-01-03 10:48

The Windows console uses the OEM code page by default to display output.

To change the code page to Unicode enter chcp 65001 in the console, or try to change the code page programmatically with SetConsoleOutputCP.

Note that you probably have to change the font of the console to one that has glyphs in the unicode range.

查看更多
贪生不怕死
4楼-- · 2019-01-03 10:53

Here's an article from MVP Michael Kaplan on how to correctly output UTF-16 through the console. You could convert your UTF-8 to UTF-16 and output that.

查看更多
萌系小妹纸
5楼-- · 2019-01-03 10:57

This should work:

#include <cstdio>
#include <windows.h>

#pragma execution_character_set( "utf-8" )

int main()
{
    SetConsoleOutputCP( 65001 );
    printf( "Testing unicode -- English -- Ελληνικά -- Español -- Русский. aäbcdefghijklmnoöpqrsßtuüvwxyz\n" );
}

Don't know if it affects anything, but source file is saved as Unicode (UTF-8 with signature) - Codepage 65001 at FILE -> Advanced Save Options ....

Project -> Properties -> Configuration Properties -> General -> Character Set is set to Use Unicode Character Set.

Some say you need to change console font to Lucida Console, but on my side it is displayed with both Consolas and Lucida Console.

查看更多
We Are One
6楼-- · 2019-01-03 11:02

In the console, enter chcp 65001 to change the code page to that of UTF-8.

查看更多
神经病院院长
7楼-- · 2019-01-03 11:05

I've never actually tried setting the console code-page to UTF8 (not sure why it wouldn't work... the console can handle other multi-byte code-pages just fine), but there are a couple of functions to look up: SetConsoleCP and SetConsoleOutputCP.

You'll probably also need to make sure you're using a console font that is capable of displaying your characters. There's the SetCurrentConsoleFontEx function, but it's only available on Vista and above.

Hope that helps.

查看更多
登录 后发表回答