fwprintf omits wide chars

Posted 2019-07-11 02:13

I'm trying to write a file containing wide characters using MinGW C on Windows, but the wide characters seem to be omitted. My code:

const wchar_t* str = L"příšerně žluťoučký kůň úpěl ďábelské ódy";
FILE* fd = fopen("file.txt","w");
// FILE* fd = _wfopen(L"demo.txgs",L"w"); // attempt to open wide file doesn't help
fwide(fd,1); // attempt to force wide mode, doesn't help
fwprintf(fd,L"%ls",str);
// fputws(p,fd); // stops output after writing "p" (1B file size)
fclose(fd);

File contents

píern luouký k úpl ábelské ódy

The file size is 30 bytes, so the wide characters really are missing. How can I get them written?

As @chqrlie suggests in the comments: the result of

fwrite(str, 1, sizeof(L"příšerně žluťoučký kůň úpěl ďábelské ódy"), fd);

is 82 (I guess 2*30 + 2*10 (omitted chars) + 2 (wide trailing zero)).
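The arithmetic above can be checked with a small sketch. The helper names count_above_0xff and raw_size are made up for illustration, and the idea that the dropped characters are exactly those with code points above 0xFF is an assumption that merely matches the observed output:

```c
#include <stddef.h>
#include <wchar.h>

/* Number of wide characters in s whose code point is above 0xFF, i.e. that
   cannot be stored in a single byte. Assumption: these are the characters
   that went missing from the file. */
static size_t count_above_0xff(const wchar_t *s)
{
    size_t n = 0;
    for (; *s != L'\0'; ++s)
        if ((unsigned long)*s > 0xFF)
            ++n;
    return n;
}

/* Size in bytes of the raw wide string including its terminator, for a
   given code unit size (2 on Windows, where wchar_t holds UTF-16 units). */
static size_t raw_size(const wchar_t *s, size_t unit_size)
{
    return (wcslen(s) + 1) * unit_size;
}
```

For the string in the question, wcslen gives 40 characters, 10 of them above 0xFF, which leaves the 30 bytes seen in the file, and raw_size(str, 2) gives the 82 reported by fwrite.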

It also might be useful to quote from here

The external representation of wide characters in files are multibyte characters: These are obtained as if wcrtomb was called to convert each wide character (using the stream's internal mbstate_t object).

This explains why the ISO-8859-1 characters are single bytes in the file, but I don't know how to use this information to solve my problem. For the opposite task (reading multibyte UTF-8 into wide chars), I couldn't get mbtowc to work and ended up using WinAPI's MultiByteToWideChar.

2 Answers
Summer. ? 凉城
#2 · 2019-07-11 02:51

I figured this out. The internal use of wcrtomb (mentioned in the details of my question) requires a setlocale call, but that call fails with UTF-8 on Windows. So I used the WinAPI instead:

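char output[100]; // not wchar_t, write byte-by-byte
// first call only asks for the required size in bytes (including the trailing zero)
int len = WideCharToMultiByte(CP_UTF8,0,str,-1,NULL,0,NULL,NULL);
if(len>0 && len<=(int)sizeof(output)) { // with a too-small buffer the conversion fails, so skip it
    WideCharToMultiByte(CP_UTF8,0,str,-1,output,len,NULL,NULL);
    fputs(output,fd);
}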

And voilà! The file is 56 bytes long with the expected UTF-8 contents:

příšerně žluťoučký kůň úpěl ďábelské ódy

I hope this saves Windows coders some nerves.
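For environments without the WinAPI, the same conversion can be done by hand. This is only a sketch of an alternative, not what the code above does: the function name wide_to_utf8 and its buffer convention are made up for illustration, and it assumes wchar_t holds either UTF-16 code units (as on Windows) or UTF-32 code points.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Encode the NUL-terminated wide string src as UTF-8 into out, which has
   room for cap bytes including the terminating NUL. Returns the number of
   bytes written (not counting the NUL), or (size_t)-1 if the buffer is too
   small or the input is not valid UTF-16/UTF-32. */
static size_t wide_to_utf8(const wchar_t *src, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return (size_t)-1;
    while (*src != L'\0') {
        uint32_t cp = (uint32_t)*src++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate (UTF-16). */
            uint32_t lo = (uint32_t)*src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++src;
        } else if ((cp >= 0xDC00 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            return (size_t)-1;          /* stray low surrogate or out of range */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (len + 1 > cap - n)          /* keep room for the terminating NUL */
            return (size_t)-1;
        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For the string in the question this yields 56 bytes, matching the file size above. The result can then be written with fwrite(out, 1, n, fd), preferably on a stream opened with "wb" so that no text-mode translation gets in the way.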

一纸荒年 Trace。
#3 · 2019-07-11 02:53

I am not a Windows user, but you might try this:

const wchar_t *str = L"příšerně žluťoučký kůň úpěl ďábelské ódy";
FILE *fd = fopen("file.txt", "w,ccs=UTF-8");
fwprintf(fd, L"%ls", str);
fclose(fd);

I got this idea from this question: How do I write a UTF-8 encoded string to a file in windows, in C++
