How to read Cyrillic Unicode file in C++?

Posted 2019-04-30 19:42

I'm trying to read lines from .txt files that have been saved as Unicode. This is how I'm doing it:

wifstream input;
string path = "test.txt";
input.imbue(locale(input.getloc(),
        new codecvt_utf16<wchar_t, 0x10ffff, consume_header>));

input.open(path);
if (input.is_open())
{
    wstring line;
    input.seekg( 1 , ios_base::beg);
    getline(input, line);
}

It works fine for files with Latin characters. But for Cyrillic files I get weird symbols instead of spaces and adjacent characters.

For example:

What is in the input file:

Госдеп США осудил нападение на

What I get:

︓осдепР!ШАР>судилР=ападениеР=а

What am I doing wrong?

Tags: c++ unicode locale getline cyrillic

2 answers

甜甜的少女心

Reply #2 · 2019-04-30 20:11

Well, figured out the way:

FILE *input= _wfopen(L"test.txt", L"rb");
wchar_t line[1000];
fgetws(line, 1000, input);

Works fine like that. Was quite stupid of me not to try it first. So thanks everyone.


何必那么认真

Reply #3 · 2019-04-30 20:32

One line in your code looks very suspicious:

input.seekg(1, ios_base::beg);

It sets the file position to 1, so the UTF-16 text is read starting from byte 1 and the BOM is not read correctly. I get the same result with a little-endian UTF-16 file.

So you could change the position to 0, or delete this line, to make the code work.
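To illustrate why a one-byte offset produces that kind of garbage, here is a minimal sketch. It is a hand-rolled decoder, not the codecvt_utf16 facet from the question, and the names decodeUtf16 and sampleBytes are made up for this example. It assumes, as the facet is documented to do when no little_endian flag is given, that input without a BOM is treated as big-endian. Surrogate pairs are ignored for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 bytes beginning at byte offset `start`.
// If a BOM sits exactly at `start`, it selects the byte order and is skipped;
// otherwise big-endian is assumed. Surrogate pairs are NOT combined, so this
// only handles text in the Basic Multilingual Plane.
std::u32string decodeUtf16(const std::string& bytes, std::size_t start) {
    bool littleEndian = false;
    std::size_t i = start;
    if (i + 1 < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b0 == 0xFF && b1 == 0xFE) {        // little-endian BOM
            littleEndian = true;
            i += 2;
        } else if (b0 == 0xFE && b1 == 0xFF) { // big-endian BOM
            i += 2;
        }
    }
    std::u32string out;
    for (; i + 1 < bytes.size(); i += 2) {
        char32_t first = static_cast<unsigned char>(bytes[i]);
        char32_t second = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(littleEndian ? (second << 8) | first
                                   : (first << 8) | second);
    }
    return out;
}

// Bytes of a little-endian UTF-16 file holding a BOM followed by
// U+043F (п), U+0020 (space), U+0421 (С).
std::string sampleBytes() {
    const unsigned char raw[] = {0xFF, 0xFE, 0x3F, 0x04, 0x20, 0x00, 0x21, 0x04};
    return std::string(reinterpret_cast<const char*>(raw), sizeof raw);
}
```

Decoding sampleBytes() from offset 0 yields U+043F, U+0020, U+0421. Decoding from offset 1 misses the BOM and pairs each high byte with the next character's low byte, giving U+FE3F, U+0420 (Р), U+0021 (!). This matches the pattern in the question, where spaces turn into "Р" followed by a shifted character. Latin text survives the shift only because its high bytes are all zero.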
