I'm writing a C++ application that can read (and change) text files while respecting the encoding used for the text. For use with other APIs, I want to explicitly convert all read text to UTF-8 internally, independent of what the actual encoding in the file was.
I am on Windows, testing with a text file saved as "ANSI" and as "UTF-8" (those seem to work correctly). But "Unicode big endian" doesn't work: the std::getline result seems to be the raw byte array, with no conversion of the file's contents (UTF-16?) to UTF-8. How can I force this conversion? I do not know beforehand what encoding the file uses. Code used:
std::string retString;
if (isValidIndex(file_index) && OpenFilestreams()[file_index]->good()) {
    std::getline(*OpenFilestreams()[file_index], retString);
}
return retString;
Here OpenFilestreams() returns a static vector containing all opened file streams, and file_index is an index into that vector. So how can I make sure the stream reads using the correct encoding?
As for the use: I am trying to convert the result to a std::wstring using:
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
return converter.from_bytes(input.c_str());
This throws a std::range_error exception. (I need a std::wstring for other Windows API functions.)
There is no way for std::getline to detect the encoding of the file by itself. You can use std::locale, imbued on the stream, to change the encoding used when reading.
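As a minimal sketch (assuming a UTF-16 file with a BOM, and using std::codecvt_utf16 from <codecvt>, which is deprecated since C++17 but still available; on Windows, where wchar_t is 16 bits, this effectively treats the text as UCS-2):

#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

int main() {
    // Open in binary mode so the facet sees the raw bytes unchanged.
    std::wifstream in("utf16be.txt", std::ios::binary);

    // consume_header makes the facet read the BOM and pick the byte
    // order from it; without a BOM, codecvt_utf16 assumes big-endian.
    in.imbue(std::locale(in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));

    std::wstring line;
    while (std::getline(in, line)) {
        // line now holds the decoded text as wide characters.
    }
}

With std::consume_header the facet adapts to either byte order of UTF-16 as long as a BOM is present, which covers the "Unicode big endian" case from the question.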
Some Unicode files contain a BOM (byte order mark), which states the encoding used, but the BOM is not required.
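If you want to branch on the BOM yourself, a minimal sniffer could look like this (detectBom is a hypothetical helper; it only recognizes the UTF-8 and UTF-16 marks and reports Unknown otherwise):

#include <fstream>
#include <string>

enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Read up to the first three bytes and report what the BOM, if any, says.
Encoding detectBom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char b[3] = {};
    in.read(reinterpret_cast<char*>(b), 3);
    const std::streamsize n = in.gcount();
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return Encoding::Utf8;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return Encoding::Utf16BE;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return Encoding::Utf16LE;
    return Encoding::Unknown;
}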
Normally, text applications use the BOM's encoding if one is present; if not, they apply heuristics to identify the encoding used. They then read the text with that encoding, normalize it to one internal representation (e.g. UTF-8), assume that representation throughout the rest of the application, and save the file back in the same encoding it was read with.
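For the normalization step, here is a sketch of converting decoded wide text to UTF-8 and back, using the same std::wstring_convert facet the question already uses:

#include <codecvt>
#include <locale>
#include <string>

// Convert decoded wide text to UTF-8 for internal use.
std::string toUtf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.to_bytes(wide);
}

// And back to UTF-16 when a std::wstring is needed for Windows APIs.
std::wstring toUtf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.from_bytes(utf8);
}

Note that from_bytes expects valid UTF-8 input; calling it on the raw UTF-16 bytes read from the file is what produces the std::range_error seen in the question.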
Some info about Unicode: Joel Spolsky's Unicode article.
Another article about reading Unicode encodings in C++.