I'm writing a program to extract the information from a *.rc file encoded in UCS-2 Little Endian.
    int _tmain(int argc, _TCHAR* argv[]) {
        wstring csvLine(wstring sLine);   // defined elsewhere
        wifstream fin("en.rc");
        wofstream fout("table.csv");
        wofstream fout_rm("temp.txt");
        wstring sLine;
        fout << "en\n";
        while (getline(fin, sLine)) {
            // Lines without "IDS" go to temp.txt; the rest become CSV rows
            if (sLine.find(L"IDS") == wstring::npos)
                fout_rm << sLine << endl;
            else
                fout << csvLine(sLine);
        }
        fout << flush;
        system("pause");
        return 0;
    }
The first line in "en.rc" is #include <windows.h>, but sLine shows as below:
[0] 255 L'ÿ'
[1] 254 L'þ'
[2] 35 L'#'
[3] 0
[4] 105 L'i'
[5] 0
[6] 110 L'n'
[7] 0
[8] 99 L'c'
. .
. .
. .
This program works correctly for UTF-8. How can I make it work for UCS-2?
Wide streams use a wide stream buffer to access the file. The wide stream buffer reads bytes from the file and uses its codecvt facet to convert these bytes to wide characters. The default codecvt facet is std::codecvt<wchar_t, char, std::mbstate_t>, which converts between the native character sets for wchar_t and char (i.e., like mbstowcs() does).

You're not using the native char character set, so what you want is a codecvt facet that reads UCS-2 as a multibyte sequence and converts it to wide characters.

Note that there's an issue with UTF-16 here. The purpose of wchar_t is for one wchar_t to represent one codepoint. However, Windows uses UTF-16, which represents some codepoints as two wchar_ts. This means that the standard API doesn't work very well with Windows.

The consequence here is that when the file contains a surrogate pair, codecvt_utf16 will read that pair, convert it to a single codepoint value greater than 16 bits and have to truncate the value to 16 bits to stick it in a wchar_t. This means this code really is limited to UCS-2. I've set the maxcode template parameter to 0xFFFF to reflect this.

There are a number of other problems with wchar_t, and you might want to just avoid it entirely: What's “wrong” with C++ wchar_t?
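
The facet setup described above (a UTF-16 codecvt facet with maxcode 0xFFFF) could look roughly like the following. This is a minimal sketch, not necessarily the exact code the answer had in mind: readUtf16Lines is a hypothetical helper name, the std::little_endian fallback for files without a BOM is an assumption based on the question's "UCS-2 Little Endian" file, and stripping the trailing '\r' assumes Windows line endings. Note also that <codecvt> is deprecated since C++17, so a compiler may warn about it.

```cpp
#include <codecvt>   // std::codecvt_utf16 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads all lines of a
// UTF-16 encoded file into wide strings.
std::vector<std::wstring> readUtf16Lines(const char* path) {
    // maxcode 0xFFFF restricts input to what fits in a 16-bit wchar_t (UCS-2).
    // consume_header skips the BOM and uses the byte order it announces;
    // little_endian is the assumed fallback when no BOM is present.
    using Facet = std::codecvt_utf16<
        wchar_t, 0xFFFF,
        static_cast<std::codecvt_mode>(std::consume_header | std::little_endian)>;

    std::wifstream fin;
    // Imbue before opening so the facet is in place before any byte is read.
    // The locale takes ownership of the facet allocated with new.
    fin.imbue(std::locale(fin.getloc(), new Facet));
    // Binary mode stops the runtime from translating bytes before the facet sees them.
    fin.open(path, std::ios::binary);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(fin, line)) {
        // In binary mode a CRLF line ending leaves '\r' at the end of the line.
        if (!line.empty() && line.back() == L'\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

In the question's loop, the same imbue call on fin (with the file opened in std::ios::binary) should be enough for getline to yield "#include <windows.h>" without the leading ÿþ and the interleaved zero characters.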