I am trying to read a text file encoded in Shift-JIS (code page 932) using std::wifstream and std::getline. The following code works in VS2010 but fails in VS2013:
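#include <fstream>
#include <locale>
#include <string>

int main() {
    std::wifstream in;
    in.open("data932.txt");
    const std::locale locale(".932");  // code page 932 (Shift-JIS)
    in.imbue(locale);
    std::wstring line1, line2;
    std::getline(in, line1);           // first line: ASCII only
    std::getline(in, line2);           // second line: Japanese script
    const bool good = in.good();       // expected to be true
}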
The file contains several lines, where the first line contains just ASCII characters, and the second is Japanese script. Thus, when this snippet runs, line1 should contain the ASCII line, line2 the Japanese script, and good should be true.
When compiled in VS2010, the result is as expected. But when compiled in VS2013, line1 contains the ASCII line, but line2 is empty, and good is false.
I debugged into the CRT (its source is provided with Visual Studio) and found that an internal function called _Mbrtowc (in file xmbtowc.c) was modified between the two versions. The way it detects the lead byte of a double-byte character was changed, and the VS2013 version fails to detect a lead byte and therefore fails to decode the byte stream.
Further debugging led to the point where a _Cvtvec object's _Isleadbyte array is initialized (in the function _Getcvt(), in file xwctomb.c), and that initialization produces a wrong result. It seems to always use code page 1252, which is the default code page on my system, rather than 932, which is set for the stream in use. However, I could not determine whether this is by design and I missed some required step, or whether it is indeed a bug in the VS2013 CRT.
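To illustrate why the wrong lead-byte table matters, here is a minimal sketch. It is not the CRT code: isLeadByte932, isLeadByte1252 and countChars are hypothetical helpers written only for this illustration. The sketch relies on the fact that code page 932 uses the ranges 0x81-0x9F and 0xE0-0xFC for lead bytes, while code page 1252 is a single-byte code page and has none.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the CRT implementation): true if b is the lead
// byte of a double-byte character in code page 932 (Shift-JIS).
bool isLeadByte932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Code page 1252 is a single-byte code page, so no byte is a lead byte.
bool isLeadByte1252(unsigned char) {
    return false;
}

// Counts the characters in a byte string, pairing each lead byte with the
// trail byte that follows it, according to the given lead-byte predicate.
std::size_t countChars(const std::string& bytes,
                       bool (*isLead)(unsigned char)) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (isLead(static_cast<unsigned char>(bytes[i])) &&
            i + 1 < bytes.size()) {
            ++i;  // consume the trail byte together with its lead byte
        }
        ++count;
    }
    return count;
}
```

For the Shift-JIS bytes 0x93 0xFA 0x96 0x7B (two Japanese characters), countChars with isLeadByte932 sees two characters, while with isLeadByte1252 every byte is treated as a separate character. A decoder working from the 1252 table therefore cannot interpret the second line correctly.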
Unfortunately I don't have VS2012 installed, so I could not test on that version.
Any insights on this topic are welcome!
I have found a workaround: if I explicitly change the global MBC code page while the locale is being created, the locale is initialized correctly, and the lines are read and decoded as expected.
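One way this could look is sketched below. It assumes that the global MBC code page is changed through the Microsoft-specific _getmbcp and _setmbcp functions from <mbctype.h>, so it only builds against the Microsoft CRT. The helper name makeLocale932 is hypothetical and exists only for this illustration.

```cpp
#include <locale>
#include <mbctype.h>  // _getmbcp, _setmbcp (Microsoft CRT only)

// Hypothetical helper: creates the code page 932 locale while the global
// multibyte code page is temporarily switched to 932, then restores it.
std::locale makeLocale932() {
    const int oldMbcp = _getmbcp();  // remember the current MBC code page
    _setmbcp(932);                   // assumption: this is the required switch
    try {
        const std::locale locale(".932");
        _setmbcp(oldMbcp);           // restore the previous code page
        return locale;
    } catch (...) {
        _setmbcp(oldMbcp);           // restore even if locale creation fails
        throw;
    }
}
```

The stream would then be set up with in.imbue(makeLocale932()); instead of constructing the locale directly. Note that _setmbcp changes process-wide state, so other threads that use multibyte functions at the same time may be affected.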