I'd like to transcode character encoding on-the-fly. I'd like to use iostreams and my own transcoding streambuf, e.g.:
xcoder_streambuf xbuf( "UTF-8", "ISO-8859-1", cout.rdbuf() );
cout.rdbuf( &xbuf );
char *utf8_s; // pointer to buffer containing UTF-8 encoded characters
// ...
cout << utf8_s; // characters are written in ISO-8859-1
The implementation of xcoder_streambuf would use ICU's converters API. It would take the data coming in (in this case, from utf8_s), transcode it, and write it out using the iostream's original streambuf.
Is that a reasonable way to go? If not, what would be better?
Yes, but it is not the way you are expected to do it in modern (as in 1997) iostream.
The behaviour of outputting through basic_streambuf<> is defined by the overflow(int_type c) virtual function.

The description of basic_filebuf<>::overflow(int_type c = traits::eof()) includes a conversion call of the form

a_codecvt.out(state, b, p, end, xbuf, xbuf+XSIZE, xbuf_end);

where a_codecvt is the codecvt<charT, char, typename traits::state_type> facet obtained from the stream buffer's locale. So you are expected to imbue a locale carrying the appropriate codecvt converter.

The standard library's support for Unicode has made some progress since 1997: C++11 requires the specialization codecvt<char32_t, char, mbstate_t>, which converts between the UTF-32 and UTF-8 encoding forms. This seems to be what you want (ISO-8859-1 codes are UCS-4 codes = UTF-32).
I would introduce a distinct character type for UTF-8. This way you cannot accidentally pass UTF-8 where ISO-8859-* is expected. But then you would have to write some interface code, and the type of your streams won't be istream/ostream.

Disclaimer: I never actually did such a thing, so I don't know if it is workable in practice.