I need to read a string from a sequence of UTF-8 bytes. The bytes arrive in separate read operations that don't respect character boundaries, so I cannot simply call System.Text.Encoding.UTF8.GetString on each chunk. However, the System.Text.Decoder class, as returned by System.Text.Encoding.UTF8.GetDecoder(), appears to be designed for this scenario. One of the out arguments of its Convert method (completed) looks like it should indicate when a character has only been partially read.
The documentation for Convert (at https://msdn.microsoft.com/en-us/library/h6w985hz(v=vs.110).aspx) suggests that completed should be false if either the output (char[]) buffer was too small or not all of the bytes could be converted. See paragraph 4 of the Remarks section.
However, completed appears to be true even in a case where the docs say it should be false: when only some of the bytes of a character have been supplied, so the character has not been fully converted.
I presume I'm doing something wrong (or is this a bug?). Either way, how can I detect that my byte stream is paused in the middle of a character?
Demonstration code:
using System.Diagnostics; // for Debug.Assert

const int outSize = 10;
char[] outBuf = new char[outSize];
byte[] frag1 = new byte[] { 0xE7 };
byte[] frag2 = new byte[] { 0x95, 0xA2 };
var decoder = System.Text.Encoding.UTF8.GetDecoder();
int bytesUsed, charsUsed; bool completed;
// the first byte of the UTF-8 character
decoder.Convert(frag1, 0, frag1.Length, outBuf, 0, outSize, false, out bytesUsed, out charsUsed, out completed);
Debug.Assert( bytesUsed == 1 );
Debug.Assert( charsUsed == 0 );
// completed is true here, so the next assertion fails. Why?
Debug.Assert(!completed);
// the second and third bytes of the UTF-8 character
decoder.Convert(frag2, 0, frag2.Length, outBuf, 0, outSize, false, out bytesUsed, out charsUsed, out completed);
Debug.Assert(bytesUsed == 2);
Debug.Assert(charsUsed == 1);
Debug.Assert(completed);
Debug.Assert( new String(outBuf, 0, 1 ) == "畢" );
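For comparison, here is a workaround I have been considering that does not rely on completed at all: inspect the tail of each chunk myself and check whether it ends partway through a UTF-8 sequence. This is only a minimal sketch. Utf8Boundary and IncompleteTailLength are hypothetical names of my own, not part of System.Text, and it assumes each chunk it is given starts on a character boundary (for example because the incomplete tail of the previous chunk was held back and prepended to this one).

```csharp
using System;

static class Utf8Boundary
{
    // Hypothetical helper (not a framework API): returns how many bytes at the
    // end of buffer[0..count) belong to a UTF-8 sequence that is still incomplete.
    // Returns 0 if the data ends on a character boundary.
    public static int IncompleteTailLength(byte[] buffer, int count)
    {
        // A UTF-8 sequence is at most 4 bytes long, so an incomplete one
        // can occupy at most the last 3 bytes.
        for (int back = 1; back <= 3 && back <= count; back++)
        {
            byte b = buffer[count - back];
            if ((b & 0xC0) == 0x80)
                continue; // continuation byte (10xxxxxx): keep looking for the lead byte

            int expected;
            if ((b & 0x80) == 0x00) expected = 1;       // 0xxxxxxx
            else if ((b & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) expected = 4;  // 11110xxx
            else return 0;                              // invalid lead byte: leave it to the decoder

            // 'back' bytes of this sequence are present; it is incomplete if more are expected.
            return expected > back ? back : 0;
        }
        return 0; // empty input, or only continuation bytes in the last 3 positions
    }
}
```

With the fragments above, IncompleteTailLength(frag1, frag1.Length) is 1, because 0xE7 announces a three-byte sequence of which only one byte is present, while the full sequence { 0xE7, 0x95, 0xA2 } gives 0. I would still prefer to get this information from the Decoder itself if that is possible.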
Thanks!