Opening InputStreamReader in the middle of UTF-8 s

Posted 2019-08-12 20:44

I am using a seekable InputStream which returns the stream to me at a specific position. The underlying data in the stream is encoded in UTF-8. I want to open this stream using an InputStreamReader and read one character at a time.

Here is my code snippet:

inputStream.seek(position-1);
InputStreamReader reader = new InputStreamReader(inputStream, "UTF-8");

The problem is that position-1 could be pointing to the middle of a multi-byte UTF-8 sequence. How can I detect that and make sure the reader starts at the beginning of a UTF-8 encoded sequence? Thanks in advance.

1 answer
劫难
#2 · 2019-08-12 21:22

Assuming you can reposition the stream whenever you want, you can simply skip bytes while the top two bits are "10", which is what marks a UTF-8 continuation byte. So something like:

// InputStream doesn't actually have a seek method, but I'll assume you're using
// a subclass which does...
inputStream.seek(position);
while (true) {
    int nextByte = inputStream.read();
    // Continuation bytes look like 10xxxxxx, i.e. (b & 0xc0) == 0x80.
    if (nextByte == -1 || (nextByte & 0xc0) != 0x80) {
       break;
    }
    position++;
}
// Undo the last read, effectively
inputStream.seek(position);
InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
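If you want to try the idea without a seekable stream, here is a minimal self-contained sketch that works on a byte array instead. The class name Utf8Boundary and the helpers nextCharBoundary and readFrom are made up for illustration and are not part of any library; the sketch also assumes the data is valid UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8Boundary {
    // Hypothetical helper: returns the first index >= position that does not
    // hold a UTF-8 continuation byte (10xxxxxx), or data.length if none is left.
    static int nextCharBoundary(byte[] data, int position) {
        while (position < data.length && (data[position] & 0xC0) == 0x80) {
            position++;
        }
        return position;
    }

    // Hypothetical helper: decodes everything from the next character boundary
    // at or after position, reading one char at a time through a reader.
    static String readFrom(byte[] data, int position) {
        int start = nextCharBoundary(data, position);
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(data, start, data.length - start),
                StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this skips forward to the next character, so a character whose bytes straddle the requested position is dropped. If you would rather include it, scan backwards from the position until you hit a non-continuation byte instead.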