[Note: question basically re-edited after a lot of playing around]
In Java, you have `Charset`, defining a character encoding. From a `Charset`, you can obtain two objects:

- a `CharsetEncoder`, to turn a `char` sequence into a `byte` sequence;
- a `CharsetDecoder`, to turn a `byte` sequence into a `char` sequence.
Both of these classes define the methods `.onUnmappableCharacter()` and `.onMalformedInput()`. If you set each of these to `CodingErrorAction.REPORT`, they will throw one of two exceptions: `UnmappableCharacterException` or `MalformedInputException`, respectively.
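For reference, a minimal sketch of that configuration (the charset here is an arbitrary choice):

```java
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ReportingSetup {
    public static void main(String[] args) {
        // Both the encoder and the decoder accept the same two settings
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    }
}
```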
With a `CharsetEncoder`, I am able to generate both of them (see the sketch after this list):

- feed it a `CharBuffer` containing two high surrogates following one another --> `MalformedInputException`;
- feed it a `CharBuffer` containing a `char` (or `char` sequence) which the encoding cannot represent --> `UnmappableCharacterException`.
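A minimal sketch reproducing both encoder failures; the surrogate values and target charsets are arbitrary choices:

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncoderFailures {
    public static void main(String[] args) {
        // Two high surrogates in a row: not a well-formed UTF-16 sequence
        tryEncode(StandardCharsets.UTF_8, "\uD800\uD800");
        // U+20AC (euro sign) has no mapping in US-ASCII
        tryEncode(StandardCharsets.US_ASCII, "\u20AC");
    }

    private static void tryEncode(Charset cs, String input) {
        CharsetEncoder encoder = cs.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(input));
        } catch (CharacterCodingException e) {
            // Prints MalformedInputException, then UnmappableCharacterException
            System.out.println(e);
        }
    }
}
```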
With a `CharsetDecoder`:

- feed it an illegal byte sequence --> `MalformedInputException`: easy to do (sketched below);
- `UnmappableCharacterException` --> how?
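For completeness, a sketch of the easy case, assuming UTF-8 (a lone continuation byte is an illegal sequence):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderMalformed {
    public static void main(String[] args) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // 0x80 is a continuation byte; it cannot start a UTF-8 sequence
            decoder.decode(ByteBuffer.wrap(new byte[] { (byte) 0x80 }));
        } catch (CharacterCodingException e) {
            System.out.println(e); // java.nio.charset.MalformedInputException
        }
    }
}
```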
In spite of all my research, and despite having played a lot with `CharsetDecoder`, I just couldn't do it: I could find no combination of `Charset` and byte sequence able to generate this error...

Is there any at all?
When you supply a character sequence to the encoder, the encoder can tell that a character is not representable in the charset and throw an `UnmappableCharacterException`.

When you supply a byte array to the decoder, it assumes that the bytes have been encoded properly. Thus, when it decodes your byte array and hits a bad sequence, it assumes you have a broken encoder or bad input, which causes a `MalformedInputException`.
It's just a matter of finding a character set with an unmappable byte sequence.

Take, for example, `IBM1098`. There are byte values it simply cannot map to any character. Put these in a `ByteBuffer`, rewind it, and try to decode it: this throws `java.nio.charset.UnmappableCharacterException`.

Ideone.com attempt.
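A minimal sketch of that recipe; the specific bytes `0x80` and `0x81` are my assumption of values left unassigned in the JDK's IBM1098 table:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class DecoderUnmappable {
    public static void main(String[] args) throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("IBM1098").newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);

        // Assumption: 0x80 and 0x81 have no character assigned in IBM1098
        ByteBuffer bytes = ByteBuffer.allocate(2);
        bytes.put((byte) 0x80).put((byte) 0x81);
        bytes.rewind();

        // Terminates with java.nio.charset.UnmappableCharacterException
        System.out.println(decoder.decode(bytes));
    }
}
```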