I am trying to convert a string encoded in Java as UTF-8 to ISO-8859-1. For example, in the string 'âabcd', 'â' is represented in ISO-8859-1 as E2. In UTF-8 it is represented as two bytes, C3 A2 I believe. When I do a getBytes(encoding) and then create a new string from the bytes with ISO-8859-1 encoding, I get two different chars: Ã¢. Is there any other way to do this so as to keep the character the same, i.e. âabcd?
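To check the byte values guessed at in the question, a small sketch can print the bytes of 'â' under each charset. It only uses String.getBytes(Charset) and the StandardCharsets constants (Java 7+); the class and method names are illustrative, and 'â' is written as the escape \u00E2 so the source compiles regardless of the file's encoding:

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Render each byte as two uppercase hex digits, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helpers: hex dump of a string encoded in each charset.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    static String latin1Hex(String s) {
        return hex(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        String a = "\u00E2"; // LATIN SMALL LETTER A WITH CIRCUMFLEX
        System.out.println("UTF-8:      " + utf8Hex(a));   // UTF-8:      C3 A2
        System.out.println("ISO-8859-1: " + latin1Hex(a)); // ISO-8859-1: E2
    }
}
```

So the guess in the question is right: one character, two different byte sequences depending on the charset.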
This is what I needed:
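To evict non-ISO-8859-1 characters, which are replaced by '?' (for example, before sending text to an ISO-8859-1 database):
utf8String = new String(utf8String.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1); // needs import java.nio.charset.StandardCharsets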
Will do the trick. From your description, it seems as if you're trying to "store an ISO-8859-1 String". String objects in Java are always implicitly encoded in UTF-16; there's no way to change that encoding.
What you can do, though, is get the bytes that make up some other encoding of it (using the .getBytes() method as shown above).
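A minimal sketch of the '?' substitution, assuming Java 7+ for StandardCharsets; the class and method names are hypothetical. String.getBytes(Charset) always replaces characters the target charset cannot represent with that charset's replacement byte, which is '?' for ISO-8859-1, so a round trip through ISO-8859-1 keeps Latin-1 text intact and turns everything else into '?':

```java
import java.nio.charset.StandardCharsets;

public class Latin1Filter {
    // Encode to ISO-8859-1 (unmappable chars become '?'), then decode back.
    static String keepLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E2 exists in ISO-8859-1, so the text is unchanged.
        System.out.println(keepLatin1("\u00E2abcd").equals("\u00E2abcd"));          // true
        // U+20AC (euro sign) does not exist in ISO-8859-1, so it becomes '?'.
        System.out.println(keepLatin1("\u00E2abcd \u20AC").equals("\u00E2abcd ?")); // true
    }
}
```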
If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive -- you should only be using byte[] arrays or ByteBuffer objects. Then you can use java.nio.charset.Charset to convert between encodings:
If you have the correct encoding in the string, you need not do more to get the bytes for another encoding.
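The byte-level Charset conversion suggested above might look like this sketch, assuming the input is raw UTF-8 bytes (the convert helper is an illustrative name). Charset.decode produces a CharBuffer and Charset.encode produces a ByteBuffer; copying only the remaining bytes avoids relying on the length of the buffer's backing array:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetConvert {
    // Decode bytes from one charset and re-encode them in another, without building a String.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        CharBuffer chars = from.decode(ByteBuffer.wrap(input)); // bytes -> UTF-16 chars
        ByteBuffer out = to.encode(chars);                      // chars -> target bytes
        // The backing array may be larger than the encoded data, so copy only what was written.
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        // C3 A2 is the UTF-8 encoding of U+00E2; the rest is ASCII "abcd".
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA2, 'a', 'b', 'c', 'd'};
        byte[] latin1 = convert(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(latin1)); // [-30, 97, 98, 99, 100]  (-30 is 0xE2)
    }
}
```

Like String.getBytes(Charset), Charset.encode substitutes '?' for characters ISO-8859-1 cannot represent. An encoder obtained from Charset.newEncoder() reports such characters by default instead, so its encode(CharBuffer) method throws a CharacterCodingException.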
Output:
Start with a set of bytes that encode a string in UTF-8, create a string from that data, then get the bytes that encode the string in a different encoding:
This outputs the strings and the ISO-8859-1 bytes correctly:
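The steps just described might look like the following sketch, which starts from a hand-written UTF-8 byte array for "âabcd"; the helper name is hypothetical, and only the String(byte[], Charset) constructor and String.getBytes(Charset) are used:

```java
import java.nio.charset.StandardCharsets;

public class Utf8ToLatin1 {
    // Hypothetical helper: UTF-8 bytes -> String -> ISO-8859-1 bytes.
    static byte[] utf8BytesToLatin1Bytes(byte[] utf8) {
        // Decode the UTF-8 bytes; the String holds the text as UTF-16 chars.
        String text = new String(utf8, StandardCharsets.UTF_8);
        // Encode the same text as ISO-8859-1 (one byte per character).
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // C3 A2 is the UTF-8 encoding of U+00E2; the rest is ASCII "abcd".
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA2, 'a', 'b', 'c', 'd'};
        byte[] latin1 = utf8BytesToLatin1Bytes(utf8);
        StringBuilder hex = new StringBuilder();
        for (byte b : latin1) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // E2 61 62 63 64
    }
}
```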
So your byte array wasn't paired with the correct encoding:
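A sketch of that mismatch, with illustrative method names: ISO-8859-1 maps every byte to exactly one character, so decoding the UTF-8 bytes C3 A2 with it yields the two characters U+00C3 and U+00A2 (Ã¢) instead of the single U+00E2:

```java
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Wrong pairing: each byte becomes its own char, so C3 A2 -> U+00C3, U+00A2.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Right pairing: decode the bytes with the charset that produced them.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00E2abcd".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAsLatin1(utf8).length()); // 6
        System.out.println(decodeAsUtf8(utf8).length());   // 5
        System.out.println(decodeAsLatin1(utf8).equals("\u00C3\u00A2abcd")); // true
    }
}
```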
Output:
(Either that, or you wrote the UTF-8 bytes to a file and read them elsewhere as ISO-8859-1.)
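The file case can be sketched the same way, assuming Java 11+ for Files.writeString and Files.readString; the helper name and the use of a temporary file are illustrative. Reading with a different charset than the one used for writing reproduces the problem, while using the same charset on both sides gives back the original text:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileCharsetDemo {
    // Hypothetical helper: write text to a temp file in one charset, read it back in another.
    static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.writeString(tmp, text, writeAs);
                return Files.readString(tmp, readAs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00E2abcd";
        // Written as UTF-8, read as ISO-8859-1: the mismatch from the question.
        String wrong = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // Written and read as ISO-8859-1: the text survives.
        String right = writeThenRead(text, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("\u00C3\u00A2abcd")); // true
        System.out.println(right.equals(text));               // true
    }
}
```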