Converting UTF-8 to ISO-8859-1 in Java - how to ke

I am trying to convert a string encoded in java in UTF-8 to ISO-8859-1. Say for example, in the string 'âabcd' 'â' is represented in ISO-8859-1 as E2. In UTF-8 it is represented as two bytes. C3 A2 I believe. When I do a getbytes(encoding) and then create a new string with the bytes in ISO-8859-1 encoding, I get a two different chars. Ã¢. Is there any other way to do this so as to keep the character the same i.e. âabcd?

标签： java utf-8 iso-8859-1

8条回答

爷的心禁止访问

2楼-- · 2019-01-03 05:38

This is what I needed:

public static byte[] encode(byte[] arr, String fromCharsetName) {
    return encode(arr, Charset.forName(fromCharsetName), Charset.forName("UTF-8"));
}

public static byte[] encode(byte[] arr, String fromCharsetName, String targetCharsetName) {
    return encode(arr, Charset.forName(fromCharsetName), Charset.forName(targetCharsetName));
}

public static byte[] encode(byte[] arr, Charset sourceCharset, Charset targetCharset) {

    ByteBuffer inputBuffer = ByteBuffer.wrap( arr );

    CharBuffer data = sourceCharset.decode(inputBuffer);

    ByteBuffer outputBuffer = targetCharset.encode(data);
    byte[] outputData = outputBuffer.array();

    return outputData;
}

0人赞添加讨论(0) 举报

戒情不戒烟

3楼-- · 2019-01-03 05:42

evict non ISO-8859-1 characters, will be replace by '?' (before send to a ISO-8859-1 DB by example):

utf8String = new String ( utf8String.getBytes(), "ISO-8859-1" );

0人赞添加讨论(0) 举报

贪生不怕死

4楼-- · 2019-01-03 05:43

byte[] iso88591Data = theString.getBytes("ISO-8859-1");

Will do the trick. From your description it seems as if you're trying to "store an ISO-8859-1 String". String objects in Java are always implicitely encoded in UTF-16. There's no way to change that encoding.

What you can do, 'though is to get the bytes that constitute some other encoding of it (using the .getBytes() method as shown above).

0人赞添加讨论(0) 举报

聊天终结者

5楼-- · 2019-01-03 05:44

If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive -- you should only be using byte[] arrays or ByteBuffer objects. Then, you can use java.nio.charset.Charset to convert between encodings:

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");

ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});

// decode UTF-8
CharBuffer data = utf8charset.decode(inputBuffer);

// encode ISO-8559-1
ByteBuffer outputBuffer = iso88591charset.encode(data);
byte[] outputData = outputBuffer.array();

0人赞添加讨论(0) 举报

可以哭但决不认输i

6楼-- · 2019-01-03 05:46

If you have the correct encoding in the string, you need not do more to get the bytes for another encoding.

public static void main(String[] args) throws Exception {
    printBytes("â");
    System.out.println(
            new String(new byte[] { (byte) 0xE2 }, "ISO-8859-1"));
    System.out.println(
            new String(new byte[] { (byte) 0xC3, (byte) 0xA2 }, "UTF-8"));
}

private static void printBytes(String str) {
    System.out.println("Bytes in " + str + " with ISO-8859-1");
    for (byte b : str.getBytes(StandardCharsets.ISO_8859_1)) {
        System.out.printf("%3X", b);
    }
    System.out.println();
    System.out.println("Bytes in " + str + " with UTF-8");
    for (byte b : str.getBytes(StandardCharsets.UTF_8)) {
        System.out.printf("%3X", b);
    }
    System.out.println();
}

Output:

Bytes in â with ISO-8859-1
 E2
Bytes in â with UTF-8
 C3 A2
â
â

0人赞添加讨论(0) 举报

The star\"

7楼-- · 2019-01-03 05:49

Starting with a set of bytes which encode a string using UTF-8, creates a string from that data, then get some bytes encoding the string in a different encoding:

    byte[] utf8bytes = { (byte)0xc3, (byte)0xa2, 0x61, 0x62, 0x63, 0x64 };
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    String string = new String ( utf8bytes, utf8charset );

    System.out.println(string);

    // "When I do a getbytes(encoding) and "
    byte[] iso88591bytes = string.getBytes(iso88591charset);

    for ( byte b : iso88591bytes )
        System.out.printf("%02x ", b);

    System.out.println();

    // "then create a new string with the bytes in ISO-8859-1 encoding"
    String string2 = new String ( iso88591bytes, iso88591charset );

    // "I get a two different chars"
    System.out.println(string2);

this outputs strings and the iso88591 bytes correctly:

âabcd 
e2 61 62 63 64 
âabcd

So your byte array wasn't paired with the correct encoding:

    String failString = new String ( utf8bytes, iso88591charset );

    System.out.println(failString);

Outputs

Ã¢abcd

(either that, or you just wrote the utf8 bytes to a file and read them elsewhere as iso88591)

0人赞添加讨论(0) 举报

1 2 下一页

Converting UTF-8 to ISO-8859-1 in Java - how to ke

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间