Why is conversion from UTF-8 to ISO-8859-1 not the

I have the following code in a jar file to convert from UTF-8 to ISO-8859-1. When I run the jar on Windows I get one result, and on CentOS I get another. Does anyone know why?

public static void main(String[] args) {

  try {

    String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»";

    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes());
    CharBuffer data = utf8charset.decode(inputBuffer);

    ByteBuffer outputBuffer = iso88591charset.encode(data);
    byte[] outputData = outputBuffer.array();

    String z = new String(outputData);

    System.out.println(z);
  }
  catch(Exception e) {
    System.out.println(e.getMessage());
  }
}

In Windows, java -jar test.jar > test.txt creates a file containing: Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »

but in CentOS I get: �?, ä, �?, é, �?, ö, �?, ü, �?, «, »

Tags: java utf-8 iso-8859-1

3 answers


Post #2 · 2019-02-25 19:11

Three possibilities spring to mind:

The encoding you're actually using for your source code may differ by platform
The encoding the compiler expects by default may differ by platform (you can specify it on the command line)
The platform default encoding used when you call x.getBytes() may differ by platform
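To see which of these applies on a given machine, it can help to print what the JVM considers its default. The following is only a diagnostic sketch; the class name DefaultCharsetProbe is made up for illustration. The string literal uses a Unicode escape so that the source file encoding (the first two possibilities) cannot influence the result, which leaves only the default used by getBytes().

```java
import java.nio.charset.Charset;

public class DefaultCharsetProbe {
    // Describes the charset that String.getBytes() and new String(byte[]) use
    // when no charset argument is given.
    static String describeDefaults() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
        // U+00C4 (A with diaeresis) is written as an escape so the source file
        // encoding cannot change the literal; only getBytes() varies here.
        byte[] raw = "\u00c4".getBytes();
        System.out.println("U+00C4 encodes to " + raw.length + " byte(s) by default");
    }
}
```

Running this on both the Windows and the CentOS machine should show whether the defaults differ. For the compiler side, javac accepts an -encoding option (for example javac -encoding UTF-8 Test.java) to state the source file encoding explicitly.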

It's not clear in what way you're trying to convert from UTF-8 to ISO-8859-1 - because your original data is actually just a String. You're treating the results of calling x.getBytes() as if it were UTF-8-encoded data, but it's just whatever the platform default is...

Likewise when you write:

String z = new String(outputData);

... that's using the platform default encoding. Don't do that.

You don't need the byte buffer stuff at all: just encode using text.getBytes(encoding) and decode using new String(data, encoding).
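As a minimal sketch of that advice, assuming Java 7 or later for StandardCharsets (the class and method names here are illustrative, not from the question): every conversion between String and byte[] names its charset, so the result no longer depends on the platform.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsets {
    // Encode text to bytes using ISO-8859-1, regardless of the platform default.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode bytes that are known to be UTF-8, regardless of the platform default.
    static String fromUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A-umlaut, a-umlaut, sharp s, written as escapes to sidestep source encoding.
        String text = "\u00c4, \u00e4, \u00df";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        String decoded = fromUtf8(utf8);    // UTF-8 bytes -> String
        byte[] latin1 = toLatin1(decoded);  // String -> ISO-8859-1 bytes

        System.out.println(utf8.length + " UTF-8 bytes, " + latin1.length + " ISO-8859-1 bytes");
        System.out.println(decoded.equals(text));
    }
}
```

The real "UTF-8 to ISO-8859-1 conversion" is the pair of calls in main: decode the UTF-8 bytes into a String, then encode that String as ISO-8859-1. A String itself has no encoding to convert.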


淡お忘

Post #3 · 2019-02-25 19:21

First and foremost, get the string into the correct internal representation in Java before even thinking about output. That is, the following should hold:

z.equals("Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »") == true

The above can be verified without any output encoding issues, because it simply prints true or false.

On Windows you already achieved this with

ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes());
CharBuffer data = utf8charset.decode(inputBuffer);

That works because x.getBytes() on Windows uses the platform default, presumably Windows-1252, and all you need to go from "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»" to "Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »" is:

ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes(windows1252 /* explicit windows1252 works on CentOS too */));
CharBuffer data = utf8charset.decode(inputBuffer);

After this you do something with ISO-8859-1, which is futile: barely half the characters in your original string can be represented in ISO-8859-1, and you are already done at this point anyway. You can delete the code after utf8charset.decode(inputBuffer).
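The point about ISO-8859-1 can be checked directly with CharsetEncoder.canEncode. This is a small sketch with an illustrative class name; the garbled input is written with Unicode escapes (U+00C3 U+201E is the garbled form of Ä, U+00C3 U+00A4 the garbled form of ä) so it does not depend on how the source file is saved.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Returns the characters of s that ISO-8859-1 cannot represent.
    static String unencodable(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!latin1.canEncode(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First two items of the question's string, as escapes.
        String garbled = "\u00c3\u201e, \u00c3\u00a4";
        String missing = unencodable(garbled);
        // Only U+201E (the low double quotation mark) is outside ISO-8859-1.
        System.out.println(missing.length() + " character(s) not encodable in ISO-8859-1");
    }
}
```

Run over the full string from the question, the flagged characters should be the second halves of the pairs for Ä, É, Ö, Ü and ß (they exist in Windows-1252 but not in ISO-8859-1), which appears to line up with the positions that came out wrong on CentOS.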

So now your code could look like:

String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»";

Charset windows1252 = Charset.forName("Windows-1252");
Charset utf8charset = Charset.forName("UTF-8");

byte[] bytes = x.getBytes(windows1252);
String z = new String(bytes, utf8charset);

// Still wondering why you didn't just have this literal to begin with.
// Check that the strings are internally equal so you know at least that
// the code is working.

System.out.println(z.equals("Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »"));
System.out.println(z);


Lonely孤独者°

Post #4 · 2019-02-25 19:22

These two calls

x.getBytes()

new String(outputData)

depend on the platform default encoding.

The following runs as expected on both Windows and Linux because it avoids platform-specific conversions.

String x = "Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »";

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88591charset = Charset.forName("ISO-8859-1");

ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes(utf8charset));
CharBuffer data = utf8charset.decode(inputBuffer);

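ByteBuffer outputBuffer = iso88591charset.encode(data);
// array() exposes the whole backing array, so copy only the encoded bytes
byte[] outputData = new byte[outputBuffer.remaining()];
outputBuffer.get(outputData);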

String z = new String(outputData, iso88591charset);

System.out.println(z);

prints

Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »
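Whether those characters reach a terminal or a redirected file intact still depends on how the output stream encodes them, since System.out also falls back to a platform-dependent encoding. A hedged sketch of taking control of that step follows; the class and method names are illustrative, and the choice of ISO-8859-1 for the output is an assumption based on what the question is trying to produce.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitOutput {
    // Writes text through a writer bound to the given charset and returns the raw bytes.
    static byte[] printAs(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset));
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String z = "\u00c4, \u00e4"; // A-umlaut, a-umlaut

        // Same text, different byte counts depending on the output charset.
        System.out.println(printAs(z, StandardCharsets.UTF_8).length + " bytes as UTF-8");
        System.out.println(printAs(z, StandardCharsets.ISO_8859_1).length + " bytes as ISO-8859-1");

        // Wrap System.out so a redirected file is ISO-8859-1 on every platform.
        PrintWriter latin1Out = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.ISO_8859_1), true);
        latin1Out.println(z);
    }
}
```

With the writer bound to a named charset, java -jar test.jar > test.txt should produce the same bytes for that last line on Windows and CentOS; how a viewer then displays the file is a separate matter.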

