Why is ¿ displayed different in Windows vs Linux e

Why is the following displayed differently in Linux vs Windows?

System.out.println(new String("¿".getBytes("UTF-8"), "UTF-8"));

In Windows:

¿

In Linux:

Â¿

Tags: java utf-8 character-encoding

5 Answers

Juvenile、少年°

#2 · 2020-03-28 09:36

It's hard to know exactly which bytes your source code contains, or the string which getBytes() is being called on, due to your editor and compiler encodings.

Can you produce a short but complete program containing only ASCII (and the relevant \uxxxx escaping in the string) which still shows the problem?

I suspect the problem may well be with the console output on either Windows or Linux, but it would be good to get a reproducible program first.
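A minimal sketch of such a program, assuming the character in question is U+00BF. The class name `AsciiOnlyRepro` and the helper `utf8Hex` are made up for illustration. Because the source is pure ASCII, the editor and compiler encodings cannot change what the string contains, so any remaining difference would come from the output side.

```java
import java.nio.charset.StandardCharsets;

public class AsciiOnlyRepro {
    // U+00BF (inverted question mark), written as an escape so the source stays ASCII.
    static final String INVERTED_QUESTION_MARK = "\u00bf";

    // Returns the UTF-8 encoding of s as lowercase hex digits.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ASCII-only line: identical on every platform if the string is intact.
        System.out.println(utf8Hex(INVERTED_QUESTION_MARK));
        // This line depends on the default encoding and on how the console decodes it.
        System.out.println(INVERTED_QUESTION_MARK);
    }
}
```

If the hex line reads c2bf on both machines while the second line differs, the string itself is fine and the console is the likely culprit.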


成全新的幸福

#3 · 2020-03-28 09:37

Check which encoding your Linux terminal uses.

For gnome-terminal in Ubuntu, go to the "Terminal" menu and select "Set Character Encoding".

For PuTTY, Configuration -> Window -> Translation -> UTF-8 (and if that doesn't work, see this post).


(account banned)

#4 · 2020-03-28 09:43

Run this code to help determine if it is a compiler or console issue:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    public static void main(String[] args) {
        String s = "¿";
        // Bytes produced by the platform default encoding
        printHex(Charset.defaultCharset(), s);
        // Bytes produced by UTF-8
        printHex(StandardCharsets.UTF_8, s);
    }

    // Prints the charset name, the string, and the encoded bytes in hex.
    public static void printHex(Charset encoding, String s) {
        System.out.print(encoding + "\t" + s + "\t");
        for (byte b : s.getBytes(encoding)) {
            System.out.printf("%02x", b & 0xFF);
        }
        System.out.println();
    }
}

If the encoded bytes for UTF-8 are different on each platform (it should be c2bf), it is a compiler issue.

If it is a compiler issue, replace "¿" with "\u00bf".


\"骚年 ilove

#5 · 2020-03-28 09:45

System.out.println() outputs the text in the system default encoding, but the console interprets that output according to its own encoding (or "codepage") setting. On your Windows machine the two encodings seem to match, but on the Linux box the output is apparently in UTF-8 while the console is decoding it as a single-byte encoding like ISO-8859-1. Or maybe, as Jon suggested, the source file is being saved as UTF-8 and javac is reading it as something else, a problem that can be avoided by using Unicode escapes.
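To see which side is responsible, it can help to look at the bytes a PrintStream would emit for a given encoding. The following is only a sketch: the class `ConsoleEncodingCheck` and the helper `printedBytesHex` are hypothetical names, and an in-memory stream stands in for the real console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingCheck {
    // Prints text through a PrintStream that uses the given charset and
    // returns the bytes that reached the underlying stream, as hex.
    static String printedBytesHex(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // Not expected: the name comes from an existing Charset object.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The charset the JVM picked up from the platform.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("as UTF-8:        "
                + printedBytesHex("\u00bf", StandardCharsets.UTF_8));
        System.out.println("as ISO-8859-1:   "
                + printedBytesHex("\u00bf", StandardCharsets.ISO_8859_1));
    }
}
```

If the default charset reported on the Linux box is UTF-8, the program is sending c2 bf and the terminal is the one interpreting those two bytes as two separate characters.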

When you need to output anything other than ASCII text, your best bet is to write it to a file using an appropriate encoding, then read the file with a text editor--consoles are too limited and too system-dependent. By the way, this bit of code:

new String("¿".getBytes("UTF-8"), "UTF-8")

...has no effect on the output. All that does is encode the contents of the string to a byte array and decode it again, reproducing the original string--an expensive no-op. If you want to output text in a particular encoding, you need to use an OutputStreamWriter, like so:

FileOutputStream fos = new FileOutputStream("out.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
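A fuller sketch of the same idea, with the writer actually used and closed. The class name `WriteUtf8File` and the helper `writeUtf8` are invented for this example; the file name out.txt is the one from the snippet above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteUtf8File {
    // Writes text to the given stream as UTF-8, then closes the stream.
    static void writeUtf8(OutputStream target, String text) {
        try (Writer out = new OutputStreamWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The escape keeps this source file ASCII-only.
        writeUtf8(new FileOutputStream("out.txt"), "\u00bf");
    }
}
```

Opening out.txt in an editor that is set to UTF-8 should then show the character regardless of what the console does.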


ら.Afraid

#6 · 2020-03-28 09:52

Not sure where the problem is exactly, but it's worth noting that Â¿ (0xc2, 0xbf) is what you get when you encode 0xbf, the Unicode code point for ¿, as UTF-8 and then display each resulting byte as a separate character.

So it looks like in the Linux case the output is not being displayed as UTF-8, but as a single-byte encoding.
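This can be reproduced in Java without involving a console at all. The sketch below assumes the single-byte encoding is ISO-8859-1, which is a guess; the names `MojibakeDemo` and `utf8ReadAsLatin1` are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes s as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is what a misconfigured terminal effectively does.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = utf8ReadAsLatin1("\u00bf");
        // Print code points instead of the characters so the output is ASCII-only.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The result is two characters, U+00C2 (Â) followed by U+00BF (¿), which matches what the Linux console shows.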
