Printing out unicode from Java code issue in windo

Posted 2020-01-29 02:19

Question:

I have a problem printing a Unicode symbol in the Windows console.

Here's the Java code that prints the Unicode symbol:

System.out.print("\u22A2 ");

The problem doesn't occur when I run the program in Eclipse with the encoding set to UTF-8. In the Windows console, however, the symbol is replaced by a question mark.

I tried the following to overcome this problem, with no success:

  • Changed the font of the Windows console to Lucida Console.

  • Changed the encoding every time I opened the Windows console, i.e. by running chcp 65001.

As an extra step, I also tried a few times to run the class with an argument, i.e. java -Dfile.encoding=UTF-8 Filter (where "Filter" is the name of the class).

Answer 1:

By default, the code page used by CMD on Windows is 437 (on US-English systems; other locales default to other OEM code pages). You can check by running this command at the prompt:

C:\>chcp
Active code page: 437

This code page prevents Unicode characters from being displayed properly. You have to change the code page to 65001 and also pass -Dfile.encoding=UTF-8:

C:\>chcp 65001
Active code page: 65001
C:\>java -jar -Dfile.encoding=UTF-8 path/to/your/runnable/jar
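
To check whether the -Dfile.encoding switch actually reached the JVM, a small diagnostic along these lines may help. It is only a sketch: the class name EncodingCheck and the method describeDefaults are made up for illustration, and it reports the JVM's view of the encoding, not what the console window is able to render.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question or answer.
class EncodingCheck {
    // Reports the file.encoding property and the default charset the JVM resolved for this run.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
        // The symbol from the question; whether it renders depends on the console.
        System.out.println("\u22A2 ");
    }
}
```

Run it once with and once without -Dfile.encoding=UTF-8 (after chcp 65001) and compare the first line of output.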


Answer 2:

In addition to the steps you have taken, you also need a PrintStream/PrintWriter that encodes the printed characters as UTF-8.

Unfortunately, the Java designers chose to open the standard streams with the so-called "default" encoding, which is almost always unusable*) under Windows. Hence, using System.out and System.err naively makes your program's output look different depending on where you run it. This directly contradicts the goal: write once, run anywhere.

*) It will be some non-standard "code page" that nobody on this planet except Microsoft recognizes. And AFAIK, if for example you have a German keyboard and a "German" OEM Windows and you want date and time in your home time zone, there is just no way to say: but I want UTF-8 input/output in my CMD window. This is one reason why I have my dual-boot Ubuntu running most of the time, where it goes without saying that the terminal does UTF-8.

The following usually works for me in JDK7:

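import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
public static PrintWriter stdout = new PrintWriter(
    new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
    true);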

For Java versions before 7, which lack StandardCharsets, I use Charset.forName("UTF-8") instead.
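
For completeness, here is a self-contained sketch of the same wrapping technique. The class name Utf8Stdout and the helper methods utf8Writer and encode are assumptions added for illustration; encode exists only so that the bytes produced by the writer can be inspected without a console.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; the wrapping itself is the technique described in this answer.
class Utf8Stdout {
    // Wraps any byte stream in an auto-flushing PrintWriter that encodes characters as UTF-8.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    // Returns the raw bytes the UTF-8 writer emits for the given text.
    static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = utf8Writer(buffer);
        writer.print(text);
        writer.flush(); // auto-flush only triggers on println/printf/format, so flush explicitly
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        PrintWriter stdout = utf8Writer(System.out);
        stdout.println("\u22A2 "); // println triggers the auto-flush
    }
}
```

The symbol from the question, U+22A2, comes out as the three bytes E2 8A A2. Whether the console then shows it correctly still depends on the active code page (chcp 65001) and the font.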



Answer 3:

For the Arabic language I used the following code:

PrintWriter stdout = new PrintWriter(
    new OutputStreamWriter(System.out, StandardCharsets.ISO_8859_1),
    true);
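
One caveat worth checking before reusing this: ISO-8859-1 only covers code points U+0000 to U+00FF, so it cannot directly encode Arabic letters held in a Java String. The sketch below (class and method names are hypothetical) uses CharsetEncoder.canEncode to test which charsets can represent a given piece of text. If the ISO_8859_1 wrapper does produce correct Arabic output in a given setup, my assumption is that the strings were already carrying mis-decoded bytes, but that is a guess and not something this answer states.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for checking charset coverage.
class CharsetCoverage {
    // True if every character of the text can be encoded in the given charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String alef = "\u0627"; // ARABIC LETTER ALEF
        System.out.println("ISO-8859-1: " + canEncode(StandardCharsets.ISO_8859_1, alef));
        System.out.println("UTF-8: " + canEncode(StandardCharsets.UTF_8, alef));
    }
}
```

This prints false for ISO-8859-1 and true for UTF-8.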