Cygwin encoding difficulties

Posted 2019-02-26 13:11

Question:

I'm not sure whether this is a programming problem. I began to suspect so, but then I ran the Java program in question (an executable jar) in a Windows console instead of a Cygwin one, and it ran fine: accented output displayed correctly and accented input was accepted correctly. So what follows applies only to the Cygwin console.

I'm processing some French text. When accented characters are printed (System.out), a sort of "hashed box" appears in place of each one. I saw another question here about this, but no solution or proper explanation was given.

And when I enter accented characters, they are read in incorrectly (Java System.in): e.g. "bénéfice" comes out in the log (which handles encoding correctly) with its accented characters garbled.

What is puzzling (perhaps) is that I am able to type "bénéfice" in the console and see it displayed correctly. The font, DejaVu Sans Mono, is meant to handle Unicode well, as I understand it. So... might this be something to do with the Java System.in and System.out streams???

For the avoidance of doubt, this is Cygwin on a Windows platform (does anyone use Cygwin on a non-Windows OS?).

I have tried changing the Locale, Character set and Font settings under Options --> Text. Nothing gets rid of the boxes. At the moment the settings are the default ones:
Font: DejaVu Sans Mono
Locale: en_GB
Character set: UTF-8

At the command prompt, when I run

$ locale

I get

LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
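
For comparison with the shell's view above, the JVM's own view can be printed with a small check like the following. This is only a sketch: the class name ShowEncoding and the method describe() are made up for illustration. The working assumption is that a Windows JVM (at least on older JDKs) picks its default charset from the Windows code page and not from Cygwin's LANG, so the two can disagree.

```java
import java.nio.charset.Charset;

public class ShowEncoding {

    // Collects the encoding-related values the JVM sees at startup.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
             + ", file.encoding = " + System.getProperty("file.encoding")
             + ", LANG = " + System.getenv("LANG");
    }

    public static void main(String[] args) {
        // The output is plain ASCII, so it displays correctly whatever
        // encoding the console expects.
        System.out.println(describe());
    }
}
```

If the charset reported here is not UTF-8 while the terminal is set to UTF-8, that mismatch would be consistent with both the boxes on output and the garbled input.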

Anyone know what I should do?

Answer 1:

Thanks to Paul and Zhong Yu for the answers here.

To print to Cygwin do this sort of thing:

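// Needs java.io.PrintStream. Wraps System.out so text is encoded as UTF-8 (autoflush on).
// This constructor throws the checked UnsupportedEncodingException.
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.print(outputString);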

To read from Cygwin do this sort of thing:

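// Needs java.io.BufferedReader and java.io.InputStreamReader. Decodes System.in as UTF-8.
// readLine() returns null at end of input and can throw IOException.
BufferedReader br = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
String nextInputLine = br.readLine();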
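
Putting the two snippets together, a complete program might look something like the sketch below. The class name Utf8Console and the helpers utf8Writer and utf8Reader are hypothetical names chosen for this example. It assumes the Cygwin terminal is set to UTF-8, as in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Console {

    // Wraps a byte stream so that text written to it is encoded as UTF-8.
    static PrintStream utf8Writer(OutputStream raw) {
        try {
            return new PrintStream(raw, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new AssertionError("UTF-8 not supported", e);
        }
    }

    // Wraps a byte stream so that bytes read from it are decoded as UTF-8.
    static BufferedReader utf8Reader(InputStream raw) {
        return new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        PrintStream out = utf8Writer(System.out);
        BufferedReader in = utf8Reader(System.in);

        // \u00e9 is e-acute; the escape keeps this source file pure ASCII,
        // so the compiler's source encoding cannot corrupt the literal.
        out.println("Type a word, e.g. b\u00e9n\u00e9fice:");
        String line = in.readLine();
        if (line != null) {
            out.println("Read: " + line);
        }
    }
}
```

The helpers take any stream rather than hard-coding System.in and System.out, which makes it possible to check the encoding round trip with in-memory byte arrays before trying it in the terminal.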

Slightly amazed that this question has not come up before re Cygwin.