If I print a Unicode string like ελληνικά on the console using the print method of the System.out stream, it's printed as expected (I use Ubuntu Mono as the font of my output console, which supports Unicode characters).
But if I try to read Unicode characters from the console with UTF-8 encoding using the System.in stream, they aren't read properly.
I have tried many different ways, wrapping System.in in various reader classes, but none of them works. Does anyone know how I could do that?
Here is a sample of code:
BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
BufferedWriter console = new BufferedWriter(new OutputStreamWriter(System.out, "UTF-8"));
console.write("p1: Γίνεται πάντως\n");
console.flush();
System.out.println("p2: Γίνεται πάντως");
byte dataBytes[] = keyboard.readLine().getBytes(Charset.forName("UTF-8"));
System.out.println("p3: " + new String(dataBytes));
console.write("p4: " + new String(dataBytes, "UTF-8") + "\n");
console.flush();
Scanner scan = new Scanner(System.in, "UTF-8");
System.out.println("p5: " + (char) System.in.read());
System.out.println("p6: " + scan.nextLine());
System.out.println("p7: " + keyboard.readLine());
And the output on my console:
p1: Γίνεται πάντως
p2: Γίνεται πάντως
Δέν
p3: ���
p4: ���
Δέν
p5: Ä
p6: ��
Δέν
p7: ���
My IDE is NetBeans.
System.in is an InputStream, which is a stream of bytes. You need a Reader to read characters; the reader does the decoding for you.
In this case, you can wrap System.in in an InputStreamReader, passing "UTF-8" as the second constructor argument.
Scanner console = new Scanner(new InputStreamReader(System.in, "UTF-8"));
while (console.hasNextLine()) {
    System.out.println(console.nextLine());
}
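To see the difference between reading bytes and reading characters, here is a small sketch. It uses a ByteArrayInputStream holding the UTF-8 bytes of Δέν as a stand-in for System.in (an assumption so it runs without typing anything); the class and method names are made up for illustration. Reading one byte and casting it to char, as the question's p5 line does, yields only the first byte of the two-byte UTF-8 sequence for Δ, while an InputStreamReader returns the whole letter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVsCharDemo {
    // The word Delta-epsilon(tonos)-nu, written with escapes so the source file encoding does not matter.
    static final String WORD = "\u0394\u03AD\u03BD";

    // Reads a single byte and casts it to char, like (char) System.in.read() in the question.
    static char firstByteAsChar(InputStream in) {
        try {
            return (char) in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a single char through an InputStreamReader, which decodes the UTF-8 bytes.
    // The reader is deliberately not closed: closing it would also close the underlying stream.
    static char firstDecodedChar(InputStream in) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            return (char) reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = WORD.getBytes(StandardCharsets.UTF_8); // two bytes per Greek letter
        char raw = firstByteAsChar(new ByteArrayInputStream(bytes));
        char decoded = firstDecodedChar(new ByteArrayInputStream(bytes));
        System.out.println(bytes.length);             // 6
        System.out.println(Integer.toHexString(raw)); // ce: only the first byte of Delta
        System.out.println(decoded == '\u0394');      // true: the whole letter Delta
    }
}
```

Only ASCII is printed, so the output does not depend on the console's own encoding.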
Update:
It's likely that the encoding of your stdin is wrong. To verify, you can compare the bytes you get from System.in with the expected ones:
byte[] expected = "Δέν".getBytes("UTF-8"); // [-50, -108, -50, -83, -50, -67]
byte[] fromStdin = new byte[1024];
int c = System.in.read(fromStdin);
// Compare only the bytes both arrays have, so a "\r\n" line ending cannot overrun expected
for (int i = 0; i < Math.min(c, expected.length); i++) {
    if (expected[i] != fromStdin[i]) {
        System.out.println(i + ", " + fromStdin[i]);
    }
}
Then type "Δέν" (without the double quotes) and press Enter. If it prints anything, your System.in is not delivering UTF-8.
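The question's own output hints at what that other encoding might be. The sketch below assumes (a guess, not something the question confirms) that the console hands Java single-byte Greek text in ISO-8859-7; windows-1253 encodes these three letters with the same bytes. Under that assumption, decoding the bytes of Δέν as UTF-8 gives three replacement characters, like the p3 line, and the first byte cast to char is U+00C4 (Ä), like the p5 line. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GreekStdinHypothesis {
    // The word Delta-epsilon(tonos)-nu, written with escapes.
    static final String WORD = "\u0394\u03AD\u03BD";

    // Assumption: the console sends ISO-8859-7 bytes (one byte per Greek letter) instead of UTF-8.
    static byte[] assumedStdinBytes() {
        return WORD.getBytes(Charset.forName("ISO-8859-7"));
    }

    public static void main(String[] args) {
        byte[] bytes = assumedStdinBytes();
        // Decoding those bytes as UTF-8, which is what the question's readers do.
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(bytes.length);                         // 3
        System.out.println(asUtf8.equals("\uFFFD\uFFFD\uFFFD"));  // true: matches the p3 line
        // The mask turns the signed byte into 0..255, which is what System.in.read() returns.
        System.out.println((char) (bytes[0] & 0xFF) == '\u00C4'); // true: matches the p5 line
        // Decoding with the matching charset recovers the word.
        System.out.println(new String(bytes, Charset.forName("ISO-8859-7")).equals(WORD)); // true
    }
}
```

If this guess is right, wrapping System.in in an InputStreamReader with "ISO-8859-7" would read correctly, but changing the console (or NetBeans) to send UTF-8 is the more robust fix.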
Shouldn't System.in have the same encoding as defaultCharset or some system property?
Not necessarily. System.in is a byte stream, not a character stream. It cannot be a character stream, because you can (and should be able to) feed it binary data: an image, audio, video, whatever you want. It must support all of those, which is why it's just an InputStream. The bytes it carries depend on what the environment gives your program, and I know very little about your environment. You need to find out how to change your environment, or figure out which encoding it's actually giving your program.
For example, say we have a UTF-16 text file utf16.txt, and we feed its content to our program, which expects STDIN to be UTF-8 encoded text:
java -cp ... our.utf8.Program < utf16.txt
It's going to read gibberish.
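The UTF-16 case can be reproduced without a file. In this sketch (class and method names are hypothetical), a byte array produced by String.getBytes(StandardCharsets.UTF_16), which Java writes with a byte-order mark, stands in for `< utf16.txt`; the same bytes are read once with a UTF-8 reader and once with a UTF-16 reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16AsUtf8 {
    // The word Delta-epsilon(tonos)-nu, written with escapes.
    static final String WORD = "\u0394\u03AD\u03BD";

    // Simulates reading redirected stdin: decode the given bytes with a fixed charset.
    static String readAs(byte[] stdinBytes, Charset cs) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(stdinBytes), cs))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the contents of utf16.txt.
        byte[] fileBytes = (WORD + "\n").getBytes(StandardCharsets.UTF_16);
        String wrong = readAs(fileBytes, StandardCharsets.UTF_8);  // program assumes UTF-8
        String right = readAs(fileBytes, StandardCharsets.UTF_16); // charset matches the data
        System.out.println(wrong.equals(WORD)); // false: gibberish
        System.out.println(right.equals(WORD)); // true
    }
}
```

The bytes never change; only the charset used to interpret them does, which is why the program alone cannot know what System.in contains.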
Try using java.io.Console.readLine() or java.io.Console.readLine(String, Object...). A Console instance is returned by the System.console() method. For example:
package package01;

import java.io.Console;

public class Example {
    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            System.err.println("No console");
            System.exit(1);
        }
        String s = console.readLine("Enter string: ");
        System.out.println(s);
    }
}
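System.console() may return null, for example when the program runs inside an IDE such as NetBeans or with redirected input, which is why the example above checks for it. A possible variation, sketched below under the assumption that stdin carries UTF-8 whenever no console is attached, falls back to an InputStreamReader instead of exiting. The readLine helper and the class name are hypothetical, not part of the JDK.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ConsoleOrStdin {
    // Prefers java.io.Console; otherwise reads from the given fallback reader.
    // Assumption: in the fallback case, stdin really carries UTF-8 bytes.
    static String readLine(Console console, BufferedReader fallback, String prompt) {
        if (console != null) {
            // Pass the prompt as an argument so a '%' in it is not treated as a format specifier.
            return console.readLine("%s", prompt);
        }
        System.out.print(prompt);
        System.out.flush();
        try {
            return fallback.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader stdin = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String s = readLine(System.console(), stdin, "Enter string: ");
        System.out.println(s);
    }
}
```

If the fallback still reads garbage, the environment is not sending UTF-8, and the charset passed to InputStreamReader has to match whatever it does send.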