I am trying to read a UTF-8 encoded text file that contains some Turkish characters. I have written an Axis-based web service that reads this file and sends the output back as a string. Somehow I am not able to read the characters properly. The code is very simple:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class TurkishWebService {

    public String generateTurkishString() throws IOException {
        InputStream isr = this.getClass().getResourceAsStream(
                "/" + "turkish.txt");
        BufferedReader in = new BufferedReader(new InputStreamReader(isr,
                "UTF8"));
        String str;
        while ((str = in.readLine()) != null) {
            System.out.println(str);
        }
        in.close();
        return str;
    }

    public String normalString() {
        System.out.println("webService normal text");
        return "webService normal text";
    }

    public static void main(String args[]) throws IOException {
        new TurkishWebService().generateTurkishString();
    }
}
Here are the contents of turkish.txt, which is just one line:
Assalğçğıİİööşş
This is what I get on stdout:
Assal?τ????÷÷??
Please suggest what I am doing wrong here.
You appear to be correctly decoding the file data from UTF-8 to UTF-16 strings.
System.out transcodes UTF-16 strings to the default JRE character encoding. If this does not match the encoding used by the device receiving the character data, the output is corrupted. So the console must be set to the default character encoding, or data corruption occurs. How this is done is device-dependent. If you are using a terminal, the Console class (System.console()) does a better job of determining the device encoding.
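To make the transcoding step concrete, here is a minimal sketch. The class name ConsoleEncodingDemo and the helper encodeLikePrintStream are made up for illustration; the charset names are only examples, and your console's actual encoding may differ. The helper captures the bytes a PrintStream would emit for a given encoding, and main prints the JRE default and then writes the sample text through a PrintStream that is explicitly set to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingDemo {

    // Hypothetical helper: returns the bytes a PrintStream using the named
    // charset would send to the receiving device for the given text.
    static byte[] encodeLikePrintStream(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The sample line from turkish.txt, written with Unicode escapes so the
        // source file itself does not depend on any particular encoding.
        String text = "Assal\u011f\u00e7\u011f\u0131\u0130\u0130\u00f6\u00f6\u015f\u015f";

        // The encoding System.out falls back to when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Wrap System.out so the characters are encoded as UTF-8 regardless of
        // the default. The console must still be configured to decode UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

With an encoding that cannot represent a character, such as ISO-8859-1 for ğ, the helper returns a single '?' byte, which is one way question marks can end up in the output.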
Note: it is better to use try-with-resources, or at least try-finally, to close streams, and to use the standard encoding constants (such as StandardCharsets.UTF_8) if available.
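As a rough sketch of that note, assuming Java 8 or later: the class name Utf8ResourceReader and the method readAll are hypothetical, and wrapping IOException in UncheckedIOException is just one possible choice. The sketch also collects the lines as it reads them, because in the question's loop str is already null by the time it is returned.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ResourceReader {

    // Reads the whole stream as UTF-8 text, joining lines with '\n'.
    // try-with-resources closes the reader (and the wrapped stream)
    // even if readLine() throws.
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) {
                    text.append('\n');
                }
                text.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for getResourceAsStream("/turkish.txt"): UTF-8 bytes in memory.
        byte[] sample = "Assal\u011f\u00e7".getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(sample));
        System.out.println(decoded.length()); // prints 7
    }
}
```

In the web service, the same method could be called with this.getClass().getResourceAsStream("/turkish.txt") in place of the in-memory stream.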
Make sure the console you use to display the output is also set to UTF-8. In Eclipse, for example, you need to go to Run Configuration > Common to do this.
Code looks good. The problem is most likely in the console output, which cannot print Turkish characters. To be sure, add a temporary test to your program: take the string that you read from the file (the one displayed as Assal?τ????÷÷??) and do this