I want get the encoding from a stream.
1st method - to use the InputStreamReader.
But it always return OS encode.
InputStreamReader reader = new InputStreamReader(new FileInputStream("aa.rar"));
System.out.println(reader.getEncoding());
output:GBK
2nd method - to use the UniversalDetector.
But it always return null.
FileInputStream input = new FileInputStream("aa.rar");
UniversalDetector detector = new UniversalDetector(null);
byte[] buf = new byte[4096];
int nread;
while ((nread = input.read(buf)) > 0 && !detector.isDone()) {
detector.handleData(buf, 0, nread);
}
// (3)
detector.dataEnd();
// (4)
String encoding = detector.getDetectedCharset();
if (encoding != null) {
System.out.println("Detected encoding = " + encoding);
} else {
System.out.println("No encoding detected.");
}
// (5)
detector.reset();
output:null
How can I get the right? :(
Let's resume the situation:
So one needs to know the encoding before reading. You did everything right using first a charset detecting class.
Reading http://code.google.com/p/juniversalchardet/ it should handle UTF-8 and UTF-16. You might use the editor JEdit to verify the encoding, and see whether there is some problem.