Java FileReader encoding issue

Posted 2019-01-03 04:53

I tried to use java.io.FileReader to read some text files and convert them into a string, but the result is decoded with the wrong encoding and is not readable at all.

Here's my environment:

  • Windows 2003, OS encoding: CP1252

  • Java 5.0

My files are UTF-8 encoded or CP1252 encoded, and some of them (UTF-8 encoded files) may contain Chinese (non-Latin) characters.

I use the following code to do my work:

    // Needs: import java.io.BufferedReader; import java.io.FileReader;
    private static String readFileAsString(String filePath)
            throws java.io.IOException {
        StringBuffer fileData = new StringBuffer(1000);
        // FileReader(String) decodes with the platform default encoding.
        FileReader fileReader = new FileReader(filePath);
        //System.out.println(fileReader.getEncoding());
        BufferedReader reader = new BufferedReader(fileReader);
        char[] buf = new char[1024];
        int numRead = 0;
        while ((numRead = reader.read(buf)) != -1) {
            // Append only the characters actually read in this pass.
            fileData.append(buf, 0, numRead);
        }
        reader.close();
        return fileData.toString();
    }

The code above does not produce readable text. I found that the FileReader's encoding is CP1252 even when the file is UTF-8 encoded. But the JavaDoc of java.io.FileReader says:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.

Does this mean that I am not required to set the character encoding myself when using FileReader? But I currently get wrongly decoded data. What is the correct way to deal with my situation? Thanks.

3 Answers
神经病院院长
#2 · 2019-01-03 05:34

Since Java 11, FileReader has a constructor that takes the charset explicitly:

public FileReader(String fileName, Charset charset) throws IOException;
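
As a minimal sketch of how that constructor might be used (this requires Java 11 or later, so it does not apply to the Java 5.0 environment in the question; the class name `Java11FileReaderExample` is made up for illustration):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
public class Java11FileReaderExample {

    // Reads the whole file as text, decoding with the charset the caller names.
    public static String readFileAsString(String filePath, Charset charset) throws IOException {
        StringBuilder fileData = new StringBuilder();
        // FileReader(String, Charset) is available since Java 11.
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath, charset))) {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        }
        return fileData.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: Java11FileReaderExample <file>");
            return;
        }
        // Assumes the file is UTF-8; pass a different Charset for other files.
        System.out.println(readFileAsString(args[0], StandardCharsets.UTF_8));
    }
}
```

The caller still has to know which charset each file uses; the constructor only makes it possible to say so.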
Animai°情兽
#3 · 2019-01-03 05:41

FileReader uses Java's platform default encoding, which depends on the system settings of the computer it's running on and is generally the most popular encoding among users in that locale.

If this "best guess" is not correct, then you have to specify the encoding explicitly. Unfortunately, before Java 11 FileReader did not allow this (a major oversight in the API). On those versions you have to use new InputStreamReader(new FileInputStream(filePath), encoding) instead, and ideally get the encoding from metadata about the file.
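
A rough sketch of that approach follows. It sticks to try/finally and a charset name string so that it should also compile on older Java versions such as the Java 5.0 in the question. The class name `ExplicitEncodingReader` is hypothetical, and the `encoding` argument is assumed to be known by the caller (for example "UTF-8").

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical class name, used only for this illustration.
public class ExplicitEncodingReader {

    // Reads the whole file, decoding its bytes with the named encoding
    // instead of the platform default.
    public static String readFileAsString(String filePath, String encoding) throws IOException {
        StringBuilder fileData = new StringBuilder(1000);
        FileInputStream in = new FileInputStream(filePath);
        try {
            // InputStreamReader(InputStream, String) throws
            // UnsupportedEncodingException if the name is not recognized.
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding));
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        } finally {
            // Closing the underlying stream releases the file handle
            // even if decoding fails part way through.
            in.close();
        }
        return fileData.toString();
    }
}
```

A call might look like `ExplicitEncodingReader.readFileAsString(path, "UTF-8")`, with the encoding name taken from whatever metadata is available for that file.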

家丑人穷心不美
#4 · 2019-01-03 05:45

Yes, you need to specify the encoding of the file you want to read.

Yes, this means that you have to know the encoding of the file you want to read.

No, there is no general way to guess the encoding of any given "plain text" file.

The constructors of FileReader always use the platform default encoding, which is generally a bad idea.

Instead of FileReader you need to use new InputStreamReader(new FileInputStream(pathToFile), <encoding>).
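
When the candidates are known in advance, as in the question (UTF-8 or CP1252), one possible heuristic is to decode strictly as UTF-8 and fall back only if the bytes are not valid UTF-8. The sketch below is not general encoding detection: a CP1252 file whose bytes happen to form valid UTF-8 would be misread. The class name `Utf8OrFallback` and the `fallback` parameter are assumptions for illustration, and the code needs Java 7 or later for `StandardCharsets`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
public class Utf8OrFallback {

    // Tries strict UTF-8 first; if the bytes are not valid UTF-8,
    // decodes them with the caller-supplied fallback charset.
    public static String decode(byte[] bytes, Charset fallback) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the fallback encoding.
            return new String(bytes, fallback);
        }
    }
}
```

For the files in the question, the fallback could be obtained with `Charset.forName("windows-1252")`, assuming that charset is available in the runtime.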
