Reading UTF-8 characters using Scanner

Posted 2019-07-21 15:00

public boolean isValid(String username, String password)  {
        boolean valid = false;
        DataInputStream file = null;

        try{
            Scanner files = new Scanner(new BufferedReader(new FileReader("files/students.txt")));

            while(files.hasNext()){
                System.out.println(files.next());
            }

        }catch(Exception e){
            e.printStackTrace();
        }
        return valid;
    }

Why is it that when I read a file that was written as UTF-8 (by another Java program), each entry is displayed with weird symbols in front of the string?

I wrote the file using this:

    private static void  addAccount(String username,String password){
        File file = new File(file_name);
        try{
            DataOutputStream dos = new DataOutputStream(new FileOutputStream(file,true));
            dos.writeUTF((username+"::"+password+"\n"));
        }catch(Exception e){

        }
    } 

Tags: java utf-8 io
3 answers
Luminary・发光体
Reply #2 · 2019-07-21 15:11

When using DataOutput.writeUTF/DataInput.readUTF, the first 2 bytes of each written string form an unsigned 16-bit big-endian integer giving the length, in bytes, of the encoded string that follows. From the DataInput.readUTF Javadoc:

First, two bytes are read and used to construct an unsigned 16-bit integer in exactly the manner of the readUnsignedShort method. This integer value is called the UTF length and specifies the number of additional bytes to be read. These bytes are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group.

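These length bytes are the likely cause of your issue: Scanner treats them as text, so they show up as weird symbols in front of each string. To read the file with a Scanner, you would need to skip the 2-byte prefix before every record (each writeUTF call writes its own) and tell the Scanner to decode as UTF-8.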
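
To make the prefix visible, here is a minimal sketch that writes one record with writeUTF into an in-memory buffer and then reads it back with the matching readUTF. The class name, helper names, and sample record are hypothetical, chosen only for illustration. Note that writeUTF uses Java's "modified UTF-8", so readUTF is the reliable counterpart rather than a generic UTF-8 decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; names and sample data are assumptions, not from the question.
class WriteUtfPrefixDemo {

    // Encodes one string exactly as DataOutputStream.writeUTF does.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buffer)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decodes one record with readUTF, which consumes the 2-byte length prefix itself.
    static String decode(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return dis.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode("bob::secret\n");
        // The first two bytes are the big-endian payload length, not characters.
        int prefix = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        System.out.println("length prefix = " + prefix + ", total bytes = " + bytes.length);
        System.out.print(decode(bytes));
    }
}
```

If the file must stay in this format, reading it in a loop with DataInputStream.readUTF until EOFException avoids any manual byte skipping.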

That being said, I do not see any reason to use DataOutput/DataInput here. You can simply use FileWriter and FileReader instead; these write and read plain text using the default system encoding.
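
A rough sketch of the plain-text approach follows. It deliberately swaps FileWriter for an OutputStreamWriter with an explicit UTF-8 charset, so the result does not depend on the system default; that substitution is this sketch's choice, not part of the answer above. The class name, method signatures, and sample accounts are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; names are assumptions made for illustration.
class PlainTextAccounts {

    // Appends one "username::password" line as plain UTF-8 text (no length prefix).
    static void addAccount(File file, String username, String password) {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8)) {
            writer.write(username + "::" + password + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back line by line, decoding it as UTF-8.
    static List<String> readAccounts(File file) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("students", ".txt");
        addAccount(file, "alice", "pw1");
        addAccount(file, "bob", "pw2");
        readAccounts(file).forEach(System.out::println);
        file.delete();
    }
}
```

The try-with-resources blocks also close the streams, which the original addAccount never did.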

Emotional °昔
Reply #3 · 2019-07-21 15:12

From the FileReader Javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So perhaps something like new InputStreamReader(new FileInputStream(file), "UTF-8")
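
As a hedged illustration, the sketch below wraps such an InputStreamReader in a Scanner, mirroring the token loop from the question. The class and method names are hypothetical, and the method takes any InputStream so it can be tried without a file; for the real file you would pass new FileInputStream("files/students.txt"). It assumes the file is plain UTF-8 text; it will not strip the length prefixes produced by writeUTF.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; names and sample data are assumptions made for illustration.
class Utf8ReaderDemo {

    // Reads whitespace-separated tokens, decoding the bytes explicitly as UTF-8.
    static List<String> readTokens(InputStream in) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8)))) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 text file; \u00e9 is a non-ASCII character (e with acute accent).
        byte[] data = "caf\u00e9::pw\nbob::secret\n".getBytes(StandardCharsets.UTF_8);
        readTokens(new ByteArrayInputStream(data)).forEach(System.out::println);
    }
}
```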

一夜七次
Reply #4 · 2019-07-21 15:31

Here is a simple way to do that:

File words = new File(path);
Scanner s = new Scanner(words,"utf-8");