Does RandomAccessFile in java read entire file in

Posted 2019-01-25 07:55

Question:

I need to read the last n lines from a large file (say 2GB). The file is UTF-8 encoded.

I would like to know the most efficient way of doing this. I read about RandomAccessFile in Java, but does its seek() method read the entire file into memory? It uses a native implementation, so I wasn't able to check the source code.

Answer 1:

  1. RandomAccessFile.seek only sets the current position of the file pointer; no bytes are read into memory.

  2. Since your file is UTF-8 encoded, it is a text file. For reading text files we typically use BufferedReader; Java 7 even added a convenience method, Files.newBufferedReader, to create a BufferedReader that reads text from a file. This is inefficient for reading the last n lines, because the whole file has to be read, but it is easy to implement.

  3. To be efficient, we need RandomAccessFile and must read the file backwards, starting from the end. Here is a basic example:

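import java.io.ByteArrayOutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TailExample {
    public static void main(String[] args) throws Exception {
        int n = 3;
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile("test", "r")) {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            long length = f.length();
            // Walk backwards from the last byte, one byte at a time.
            // p >= 0 so that the very first byte of the file is not skipped.
            for (long p = length - 1; p >= 0 && lines.size() < n; p--) {
                f.seek(p);
                int b = f.read();
                if (b == '\n') {
                    // Ignore the LF that terminates the last line of the file.
                    if (p < length - 1) {
                        lines.add(0, getLine(bout));
                        bout.reset();
                    }
                } else if (b != '\r') {
                    bout.write(b);
                }
            }
            // The start of the file was reached before n lines were found:
            // the bytes collected so far form the first line of the file.
            if (lines.size() < n && length > 0) {
                lines.add(0, getLine(bout));
            }
        }
        System.out.println(lines);
    }

    static String getLine(ByteArrayOutputStream bout) {
        byte[] a = bout.toByteArray();
        // The bytes were collected back to front, so reverse them.
        for (int i = 0, j = a.length - 1; j > i; i++, j--) {
            byte tmp = a[j];
            a[j] = a[i];
            a[i] = tmp;
        }
        // Decode explicitly as UTF-8 instead of the platform default charset.
        return new String(a, StandardCharsets.UTF_8);
    }
}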

It reads the file byte by byte, starting from the tail, into a ByteArrayOutputStream. When an LF is reached, it reverses the collected bytes and decodes them into a line.

Two things need to be improved:

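  1. buffering: the example issues one seek and one read per byte

  2. EOL recognition: the example treats only LF as a line terminator and drops every CR it meets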
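
One possible way to address both points is sketched below; it is not part of the original answer. The class name BufferedTail, the helper findTailStart and the chunkSize parameter are hypothetical names chosen for illustration. The sketch scans backwards in chunks to find the byte offset where the last n lines begin, then decodes that tail once and splits it on LF or CRLF. It assumes the last n lines fit in memory and does not treat a lone CR as a line terminator. Searching for the byte 0x0A is safe in UTF-8 because that value never occurs inside a multi-byte character.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferedTail {

    /** Returns the byte offset at which the last n lines start (0 if there are fewer). */
    static long findTailStart(RandomAccessFile f, int n, int chunkSize) throws IOException {
        long length = f.length();
        byte[] buf = new byte[chunkSize];
        int newlines = 0;
        long pos = length; // everything from pos to the end has been scanned
        while (pos > 0) {
            int len = (int) Math.min(chunkSize, pos);
            pos -= len;
            f.seek(pos);
            f.readFully(buf, 0, len); // one read per chunk instead of one per byte
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] != '\n') continue;
                long at = pos + i;
                if (at == length - 1) continue; // LF terminating the last line
                if (++newlines == n) return at + 1;
            }
        }
        return 0;
    }

    /** Returns up to n last lines of a UTF-8 file. */
    static List<String> tail(Path file, int n, int chunkSize) throws IOException {
        if (n <= 0) return new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile(file.toFile(), "r")) {
            long start = findTailStart(f, n, chunkSize);
            // Assumption: the tail is small enough for a single byte array.
            byte[] bytes = new byte[(int) (f.length() - start)];
            f.seek(start);
            f.readFully(bytes);
            String text = new String(bytes, StandardCharsets.UTF_8);
            if (text.isEmpty()) return new ArrayList<>();
            // Accept LF and CRLF; the limit -1 keeps empty lines.
            List<String> lines = new ArrayList<>(Arrays.asList(text.split("\r?\n", -1)));
            // A final line terminator produces one empty trailing element; drop it.
            if (text.endsWith("\n")) lines.remove(lines.size() - 1);
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("buffered-tail", ".txt");
        Files.write(demo, "one\r\ntwo\nthree\nfour\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(tail(demo, 3, 8192));
        Files.delete(demo);
    }
}
```

A chunk size of a few kilobytes is a common choice; the best value depends on the file system and is worth measuring rather than assuming.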



Answer 2:

If you need random access, you need RandomAccessFile. You can decode the bytes you get from it as UTF-8 if you know what you are doing.
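
The "know what you are doing" part mostly concerns character boundaries: an arbitrary byte offset may land in the middle of a multi-byte UTF-8 character. The following is a minimal sketch of one way to handle that, not something taken from the answer; SeekAndDecode and readFrom are hypothetical names. It seeks to an offset, reads only the bytes from there to the end, and skips any leading continuation bytes (bit pattern 10xxxxxx) before decoding. It assumes the requested tail fits in a single byte array.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekAndDecode {

    /** Decodes the file as UTF-8 from roughly the given byte offset to the end. */
    static String readFrom(Path file, long offset) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(file.toFile(), "r")) {
            long pos = Math.max(0, Math.min(offset, f.length()));
            f.seek(pos); // moves the file pointer only
            // Assumption: the tail is small enough for a single byte array.
            byte[] bytes = new byte[(int) (f.length() - pos)];
            f.readFully(bytes); // reads just the bytes from pos to the end
            // If the offset fell inside a multi-byte character, move forward
            // to the next character boundary.
            int start = 0;
            while (start < bytes.length && (bytes[start] & 0xC0) == 0x80) {
                start++;
            }
            return new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("seek-decode", ".txt");
        Files.write(demo, "h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        // Offset 2 is the second byte of the two-byte character U+00E9.
        System.out.println(readFrom(demo, 2));
        Files.delete(demo);
    }
}
```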

If you use BufferedReader, you can call skip(n), but n counts characters rather than bytes, which means the reader still has to read and decode the whole file up to that position.
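
When reading the whole file is acceptable, the BufferedReader route stays very short. Here is a minimal sketch, assuming a hypothetical class SimpleTail; it keeps only the most recent n lines in a bounded deque, so memory use stays small even though every byte of the file is read and decoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SimpleTail {

    /** Returns up to n last lines by reading the file from start to end. */
    static List<String> tail(Path file, int n) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        if (n <= 0) return new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                window.addLast(line);
                // Keep at most n lines: drop the oldest one when the window overflows.
                if (window.size() > n) window.removeFirst();
            }
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("simple-tail", ".txt");
        Files.write(demo, "a\nb\nc\nd\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(tail(demo, 2));
        Files.delete(demo);
    }
}
```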


One way to combine the two is to use FileInputStream with skip(): find the position to read from by scanning back N newlines from the end of the file, skip to it, and then wrap the stream in a BufferedReader to read the lines as UTF-8.
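
A rough sketch of that combination follows. The answer does not say how to locate the offset, so this sketch uses RandomAccessFile for the backward scan as an assumption; SkipThenRead and tailOffset are hypothetical names. The scan reads one byte at a time for brevity and could be chunked in real use. Because skip() may skip fewer bytes than requested, it is called in a loop.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SkipThenRead {

    /** Scans backwards for the byte offset at which the last n lines begin. */
    static long tailOffset(Path file, int n) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(file.toFile(), "r")) {
            int found = 0;
            // Start at length - 2 so an LF terminating the last line is not counted.
            for (long p = f.length() - 2; p >= 0; p--) {
                f.seek(p);
                if (f.read() == '\n' && ++found == n) return p + 1;
            }
            return 0; // fewer than n lines: start at the beginning
        }
    }

    /** Returns up to n last lines of a UTF-8 file. */
    static List<String> tail(Path file, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        if (n <= 0) return lines;
        long remaining = tailOffset(file, n);
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) throw new IOException("could not skip to offset");
                remaining -= skipped;
            }
            // From here on, decoding and line splitting are left to BufferedReader.
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("skip-then-read", ".txt");
        Files.write(demo, "one\ntwo\nthree\nfour".getBytes(StandardCharsets.UTF_8));
        System.out.println(tail(demo, 2));
        Files.delete(demo);
    }
}
```

Since the offset always falls right after an LF, the stream starts on a character boundary and the decoder never sees a partial UTF-8 sequence.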