I need to read last n lines from a large file (say 2GB). The file is UTF-8 encoded.
Would like to know the most efficient way of doing it. Read about RandomAccessFile in java, but does the seek() method , read the entire file in memory. It uses native implementation so i wasn't able to refer the source code.
RandomAccessFile.seek just sets the file-pointer current position, no bytes are read into memory.
Since your file is UTF-8 encoded, it is a text file. For reading text files we typically use BufferedReader, Java 7 even added a convinience method File.newBufferedReader to create an instance of a BufferedReader to read text from a file. Though it may be inefficient for reading last n lines, but easy to implement.
To be efficient we need RandomAccessFile and read file backwards starting from the end. Here is a basic example
public static void main(String[] args) throws Exception {
int n = 3;
List<String> lines = new ArrayList<>();
try (RandomAccessFile f = new RandomAccessFile("test", "r")) {
ByteArrayOutputStream bout = new ByteArrayOutputStream();
for (long length = f.length(), p = length - 1; p > 0 && lines.size() < n; p--) {
f.seek(p);
int b = f.read();
if (b == 10) {
if (p < length - 1) {
lines.add(0, getLine(bout));
bout.reset();
}
} else if (b != 13) {
bout.write(b);
}
}
}
System.out.println(lines);
}
static String getLine(ByteArrayOutputStream bout) {
byte[] a = bout.toByteArray();
// reverse bytes
for (int i = 0, j = a.length - 1; j > i; i++, j--) {
byte tmp = a[j];
a[j] = a[i];
a[i] = tmp;
}
return new String(a);
}
It reads the file byte after byte starting from tail to ByteArrayOutputStream, when LF is reached it reverses the bytes and creates a line.
Two things need to be improved:
buffering
EOL recognition
If you need Random Access, you need RandomAccessFile. You can convert the bytes you get from this into UTF-8 if you know what you are doing.
If you use BuffredReader, you can use skip(n) by number of characters which means it has to read the whole file.
A way to do this in combination; is to use FileInputStream with skip(), find where you want to read from by reading back N newlines and then wrap the stream in BufferedReader to read the lines with UTF-8 encoding.