Java : Read last n lines of a HUGE file

2019-01-05 02:23发布

I want to read the last n lines of a very big file without reading the whole file into any buffer/memory area using Java.

I looked around the JDK APIs and Apache Commons I/O and am not able to locate one which is suitable for this purpose.

I was thinking of the way tail or less does it in UNIX. I don't think they load the entire file and then show the last few lines of the file. There should be similar way to do the same in Java too.

10条回答
欢心
2楼-- · 2019-01-05 02:41

A RandomAccessFile allows for seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method will return the size of the file. The problem is determining number of lines. For this, you can seek to the end of the file and read backwards until you have hit the right number of lines.

查看更多
做个烂人
3楼-- · 2019-01-05 02:43

If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.

An initial best-guess assumption can be made based on your data properties. For example, if it's a text file, it's possible the line lengths won't exceed an average of 132 so, to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust that - example: if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case).

查看更多
何必那么认真
4楼-- · 2019-01-05 02:44

I had similar problem, but I don't understood to another solutions.

I used this. I hope thats simple code.

// String filePathName = (direction and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Take size of file [bites].
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table, I know one row has about e.g. 100 bites / characters. 
    // I used 1000 bites before file end to point where start read.
    // If you don't know line length, use @paxdiablo advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // This row manage open and close file.
    raf.seek(fileLength_toRead); // File will begin read at this bite. 
    String rowInFile = raf.readLine(); // First readed line usualy is not whole, I needn't it.
    rowInFile = raf.readLine();
    while (rowInFile != null) {
        // Here I can readed lines (rowInFile) add to String[] array or ArriyList<String>.
        // Later I can work with rows from array - last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    //
}
查看更多
Explosion°爆炸
5楼-- · 2019-01-05 02:44

Here is the best way I've found to do it. Simple and pretty fast and memory efficient.

public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {
    BufferedReader reader = new BufferedReader(new FileReader(src));
    String[] lines = new String[maxLines];
    int lastNdx = 0;
    for (String line=reader.readLine(); line != null; line=reader.readLine()) {
        if (lastNdx == lines.length) {
            lastNdx = 0;
        }
        lines[lastNdx++] = line;
    }

    OutputStreamWriter writer = new OutputStreamWriter(out);
    for (int ndx=lastNdx; ndx != lastNdx-1; ndx++) {
        if (ndx == lines.length) {
            ndx = 0;
        }
        writer.write(lines[ndx]);
        writer.write("\n");
    }

    writer.flush();
}
查看更多
成全新的幸福
6楼-- · 2019-01-05 02:47
    int n_lines = 1000;
    ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path));
    String result="";
    for(int i=0;i<n_lines;i++){
        String line=object.readLine();
        if(line==null)
            break;
        result+=line;
    }
    return result;
查看更多
干净又极端
7楼-- · 2019-01-05 02:56

RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.

If your file is not encoded with an one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string preceded by a character count ...)

Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed length encodings (e.g. flavors of UTF-16 or UTF-32) you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.

In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a second / third byte, or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. This means, as the comment discussion points out, that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream will represent a LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8) if we can assume that the other kinds of Unicode line separator (0x2028, 0x2029 and 0x0085) are not used. You can't assume that, then the code would be more complicated.

Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count end-of-lines.

查看更多
登录 后发表回答