What's the quickest and most efficient way of reading the last line of text from a [very, very large] file in Java?
Below are two functions, one that returns the last non-blank line of a file without loading or stepping through the entire file, and the other that returns the last N lines of the file without stepping through the entire file:
What tail does is jump straight to the last character of the file, then step backward, character by character, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop, reverses what was recorded, puts it into a string, and returns it. 0xA is the line feed (new line) and 0xD is the carriage return.
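As a rough illustration of the backward scan described above, here is a minimal sketch. The names TailSketch, lastLine, and byteAt are made up for this example. It assumes a single-byte encoding such as ASCII or ISO-8859-1, because each byte is cast directly to a char, and it skips every line-break byte at the very end of the file before it starts recording.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper class; the names are illustrative only.
final class TailSketch {

    private static boolean isLineBreak(int b) {
        return b == 0xA || b == 0xD; // line feed or carriage return
    }

    // Seeks to an absolute offset and reads the single byte stored there.
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    /** Returns the last non-empty line, assuming one byte per character. */
    static String lastLine(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long pos = raf.length() - 1;

            // Skip the line-break bytes that terminate the file.
            while (pos >= 0 && isLineBreak(byteAt(raf, pos))) {
                pos--;
            }

            // Record bytes in reverse order until the previous line break.
            StringBuilder recorded = new StringBuilder();
            while (pos >= 0) {
                int b = byteAt(raf, pos);
                if (isLineBreak(b)) {
                    break;
                }
                recorded.append((char) b);
                pos--;
            }

            // The bytes were collected back to front, so flip them.
            return recorded.reverse().toString();
        }
    }
}
```

Something like TailSketch.lastLine(new File("server.log")) would then return the final non-empty line; the file name is only an example.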
If your line endings are \r\n (CRLF) or some other two-character newline, then you will have to specify n*2 lines to get the last n lines, because it counts 2 lines for every line.
But you probably don't want the last line, you want the last N lines, so use this instead:
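A minimal sketch of a last-N-lines function along these lines, with hypothetical names (TailLinesSketch, lastLines). Note one deliberate difference from the behaviour described above: this sketch counts only 0xA bytes as line breaks, so for CRLF files N does not need to be doubled. It assumes the file uses \n or \r\n line endings and a single-byte encoding, and it returns the text verbatim, including the line breaks.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper class; the names are illustrative only.
final class TailLinesSketch {

    /** Returns the last n lines as one string, assuming one byte per character. */
    static String lastLines(File file, int n) throws IOException {
        if (n <= 0) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long last = raf.length() - 1;
            StringBuilder recorded = new StringBuilder();
            int seen = 0;

            for (long pos = last; pos >= 0; pos--) {
                raf.seek(pos);
                int b = raf.read();

                // A line feed that is not the final byte ends an earlier line.
                if (b == 0xA && pos < last) {
                    seen++;
                    if (seen == n) {
                        break; // the n requested lines are all recorded
                    }
                }
                recorded.append((char) b);
            }

            // Bytes were collected back to front, so flip them.
            return recorded.reverse().toString();
        }
    }
}
```

If the file holds fewer than n lines, the loop simply runs to the start of the file and the whole content comes back.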
Invoke the above methods like this:
Warning: In the wild west of Unicode, this code can produce wrong output, for example "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters, etc. may come out wrong because accents are added as modifiers after the character. Reversing compound characters changes the identity of the character. You will have to run a full battery of tests on all languages you plan to use this with.
For more information about this unicode reversal problem read this: http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx
You can easily change the below code to print the last line.
MemoryMappedFile for printing last 5 lines:
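A minimal sketch of the memory-mapped variant, using FileChannel.map. The names MappedTailSketch and lastLines are hypothetical. It assumes UTF-8 text with \n or \r\n line endings. Because a single mapping cannot exceed Integer.MAX_VALUE bytes, it maps only the tail of the file, so for files above that size the requested lines must fit inside that window.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; the names are illustrative only.
final class MappedTailSketch {

    /** Returns the last n lines of a UTF-8 file as one string. */
    static String lastLines(Path path, int n) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size == 0 || n <= 0) {
                return "";
            }

            // Map at most Integer.MAX_VALUE bytes, taken from the end of the file.
            long window = Math.min(size, Integer.MAX_VALUE);
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, size - window, window);

            int end = (int) window;
            int start = 0;
            int seen = 0;

            // Walk backward; a '\n' that is not the final byte closes an earlier line.
            for (int i = end - 1; i >= 0; i--) {
                if (buf.get(i) == '\n' && i < end - 1 && ++seen == n) {
                    start = i + 1;
                    break;
                }
            }

            // Copy the tail out of the mapping and decode it in one go.
            byte[] tail = new byte[end - start];
            buf.position(start);
            buf.get(tail);
            return new String(tail, StandardCharsets.UTF_8);
        }
    }
}
```

Printing the last 5 lines would then be System.out.print(MappedTailSketch.lastLines(path, 5)). Cutting at '\n' bytes is safe in UTF-8, so the decoded text is not affected by the reversal problem mentioned earlier.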
RandomAccessFile to print last 5 lines:
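A minimal sketch of the RandomAccessFile variant, again with hypothetical names (RafTailSketch, lastLines, byteAt, decode). It assumes UTF-8 text with \n or \r\n line endings. It steps backward one byte at a time to find line boundaries, then reads each complete line forward and decodes it as a whole, so multi-byte characters stay intact.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class RafTailSketch {

    /** Returns up to n trailing lines, oldest first, without line terminators. */
    static List<String> lastLines(File file, int n) throws IOException {
        Deque<String> lines = new ArrayDeque<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length(); // exclusive end of the line being collected
            if (end == 0) {
                return new ArrayList<>();
            }
            if (byteAt(raf, end - 1) == '\n') {
                end--; // ignore the line break that terminates the file
            }
            // pos == -1 stands for "start of file", which also begins a line.
            for (long pos = end - 1; lines.size() < n && pos >= -1; pos--) {
                if (pos == -1 || byteAt(raf, pos) == '\n') {
                    lines.addFirst(decode(raf, pos + 1, end));
                    end = pos;
                }
            }
        }
        return new ArrayList<>(lines);
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    // Reads bytes [start, end) forward, decodes them, and drops a trailing '\r'.
    private static String decode(RandomAccessFile raf, long start, long end)
            throws IOException {
        byte[] bytes = new byte[(int) (end - start)];
        raf.seek(start);
        raf.readFully(bytes);
        String line = new String(bytes, StandardCharsets.UTF_8);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }
}
```

To print the last 5 lines, one could call RafTailSketch.lastLines(file, 5).forEach(System.out::println). Seeking for every byte is simple but slow on long lines; reading in chunks is the usual refinement.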
As far as I know, the fastest way to read the last line of a text file is to use the Apache FileUtils class, which is in "org.apache.commons.io". I have a two-million-line file, and with this class it took less than one second to find the last line. Here is my code:
Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.
Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.
Basically it's not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n, as the UTF-8 representation of those characters is just the same as ASCII, and those bytes won't occur in a multi-byte character.
So basically, take a buffer of (say) 2K, and progressively read backwards (skip to 2K before you were before, read the next 2K) checking for a line termination. Then skip to exactly the right place in the stream, create an InputStreamReader on top, and a BufferedReader on top of that. Then just call BufferedReader.readLine().
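The buffered approach above might look roughly like this in Java. This is a sketch under stated assumptions, not a definitive implementation: the names BufferedTailSketch, lastLine, and CHUNK are hypothetical, the file is assumed to be UTF-8, and line-break bytes at the very end of the file are skipped so that the last non-empty line is returned.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; the names are illustrative only.
final class BufferedTailSketch {
    private static final int CHUNK = 2048; // the "(say) 2K" buffer

    static String lastLine(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(CHUNK);
            long lineStart = 0;     // stays 0 if no earlier line break is found
            boolean inLine = false; // true once a non-line-break byte was seen
            long chunkEnd = ch.size();

            search:
            while (chunkEnd > 0) {
                long chunkStart = Math.max(0, chunkEnd - CHUNK);
                buf.clear();
                buf.limit((int) (chunkEnd - chunkStart));
                // Positional reads fill the buffer without moving the channel.
                while (buf.hasRemaining()
                        && ch.read(buf, chunkStart + buf.position()) >= 0) {
                    // keep reading until this chunk is complete
                }
                for (int i = buf.position() - 1; i >= 0; i--) {
                    byte b = buf.get(i);
                    if (b == '\n' || b == '\r') {
                        if (inLine) {
                            lineStart = chunkStart + i + 1;
                            break search;
                        }
                    } else {
                        inLine = true;
                    }
                }
                chunkEnd = chunkStart; // step back to the previous chunk
            }

            // Skip to exactly the right place, then let the readers decode.
            ch.position(lineStart);
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(ch), StandardCharsets.UTF_8));
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }
}
```

Only the bytes \r and \n are inspected during the backward scan, which is why the scan never has to know where multi-byte characters begin; decoding happens once, forward, from the start of the line.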
In C#, you should be able to set the stream's position:
From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file