For log processing, my application needs to read text files line by line.
First I used BufferedReader's readLine() method, but I read on the internet that BufferedReader is slow when reading files.
Afterwards I tried FileInputStream together with a FileChannel and MappedByteBuffer, but in that case there is no method similar to readLine(), so I search the text for a line break myself and process it:
try {
    FileInputStream f = new FileInputStream(file);
    FileChannel ch = f.getChannel();
    MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
    byte[] bytes = new byte[1024];
    int i = 0;
    while (mb.hasRemaining()) {
        byte get = mb.get();
        if (get == '\n') {
            if (ra.run(new String(bytes)))
                cnt++;
            for (int j = 0; j <= i; j++)
                bytes[j] = 0;
            i = 0;
        } else {
            bytes[i++] = get;
        }
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
I know this is probably not a good way to implement it, but when I just read the text file as bytes it is three times faster than using BufferedReader; however, calling new String(bytes) creates a new String each time and makes the program even slower than using a BufferedReader.
So I wanted to ask: what is the fastest way to read a text file line by line? Some say BufferedReader is the only solution to this problem.
P.S.: ra is an instance of RunAutomaton from the dk.brics.automaton library.
I very much doubt that BufferedReader is going to cause a significant overhead. Adding your own code is likely to be at least as inefficient, and quite possibly wrong too.
For example, in the code that you've given you're calling new String(bytes), which is always going to create a string from 1024 bytes, using the platform default encoding... not a good idea. Sure, you clear the array afterwards, but your strings are still going to contain a bunch of '\0' characters - which means a lot of wasted space, apart from anything else. You should at least restrict the portion of the byte array the string is being created from (which also means you don't need to clear the array afterwards).
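As a minimal sketch of that fix, inside your '\n' branch (the charset is an assumption; use whatever encoding your log files are actually in):

// Only convert the i bytes that belong to the current line, with an
// explicit charset; no trailing '\0' characters, no array clearing needed.
// Requires: import java.nio.charset.StandardCharsets;
String line = new String(bytes, 0, i, StandardCharsets.UTF_8);
if (ra.run(line))
    cnt++;
i = 0;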
Have you actually tried using BufferedReader and found it to be too slow? You should usually write the simplest code which will meet your goals first, and then check whether it's fast enough... especially if your only reason for not doing so is an unspecified resource you "read on the internet". Do you want me to find hundreds of examples of people spouting incorrect performance suggestions? :)
As an alternative, you might want to look at Guava's overload of Files.readLines() which takes a LineProcessor.
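A sketch of how that could look for your matching loop (the file name and charset are assumptions, ra is your RunAutomaton, and readLines() throws IOException):

import com.google.common.io.Files;
import com.google.common.io.LineProcessor;
import java.io.File;
import java.nio.charset.StandardCharsets;

// Files.readLines() opens, buffers and closes the file for you and
// feeds every line to the LineProcessor callback.
int matches = Files.readLines(new File("input.log"), StandardCharsets.UTF_8,
        new LineProcessor<Integer>() {
            private int cnt = 0;

            @Override
            public boolean processLine(String line) {
                if (ra.run(line))
                    cnt++;
                return true; // returning false would stop reading early
            }

            @Override
            public Integer getResult() {
                return cnt;
            }
        });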
Using plain BufferedReader I got 100+ MB/s. It is highly likely that the speed at which you can read the data from disk is your bottleneck, so how you do the reading won't make much difference.
BufferedReader is not the only solution, but it is fast enough for 99% of use cases, so why make things more complicated than they need to be?
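For reference, the plain loop being measured here is just this (a sketch; the file name is a placeholder):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Read the file line by line; the buffering is what makes readLine() cheap.
try (BufferedReader reader = new BufferedReader(new FileReader("input.log"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // process the line, e.g. run the automaton against it
    }
} catch (IOException e) {
    e.printStackTrace();
}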
Are frameworks an alternative?
I don't know about the performance, but Apache Commons IO (http://commons.apache.org/io/) defines very easy-to-use helper classes for such cases. See the IOUtils class in the API docs: http://commons.apache.org/io/api-release/index.html
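For example, a sketch using Commons IO's LineIterator, which streams the file line by line instead of loading it all into memory (the file name and encoding are assumptions):

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

try {
    LineIterator it = FileUtils.lineIterator(new File("input.log"), "UTF-8");
    try {
        while (it.hasNext()) {
            String line = it.nextLine();
            // process the line here
        }
    } finally {
        // release the underlying stream when done
        LineIterator.closeQuietly(it);
    }
} catch (IOException e) {
    e.printStackTrace();
}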
According to this SO post, you might also want to give the Scanner class a shot.
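A minimal sketch of line-by-line reading with Scanner (the file name is a placeholder):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Scanner can iterate a file's lines directly.
try (Scanner sc = new Scanner(new File("input.log"))) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line here
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}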
I have a very simple loop that reads about 2000 lines (50 KB) from a file on the SD card using BufferedReader, and it reads them all in about 100 ms in debug mode on a Galaxy Tab 2. Not too bad. Then I put a Scanner in the loop and the time went through the roof (tens of seconds), plus lots of GC_CONCURRENT messages:
Scanner scanner = new Scanner(line);
int eventType = scanner.nextInt(16);
So at least in my case it's the Scanner that's the problem. I guess I need to scan the ints another way, but I have no idea why it could be so slow.
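One way to scan the ints without allocating a Scanner per line (a sketch, assuming each line holds exactly one hex token):

// Parse the hex value directly; trim() strips surrounding whitespace.
int eventType = Integer.parseInt(line.trim(), 16);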