I have been searching a lot for the fastest way to read and then write back large files (0.5 - 1 GB) in Java with limited memory (about 64 MB). Each line in the file represents a record, so I need to read them line by line. The file is a normal text file.
I tried BufferedReader and BufferedWriter, but they don't seem to be the best option. Reading and writing a 0.5 GB file takes about 35 seconds with no processing at all, and I think the bottleneck is the writing, since reading alone takes about 10 seconds.
I also tried reading arrays of bytes, but then searching for line breaks in each array takes even more time.
Any suggestions please? Thanks
I have written an extensive article about the many ways of reading files in Java, testing them against each other with sample files from 1 KB to 1 GB, and I found the following 3 methods were the fastest for reading 1 GB files:
1) java.nio.file.Files.readAllBytes() - took just under 1 second to read a 1 GB test file.
2) java.nio.file.Files.lines() - took about 3.5 seconds to read in a 1 GB test file.
3) java.io.BufferedReader - took about 4.5 seconds to read a 1 GB test file.
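Of the three, Files.lines() is the only one that streams the file lazily instead of loading it all into memory, which matters given the 64 MB heap mentioned in the question. A minimal sketch of how it might be used; the file name and UTF-8 charset are placeholder assumptions:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LinesExample {
    public static void main(String[] args) throws IOException {
        // Files.lines() streams the file lazily, so memory use stays low even for a 1 GB file.
        // "input.txt" and UTF-8 are placeholder assumptions.
        try (Stream<String> lines = Files.lines(Paths.get("input.txt"), StandardCharsets.UTF_8)) {
            long count = lines.count(); // replace with whatever per-record processing you need
            System.out.println("Lines: " + count);
        }
    }
}
```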
The first thing I would try is to increase the buffer size of the BufferedReader and BufferedWriter. The default buffer sizes are not documented, but at least in the Oracle VM they are 8192 characters, which won't bring much of a performance advantage for files of this size.
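A minimal sketch of what larger buffers might look like; the 256 KB size and the file names are placeholders for illustration, not values from the original answer:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedLineCopy {
    public static void main(String[] args) throws IOException {
        int bufferSize = 256 * 1024; // much larger than the 8192-character default; tune for your system
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"), bufferSize);
             BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"), bufferSize)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
```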
If you only need to make a copy of the file (and don't need actual access to the data), I would either drop the Reader/Writer approach and work directly with InputStream and OutputStream, using a byte array as a buffer:
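A minimal sketch of that stream-to-stream copy; the buffer size and file names are placeholders:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[256 * 1024]; // buffer size is a placeholder; tune for your system
        try (InputStream in = new FileInputStream("input.txt");
             OutputStream out = new FileOutputStream("output.txt")) {
            int bytesRead;
            // Copy raw bytes without interpreting them as characters or lines.
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        }
    }
}
```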
or actually use NIO:
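For example, a channel-to-channel copy with FileChannel.transferTo(), which can let the operating system move the data without pulling it through a Java byte array (file names are again placeholders):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class NioCopy {
    public static void main(String[] args) throws IOException {
        try (FileChannel source = new FileInputStream("input.txt").getChannel();
             FileChannel target = new FileOutputStream("output.txt").getChannel()) {
            long position = 0;
            long size = source.size();
            // transferTo() may transfer fewer bytes than requested, so loop until the whole file is copied.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
        }
    }
}
```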
When benchmarking the different copy methods, however, I see much larger differences in duration between individual runs of the benchmark than between the different implementations. I/O caching (both at the OS level and in the hard disk's cache) plays a major role here, and it is very difficult to say which is faster. On my hardware, copying a 1 GB text file line by line using BufferedReader and BufferedWriter takes less than 5 seconds in some runs and more than 30 seconds in others.
I would recommend looking at the classes in the java.nio package. Non-blocking IO might be faster for sockets: http://docs.oracle.com/javase/6/docs/api/java/nio/package-summary.html
This article has benchmarks that support this claim: http://vanillajava.blogspot.com/2010/07/java-nio-is-faster-than-java-io-for.html
I suspect your real problem is that you have limited hardware, and what you do in software won't make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on your hard drive because the file is not cached, it won't make much difference.
BTW: 500 MB in 10 secs, or 50 MB/sec, is a typical read speed for an HDD.
Try running the following to see at what point your system is unable to cache the file efficiently.
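A rough sketch of such a test, writing and then re-reading temporary files of increasing size and printing the throughput; the sizes, buffer size, and output format here are illustrative assumptions rather than the original answer's code:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CacheTest {
    public static void main(String[] args) throws IOException {
        byte[] bytes = new byte[64 * 1024];
        // Write and read back files of increasing size (128 MB up to 2 GB) and report throughput.
        for (long size = 128 << 20; size <= 2L << 30; size *= 2) {
            File file = File.createTempFile("cache-test", ".dat");
            file.deleteOnExit();

            long start = System.nanoTime();
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
                for (long written = 0; written < size; written += bytes.length) {
                    out.write(bytes);
                }
            }
            long writeTime = System.nanoTime() - start;

            start = System.nanoTime();
            try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
                while (in.read(bytes) != -1) {
                    // discard the data; we only care about how long reading takes
                }
            }
            long readTime = System.nanoTime() - start;

            System.out.printf("Size %,d MB: write %.1f MB/s, read %.1f MB/s%n",
                    size >> 20, size * 1e3 / writeTime, size * 1e3 / readTime);
        }
    }
}
```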
On a Linux machine with lots of memory.
On a Windows machine with lots of memory.
In Java 7 you can use the Files.readAllLines() and Files.write() methods. Here is an example:
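A minimal sketch, assuming input.txt and output.txt as placeholder paths; note that readAllLines() loads the whole file into memory, so for a 1 GB file it would need a much larger heap than the 64 MB mentioned in the question:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadWriteLines {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("input.txt");
        Path target = Paths.get("output.txt");

        // Reads every line of the file into memory at once.
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);

        // Writes all lines back out, creating or truncating the target file.
        Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```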