I was trying to read a file into an array using FileInputStream, and an ~800 KB file took about 3 seconds to read into memory. I then tried the same code with the FileInputStream wrapped in a BufferedInputStream, and it took about 76 milliseconds. Why is reading a file so much faster with a BufferedInputStream even though I'm still reading it byte by byte? Here's the code (the rest of the code is irrelevant). Note that this is the "fast" code; remove the BufferedInputStream wrapper to get the "slow" code:
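InputStream is = null;
try {
    is = new BufferedInputStream(new FileInputStream(file));
    int[] fileArr = new int[(int) file.length()];
    // read() returns the next byte as 0-255, or -1 at end of stream
    for (int i = 0, temp = 0; (temp = is.read()) != -1; i++) {
        fileArr[i] = temp;
    }
} finally {
    if (is != null) {
        is.close();
    }
}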
BufferedInputStream is over 30 times faster (3 seconds versus 76 milliseconds is closer to a factor of 40). So, why is this, and is it possible to make this code more efficient (without using any external libraries)?
It is because of the cost of disk access: every unbuffered read() is a separate request to the operating system. Let's assume you have a file whose size is 8 KB. Without BufferedInputStream, reading it byte by byte takes 8 * 1024 = 8192 separate accesses.
At this point, BufferedInputStream comes onto the scene and acts as a middle man between your code and the FileInputStream.
In one shot, it pulls a chunk of bytes (8 KB by default) into memory, and your read() calls are then served from this middle man instead of from the file. This reduces the time the operation takes.
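To make the "middle man" idea concrete, here is a minimal sketch of the mechanism. TinyBufferedStream is a hypothetical, stripped-down class written for this illustration; it is not the JDK's BufferedInputStream, which also handles mark/reset, locking and closing. It assumes a buffer size greater than zero.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical, simplified stand-in for BufferedInputStream (not the JDK implementation).
class TinyBufferedStream {
    private final InputStream source; // the underlying stream that is expensive to call
    private final byte[] buffer;      // the in-memory "middle man"
    private int position = 0;         // index of the next byte to hand out
    private int limit = 0;            // number of valid bytes currently in the buffer

    TinyBufferedStream(InputStream source, int bufferSize) {
        this.source = source;
        this.buffer = new byte[bufferSize];
    }

    // Returns the next byte as 0-255, or -1 at end of stream.
    int read() throws IOException {
        if (position >= limit) {
            // Buffer is used up: refill it with one bulk read from the source.
            int n = source.read(buffer, 0, buffer.length);
            if (n <= 0) {
                return -1;
            }
            limit = n;
            position = 0;
        }
        // Served from memory; the underlying stream is not touched here.
        return buffer[position++] & 0xFF;
    }

    // Reads every byte of the given data through the tiny buffer and sums the values.
    static long sumThroughBuffer(byte[] data, int bufferSize) {
        TinyBufferedStream in =
                new TinyBufferedStream(new ByteArrayInputStream(data), bufferSize);
        long sum = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                sum += b;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, (byte) 250};
        System.out.println(sumThroughBuffer(data, 2));
    }
}
```

The caller still asks for one byte at a time, but only the refill branch reaches the source, which is where the savings come from.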
A BufferedInputStream wrapped around a FileInputStream will request data from the FileInputStream in big chunks (8192 bytes by default). Thus if you read 1000 bytes one at a time, the FileInputStream only has to go to the disk once for that data instead of 1000 times. This will be much faster!
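If you want to see the effect rather than take it on faith, one way is to count how many read calls actually reach the wrapped stream. The sketch below is only an illustration: CountingInputStream and ReadCallDemo are made-up helper names, and a ByteArrayInputStream stands in for the file so the example runs anywhere. The buffer size is passed explicitly, so the result does not depend on the default. Note that the count includes the final call that reports end of stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: counts how many read calls reach the wrapped stream.
class CountingInputStream extends FilterInputStream {
    int calls = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

class ReadCallDemo {
    // Reads totalBytes one byte at a time; bufferSize == 0 means "no buffering".
    // Returns the number of read calls that reached the underlying stream,
    // including the last call that signals end of stream.
    static int countUnderlyingReads(int totalBytes, int bufferSize) {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        InputStream in =
                bufferSize > 0 ? new BufferedInputStream(counter, bufferSize) : counter;
        try {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + countUnderlyingReads(1000, 0));
        System.out.println("buffered:   " + countUnderlyingReads(1000, 8192));
    }
}
```

With a real FileInputStream each of those counted calls would be a trip into the OS, so the gap between the two numbers is a rough proxy for the gap in running time.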
In FileInputStream, the method read() reads a single byte. In the JDK source code it is implemented as a native call to the OS, which goes to the disk to read that single byte. This is a heavy operation.
With a BufferedInputStream, the method delegates to an overloaded read() method that reads up to 8192 bytes and buffers them until they are needed. It still returns only a single byte (but keeps the others in reserve). This way the BufferedInputStream makes fewer native calls to the OS to read from the file.
For example, say your file is 32768 bytes long. To get all the bytes into memory with a FileInputStream, you will need 32768 native calls to the OS. With a BufferedInputStream, you will need only 4, regardless of the number of read() calls you make (still 32768).
As to how to make it faster, you might want to consider Java NIO's FileChannel class, but I have no evidence to support this.
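For what it's worth, here is a rough sketch of two bulk-read alternatives from the standard library: Files.readAllBytes (Java 7+) and a FileChannel filling a single ByteBuffer. BulkFileRead and its method names are made up for this example, and it assumes the file fits in an int-sized array and does not change while it is being read. I have not benchmarked either against the buffered loop, so measure before relying on it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class BulkFileRead {
    // Simplest option: let the JDK read the whole file in one call.
    static byte[] readWithFiles(Path path) {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // FileChannel variant: fill one ByteBuffer sized to the file.
    static byte[] readWithChannel(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            // A single read() may return fewer bytes than requested, so loop.
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            return buffer.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the data to a temporary file and checks both readers return it unchanged.
    static boolean roundTripMatches(byte[] data) {
        try {
            Path temp = Files.createTempFile("bulk-read-demo", ".bin");
            try {
                Files.write(temp, data);
                return Arrays.equals(data, readWithFiles(temp))
                        && Arrays.equals(data, readWithChannel(temp));
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripMatches(new byte[] {0, 1, 2, (byte) 255}));
    }
}
```

Both methods return a byte[] rather than the int[] used in the question; if you need the unsigned 0-255 values that read() gives you, mask each element with `& 0xFF`.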