When given a buffer of MAX_BUFFER_SIZE, and a file that far exceeds it, how can one:
- Read the file in blocks of MAX_BUFFER_SIZE?
- Do it as fast as possible?
I tried using NIO:
RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);
int bytesRead = inChannel.read(buffer);
while (bytesRead != -1) {
    buffer.flip();
    while (buffer.hasRemaining()) {
        buffer.get(); // consumes one byte per call
    }
    buffer.clear();
    bytesRead = inChannel.read(buffer);
}
aFile.close();
And regular IO:
InputStream in = new FileInputStream(fileName);
// File.length() gives the size on disk; String.length() would only give the length of the name.
long length = new File(fileName).length();
if (length > Integer.MAX_VALUE) {
    throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length
        && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
    offset += numRead;
}
if (offset < bytes.length) {
    throw new IOException("Could not completely read file " + fileName);
}
in.close();
It turns out that regular IO is about 100 times faster than NIO at doing the same thing. Am I missing something? Is this expected? Is there a faster way to read the file in buffer-sized chunks?
Ultimately I am working with a large file that I don't have enough memory to read all at once. Instead, I'd like to read it incrementally in blocks that would then be used for processing.
If you want to make your first example faster
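One possible way to do that is sketched below. It is an illustration under assumptions, not a definitive implementation: the class name ChunkedChannelReader, the method readInBlocks and the 64 KB block size are hypothetical, and using a direct buffer is a choice made here. The key change is that each filled buffer is drained with one bulk get(byte[], int, int) call instead of one get() call per byte.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedChannelReader {
    // Hypothetical block size; stands in for MAX_BUFFER_SIZE from the question.
    static final int MAX_BUFFER_SIZE = 64 * 1024;

    /** Reads the file block by block and returns the total number of bytes seen. */
    static long readInBlocks(Path file) throws IOException {
        long total = 0;
        byte[] block = new byte[MAX_BUFFER_SIZE];
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(MAX_BUFFER_SIZE);
            while (channel.read(buffer) != -1) {
                buffer.flip();
                int n = buffer.remaining();
                buffer.get(block, 0, n); // one bulk copy instead of n single-byte get() calls
                total += n;              // process block[0..n) here
                buffer.clear();
            }
        }
        return total;
    }
}
```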
If you want it to be even faster.
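A minimal sketch of one such approach, assuming memory mapping via FileChannel.map is acceptable for your use case. The class name MappedFileReader and the method mapReadOnly are hypothetical. A single mapping cannot exceed Integer.MAX_VALUE bytes, so larger files would have to be mapped in several regions.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileReader {
    /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Throws IllegalArgumentException if the file is larger than Integer.MAX_VALUE bytes.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

The returned buffer can then be read with the usual ByteBuffer methods; pages are loaded from disk as they are touched.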
This can take 10 to 20 microseconds for files up to 2 GB in size.
Assuming that you need to read the entire file into memory at once (as you're currently doing), neither reading smaller chunks nor NIO is going to help you here.
In fact, you'd probably be best off reading larger chunks, which your regular IO code is automatically doing for you.
Your NIO code is currently slower because you're only reading one byte at a time (using buffer.get()).
If you want to process in chunks - for example, transferring between streams - here is a standard way of doing it without NIO:
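A sketch of that kind of loop, assuming plain java.io streams. The class name StreamCopy and the method copy are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    /** Copies everything from in to out through a fixed 1 KB buffer; returns the bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; otherwise the number of bytes placed in buffer.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```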
This uses a buffer size of only 1 KB, but can transfer an unlimited amount of data.
(If you extend your question with details of what you're actually trying to do at a functional level, I could improve this answer further.)