I have to read a 53 MB file character by character. In C++ with ifstream this completes in milliseconds, but with a Java InputStream it takes several minutes. Is Java normally this slow, or am I missing something?
The rest of the program has to be in Java (it uses servlets, which call the functions that process these characters). I was thinking of writing the file-processing part in C or C++ and calling it from Java through the Java Native Interface (JNI). Is that a good idea?
Can anyone give me any other tips? I really need to read the file faster. I already tried buffered input, but the performance is still nowhere near C++.
Edit: My code spans several files and is quite messy, so here is a synopsis:
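import java.io.*;

public class tmp {
    public static void main(String[] args) {
        // try-with-resources closes the stream even if an exception is thrown
        try (InputStream file = new BufferedInputStream(new FileInputStream("1.2.fasta"))) {
            int b;
            // read() returns -1 at end of file; available() is not a reliable EOF test
            while ((b = file.read()) != -1) {
                char ch = (char) b;
                /* Do processing */
            }
            System.out.println("DONE");
        } catch (IOException e) {
            e.printStackTrace(); // don't silently swallow errors
        }
    }
}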
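I ran the following code on a 183 MB file; it printed "Elapsed 250 ms".
import java.io.*;

public class ReadTimer {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream("file.txt"))) {
            final long start = System.currentTimeMillis();
            int cnt = 0; // number of read(buf) calls
            final byte[] buf = new byte[1000];
            // read 1000 bytes per call instead of one byte per call
            while (in.read(buf) != -1) cnt++;
            System.out.println("Elapsed " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}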
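The loop above only counts read() calls. To keep the per-character processing from the question, one option is to iterate over the bytes that each bulk read filled. A minimal sketch, assuming single-byte (ASCII) data such as a FASTA file; the class ChunkedCharCount and its countChar method are hypothetical stand-ins for the real processing. Note that read(buf) may return fewer bytes than the array holds, so only the first n entries are valid.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedCharCount {
    // Hypothetical processing step: count occurrences of one ASCII character.
    static long countChar(String fileName, char target) throws IOException {
        long count = 0;
        // The 64 KB array acts as the buffer, so no BufferedInputStream is needed here.
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new FileInputStream(fileName)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {          // only the first n bytes are valid
                    char ch = (char) (buf[i] & 0xFF);  // assumes single-byte (ASCII) data
                    if (ch == target) count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countChar(args[0], 'G'));
    }
}
```

The per-byte work now happens on a plain array inside a tight loop, with one read() call per 64 KB instead of one per character.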
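I would try memory-mapping the file with NIO:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        // create the file so we have something to read.
        final String fileName = "1.2.fasta";
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(new byte[54 * 1024 * 1024]);
        }
        // read the file in one hit by memory-mapping it.
        long start = System.nanoTime();
        try (FileChannel fc = new FileInputStream(fileName).getChannel()) {
            ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            while (bb.remaining() >= 8)
                bb.getLong();   // 8 bytes per call
            while (bb.hasRemaining())
                bb.get();       // any trailing bytes if the size is not a multiple of 8
            long time = System.nanoTime() - start;
            System.out.printf("Took %.3f seconds to read %.1f MB%n", time / 1e9, fc.size() / 1e6);
        }
        // The mapping is released when bb is garbage-collected. Unmapping eagerly via
        // ((sun.nio.ch.DirectBuffer) bb).cleaner().clean() relies on an internal API
        // that is not exported by default since Java 9.
    }
}
It prints: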
Took 0.016 seconds to read 56.6 MB
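A single FileChannel.map call can cover at most Integer.MAX_VALUE bytes (about 2 GB), so mapping the whole file in one hit only works below that size; larger files have to be mapped in windows. A minimal sketch, assuming ASCII data; the class MappedCharCount, its countByte method and the window size are hypothetical choices, with countByte standing in for the real per-character processing:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedCharCount {
    // Hypothetical processing step: count occurrences of one byte value.
    // windowSize must be between 1 and Integer.MAX_VALUE.
    static long countByte(Path file, byte target, long windowSize) throws IOException {
        long count = 0;
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = fc.size();
            // Map one window at a time; the last window may be shorter.
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (bb.hasRemaining()) {
                    if (bb.get() == target) count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // 256 MB windows; an arbitrary size well below the 2 GB limit
        System.out.println(countByte(Paths.get(args[0]), (byte) 'G', 256L * 1024 * 1024));
    }
}
```

For a 53 MB file a single window covers everything, so this behaves like the one-hit mapping above.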
Use a BufferedInputStream:
InputStream buffy = new BufferedInputStream(inputStream);
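Since the question is about characters rather than bytes, the same buffering idea also applies on the Reader side: wrap the stream in an InputStreamReader with an explicit charset and put a BufferedReader around it. This swaps in BufferedReader rather than BufferedInputStream. A minimal sketch, assuming an ASCII file; the class BufferedCharRead and its countNewlines method are hypothetical stand-ins for the real processing:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BufferedCharRead {
    // Hypothetical processing step: count '\n' characters, one char at a time.
    static long countNewlines(String fileName) throws IOException {
        long count = 0;
        try (Reader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.US_ASCII),
                64 * 1024)) {                 // buffer of 64K chars per refill
            int c;
            while ((c = r.read()) != -1) {    // -1 means end of stream
                if (c == '\n') count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(args[0]));
    }
}
```

Each read() call is served from the in-memory char buffer; the file is only touched when the buffer runs empty.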
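As noted above, use a BufferedInputStream. You could also use the NIO package. For most files, BufferedInputStream will read just as fast as NIO. For extremely large files, however, NIO may do better because you can use memory-mapped file operations. Furthermore, the NIO package supports interruptible I/O, whereas the java.io package does not: if you want to reliably cancel a read from another thread, you have to use NIO.
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelRead {
    private static final int BUF_SIZE = 64 * 1024; // buffer size is an arbitrary choice
    public static void main(String[] args) throws IOException {
        // closing the channel also closes the FileInputStream it came from
        try (FileChannel fileChannel = new FileInputStream("1.2.fasta").getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(BUF_SIZE);
            while (fileChannel.read(buf) != -1) {   // -1 signals end of file
                buf.flip();                          // switch from filling to draining
                while (buf.hasRemaining()) {
                    byte b = buf.get();              // process each byte here
                }
                buf.clear();                         // make room for the next read
            }
        }
    }
}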
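To illustrate the interruptible-I/O point: FileChannel implements InterruptibleChannel, so interrupting a thread that is reading through it closes the channel and makes the reader fail with ClosedByInterruptException instead of carrying on. A minimal sketch; the class InterruptibleRead, its cancelAfter method and the endless re-reading loop are hypothetical, chosen only so there is a read in progress to cancel:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptibleRead {
    // Starts a thread that keeps reading the file, interrupts it after delayMillis,
    // and reports whether the read loop ended with ClosedByInterruptException.
    static boolean cancelAfter(Path file, long delayMillis) throws InterruptedException {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        Thread reader = new Thread(() -> {
            try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
                while (true) {                    // loop forever, rewinding at end of file
                    buf.clear();
                    if (fc.read(buf) == -1) fc.position(0);
                }
            } catch (ClosedByInterruptException e) {
                cancelled.set(true);              // the interrupt closed the channel
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        reader.start();
        Thread.sleep(delayMillis);
        reader.interrupt();                       // cancel from another thread
        reader.join();
        return cancelled.get();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("demo", ".bin");
        Files.write(file, new byte[1024 * 1024]);
        System.out.println("cancelled by interrupt: " + cancelAfter(file, 100));
        Files.delete(file);
    }
}
```

Because the interrupt closes the channel, a cancelled reader has to reopen the file if it wants to continue.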