I'm trying to illustrate the difference in performance between traditional IO and memory-mapped files in Java to students. I found an example somewhere on the internet, but not everything is clear to me, and I don't even think all the steps are necessary. I have read a lot about it here and there, but I'm not convinced that either of them is implemented correctly.
The code I am trying to understand is:
public class FileCopy{
    public static void main(String args[]){
        if (args.length < 1){
            System.out.println(" Wrong usage!");
            System.out.println(" Correct usage is : java FileCopy <large file with full path>");
            System.exit(0);
        }
        String inFileName = args[0];
        File inFile = new File(inFileName);
        if (inFile.exists() != true){
            System.out.println(inFileName + " does not exist!");
            System.exit(0);
        }
        try{
            new FileCopy().memoryMappedCopy(inFileName, inFileName+".new" );
            new FileCopy().customBufferedCopy(inFileName, inFileName+".new1");
        }catch(FileNotFoundException fne){
            fne.printStackTrace();
        }catch(IOException ioe){
            ioe.printStackTrace();
        }catch (Exception e){
            e.printStackTrace();
        }
    }

    public void memoryMappedCopy(String fromFile, String toFile ) throws Exception{
        long timeIn = new Date().getTime();
        // read input file
        RandomAccessFile rafIn = new RandomAccessFile(fromFile, "rw");
        FileChannel fcIn = rafIn.getChannel();
        ByteBuffer byteBuffIn = fcIn.map(FileChannel.MapMode.READ_WRITE, 0,(int) fcIn.size());
        fcIn.read(byteBuffIn);
        byteBuffIn.flip();
        RandomAccessFile rafOut = new RandomAccessFile(toFile, "rw");
        FileChannel fcOut = rafOut.getChannel();
        ByteBuffer writeMap = fcOut.map(FileChannel.MapMode.READ_WRITE,0,(int) fcIn.size());
        writeMap.put(byteBuffIn);
        long timeOut = new Date().getTime();
        System.out.println("Memory mapped copy Time for a file of size :" + (int) fcIn.size() +" is "+(timeOut-timeIn));
        fcOut.close();
        fcIn.close();
    }

    static final int CHUNK_SIZE = 100000;
    static final char[] inChars = new char[CHUNK_SIZE];

    public static void customBufferedCopy(String fromFile, String toFile) throws IOException{
        long timeIn = new Date().getTime();
        Reader in = new FileReader(fromFile);
        Writer out = new FileWriter(toFile);
        while (true) {
            synchronized (inChars) {
                int amountRead = in.read(inChars);
                if (amountRead == -1) {
                    break;
                }
                out.write(inChars, 0, amountRead);
            }
        }
        long timeOut = new Date().getTime();
        System.out.println("Custom buffered copy Time for a file of size :" + (int) new File(fromFile).length() +" is "+(timeOut-timeIn));
        in.close();
        out.close();
    }
}
When exactly is it necessary to use RandomAccessFile? Here it is used to read and write in memoryMappedCopy; is it actually necessary just to copy a file at all? Or is it a part of memory mapping?
In customBufferedCopy, why is synchronized used here?
I also found a different example that -should- test the performance between the 2:
public class MappedIO {
    private static int numOfInts = 4000000;
    private static int numOfUbuffInts = 200000;

    private abstract static class Tester {
        private String name;
        public Tester(String name) { this.name = name; }
        public long runTest() {
            System.out.print(name + ": ");
            try {
                long startTime = System.currentTimeMillis();
                test();
                long endTime = System.currentTimeMillis();
                return (endTime - startTime);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        public abstract void test() throws IOException;
    }

    private static Tester[] tests = {
        new Tester("Stream Write") {
            public void test() throws IOException {
                DataOutputStream dos = new DataOutputStream(
                    new BufferedOutputStream(
                        new FileOutputStream(new File("temp.tmp"))));
                for(int i = 0; i < numOfInts; i++)
                    dos.writeInt(i);
                dos.close();
            }
        },
        new Tester("Mapped Write") {
            public void test() throws IOException {
                FileChannel fc =
                    new RandomAccessFile("temp.tmp", "rw")
                        .getChannel();
                IntBuffer ib = fc.map(
                    FileChannel.MapMode.READ_WRITE, 0, fc.size())
                    .asIntBuffer();
                for(int i = 0; i < numOfInts; i++)
                    ib.put(i);
                fc.close();
            }
        },
        new Tester("Stream Read") {
            public void test() throws IOException {
                DataInputStream dis = new DataInputStream(
                    new BufferedInputStream(
                        new FileInputStream("temp.tmp")));
                for(int i = 0; i < numOfInts; i++)
                    dis.readInt();
                dis.close();
            }
        },
        new Tester("Mapped Read") {
            public void test() throws IOException {
                FileChannel fc = new FileInputStream(
                    new File("temp.tmp")).getChannel();
                IntBuffer ib = fc.map(
                    FileChannel.MapMode.READ_ONLY, 0, fc.size())
                    .asIntBuffer();
                while(ib.hasRemaining())
                    ib.get();
                fc.close();
            }
        },
        new Tester("Stream Read/Write") {
            public void test() throws IOException {
                RandomAccessFile raf = new RandomAccessFile(
                    new File("temp.tmp"), "rw");
                raf.writeInt(1);
                for(int i = 0; i < numOfUbuffInts; i++) {
                    raf.seek(raf.length() - 4);
                    raf.writeInt(raf.readInt());
                }
                raf.close();
            }
        },
        new Tester("Mapped Read/Write") {
            public void test() throws IOException {
                FileChannel fc = new RandomAccessFile(
                    new File("temp.tmp"), "rw").getChannel();
                IntBuffer ib = fc.map(
                    FileChannel.MapMode.READ_WRITE, 0, fc.size())
                    .asIntBuffer();
                ib.put(0);
                for(int i = 1; i < numOfUbuffInts; i++)
                    ib.put(ib.get(i - 1));
                fc.close();
            }
        }
    };

    public static void main(String[] args) {
        for(int i = 0; i < tests.length; i++)
            System.out.println(tests[i].runTest());
    }
}
I more or less see what's going on; my output looks like this:
Stream Write: 653
Mapped Write: 51
Stream Read: 651
Mapped Read: 40
Stream Read/Write: 14481
Mapped Read/Write: 6
What is making the Stream Read/Write test take so unbelievably long? And as a read/write test, it looks a bit pointless to me to read the same integer over and over (if I understand correctly what's going on in the Stream Read/Write test). Wouldn't it be better to read ints from the previously written file and just read and write ints in the same place? Is there a better way to illustrate it?
I've been racking my brain over a lot of these things for a while and I just can't get the whole picture.
1) These sound like questions your students should be asking - not the other way around?
2) The reason the two methods are used is to demonstrate the different ways that you can copy a file. I would hazard a guess that the first method (RandomAccessFile) creates a version of the file in RAM, and then copies it to a new version on the disk, and that the second method (customBufferedCopy) reads directly from the drive.
3) I'm not sure, but I think synchronized is used to ensure that multiple instances of the same class do not write at the same time.
4) As for the last question, I've got to go - so I hope someone else can help you with that.
Seriously though, these sound like just the questions a tutor should be teaching to their students. If you don't have the ability to research simple things like this yourself, what kind of example are you setting your students? </rant>
Thanks for looking into this. I will look at the first examples later; for now, my professor asked me to rewrite the two tests (stream and mapped read/write).
They generate random ints, first read the index (the generated int) and check whether the int at this index is equal to the generated int; if it's not equal, the generated int is written at its index. He thought this could result in a better test, making more use of the RandomAccessFile. Does this make sense?
However, I have some issues. First of all, I don't know how to use a buffer with the stream read/write when I'm using RandomAccessFile. I found a lot about byte[] buffers using an array, but I'm not sure how to use them correctly.
My code so far for this test:
So this is still unbuffered.
I did the second test as follows:
For small values of numOfUbuffInts it seems to go fast; for large values (20 000 000+) it takes ages. I just tried some things, but I'm not sure if I'm on the right track.
What I see with the one benchmark "Stream Read/Write" is:
This explains the very high cost of that particular benchmark.
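One way to see what the "Stream Read/Write" loop actually does to the file is to run the same three calls on a small scratch file and check the length afterwards. The following is a minimal sketch, not the benchmark itself: the class name StreamReadWriteGrowth and the method runLoop are made up for illustration, and the file is truncated first (an assumption the original test does not make) so that the result is predictable.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: replays the loop body of the "Stream Read/Write" test.
final class StreamReadWriteGrowth {

    /** Runs the loop and returns the final file length in bytes. */
    static long runLoop(File file, int iterations) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);   // assumption: start empty (the benchmark reuses temp.tmp)
            raf.writeInt(1);    // file is now 4 bytes long, file pointer at offset 4
            for (int i = 0; i < iterations; i++) {
                raf.seek(raf.length() - 4);  // jump to the last int in the file
                int value = raf.readInt();   // reading moves the pointer to end of file
                raf.writeInt(value);         // so this write appends 4 new bytes
            }
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("growth", ".tmp");
        tmp.deleteOnExit();
        int iterations = 1000;
        long length = runLoop(tmp, iterations);
        System.out.println("length after " + iterations + " iterations: " + length + " bytes");
    }
}
```

Because each iteration appends rather than overwrites, the file ends up 4 bytes longer per iteration, and every seek, read and write goes through RandomAccessFile without any buffering in between.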
You asked:
This is what I think the author was trying to do with the last two benchmarks, but that's not what they got. With RandomAccessFile, to read and write the same place in the file you would need to put a seek before the read and the write:
This does demonstrate one advantage of memory-mapped I/O, since you can just use the same memory address to access the same bits of the file instead of having to do an additional seek before every call.
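To make the seek-before-read and seek-before-write point concrete, here is a minimal sketch of both variants side by side. It is an illustration under assumptions, not code from the benchmark: the class SamePlaceReadWrite and its methods are hypothetical names, and incrementing the value is just a stand-in for "read an int and write one back to the same place".

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical example class; names are not from the original benchmark.
final class SamePlaceReadWrite {

    /** Fills the file with the ints 0..count-1. */
    static void writeInts(File file, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            for (int i = 0; i < count; i++) raf.writeInt(i);
        }
    }

    static int readIntAt(File file, long intIndex) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(intIndex * 4L);
            return raf.readInt();
        }
    }

    /** RandomAccessFile: one seek before the read, another before the write. */
    static void incrementWithSeek(File file, long intIndex) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            long pos = intIndex * 4L;
            raf.seek(pos);
            int value = raf.readInt();  // the pointer is now at pos + 4
            raf.seek(pos);              // go back, otherwise the next int is overwritten
            raf.writeInt(value + 1);
        }
    }

    /** Mapped file: absolute get and put address the same slot, no seek involved. */
    static void incrementMapped(File file, int intIndex) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer map = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());
            IntBuffer ints = map.asIntBuffer();
            ints.put(intIndex, ints.get(intIndex) + 1);
            map.force();  // flush the change to the file before the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("same-place", ".tmp");
        tmp.deleteOnExit();
        writeInts(tmp, 4);
        incrementWithSeek(tmp, 2);
        incrementMapped(tmp, 2);
        System.out.println("int at index 2: " + readIntAt(tmp, 2));
    }
}
```

Both RandomAccessFile and a freshly mapped buffer use big-endian byte order, so the two methods read each other's values correctly.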
By the way, your first benchmark example class may have issues too, since CHUNK_SIZE is not an even multiple of the file system block size. Often it's good to use multiples of 1024, and 8192 has been shown to be a good sweet spot for most applications (which is why Java's BufferedInputStream and BufferedOutputStream use that value as their default buffer size). The OS will need to read an extra block or blocks to satisfy read requests that are not on block boundaries. Subsequent reads (of a stream) will reread the same block, possibly some full blocks, and then an extra one again. Memory-mapped I/O always physically reads and writes in blocks, as the actual I/Os are handled by the OS memory manager, which uses its page size. Page size is generally chosen to map well to file blocks.
In that example, the memory-mapped test does read everything into a memory buffer and then write it all back out. These two tests are really not well written to compare those two cases. memoryMappedCopy should read and write in the same chunk size as customBufferedCopy.
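As a rough sketch of what "the same chunk size for both copies" could look like, the example below copies a file once through plain streams and once through mapped regions, both driven by the same chunk size. Everything here is an assumption made for illustration: the class name ChunkedCopy, the method names, the output file suffixes, and the 1 MiB chunk (a multiple of 8192). It also copies bytes rather than chars, which differs from the FileReader/FileWriter pair in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not the FileCopy class from the question.
final class ChunkedCopy {
    static final int CHUNK_SIZE = 1 << 20;  // assumption: 1 MiB, a multiple of 8192

    /** Stream copy using a byte[] of chunkSize bytes that is local to the call. */
    static void streamCopy(Path from, Path to, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(from);
             OutputStream out = Files.newOutputStream(to)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    /** Mapped copy that maps source and target one chunk at a time. */
    static void mappedCopy(Path from, Path to, int chunkSize) throws IOException {
        try (FileChannel in = FileChannel.open(from, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(to,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = in.size();
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos);
                MappedByteBuffer src = in.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // Mapping past the current end of the target file extends it.
                MappedByteBuffer dst = out.map(FileChannel.MapMode.READ_WRITE, pos, len);
                dst.put(src);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ChunkedCopy <file>");
            return;
        }
        Path source = Paths.get(args[0]);
        long t0 = System.nanoTime();
        streamCopy(source, Paths.get(args[0] + ".stream"), CHUNK_SIZE);
        long t1 = System.nanoTime();
        mappedCopy(source, Paths.get(args[0] + ".mapped"), CHUNK_SIZE);
        long t2 = System.nanoTime();
        System.out.println("stream copy: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("mapped copy: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

A single run like this is still a crude measurement (the second copy benefits from the file already being in the OS cache), so the numbers are only indicative.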
.EDIT: There may even be more things wrong with these test classes. Because of your comment to the other answer I looked more carefully at the first class again.
Method customBufferedCopy is static and uses a static buffer. For this kind of test that buffer should be defined within the method. Then it would not need to use synchronized (though it doesn't need it in this context and for these tests anyway). This static method is called as a normal method, which is bad programming practice (i.e. use FileCopy.customBufferedCopy(...) instead of new FileCopy().customBufferedCopy(...)).
If you actually did run this test from multiple threads, the use of that buffer would be contentious and the benchmark would not just be about file I/O, so it would not be fair to compare the results of the two test methods.
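The points about the buffer and the static call could be sketched as follows. This is only an illustration: LocalBufferCopy, copy and copyQuietly are hypothetical names, the buffer size of 8192 is an assumption, and byte streams are swapped in for the Reader/Writer pair of the question. The buffer is a local variable, so each call gets its own and no synchronized block is needed, and the static method is called through the class name.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical example class; not the FileCopy class from the question.
final class LocalBufferCopy {
    private static final int CHUNK_SIZE = 8192;  // assumption: a multiple of 1024

    /** Copies fromFile to toFile and returns the number of bytes copied. */
    static long copy(String fromFile, String toFile) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];  // local: nothing shared between callers
        long total = 0;
        try (InputStream in = new FileInputStream(fromFile);
             OutputStream out = new FileOutputStream(toFile)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    private static void copyQuietly(String from, String to) {
        try {
            // Static method invoked via the class name, not via an instance.
            long bytes = LocalBufferCopy.copy(from, to);
            System.out.println(to + ": " + bytes + " bytes");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: java LocalBufferCopy <file>");
            return;
        }
        String source = args[0];
        // Two threads can copy at the same time without contending for a buffer.
        Thread t1 = new Thread(() -> copyQuietly(source, source + ".copy1"));
        Thread t2 = new Thread(() -> copyQuietly(source, source + ".copy2"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```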