I've got a 40MB file on disk and I need to "map" it into memory as a byte array.
At first I thought writing the file to a ByteArrayOutputStream would be the best way, but I found it takes about 160MB of heap space at some point during the copy operation.
Does anybody know a better way to do this without using three times the file size in RAM?
Update: Thanks for your answers. I noticed I could reduce memory consumption a little by giving ByteArrayOutputStream an initial size a bit greater than the original file size (using the exact size with my code forces a reallocation; I have to check why).
There's another high-memory spot: when I get the byte[] back with ByteArrayOutputStream.toByteArray. Taking a look at its source code, I can see it is cloning the array:
public synchronized byte toByteArray()[] {
    return Arrays.copyOf(buf, count);
}
I'm thinking I could just extend ByteArrayOutputStream and override this method so that it returns the original array directly, along the lines of the sketch below. Is there any potential danger here, given that neither the stream nor the byte array will be used more than once?
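Something like this is what I have in mind (just a sketch; the class and method names are mine, and buf is the protected buffer field of ByteArrayOutputStream):

import java.io.ByteArrayOutputStream;

// Hands back the internal buffer without the defensive copy made by toByteArray().
// Only safe if the stream was created with the exact final size (so buf.length == count)
// and neither the stream nor the returned array is used again afterwards.
class DirectByteArrayOutputStream extends ByteArrayOutputStream {
    public DirectByteArrayOutputStream(int size) {
        super(size);
    }

    public synchronized byte[] toByteArrayDirect() {
        return buf; // note: buf.length may exceed count if the stream ever grew
    }
}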
... I came here with the same observation when reading a 1GB file: Oracle's ByteArrayOutputStream has lazy memory management. A byte array is indexed by an int and is therefore limited to 2GB anyway. Without depending on 3rd-party libraries you might find this useful:
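(The example the comment points at isn't reproduced here; as a stand-in, a dependency-free sketch that reads the whole file into an exactly-sized byte[] on Java 7+ could look like this, with the file path being a placeholder:)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadWholeFile {
    public static void main(String[] args) throws IOException {
        // Reads the entire file into a byte[] sized to the file length.
        // Limited to files under 2GB because arrays are indexed by int.
        byte[] data = Files.readAllBytes(Paths.get("bigfile.bin"));
        System.out.println("Read " + data.length + " bytes");
    }
}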
MappedByteBuffer might be what you're looking for (there's a small sketch at the end of this answer).

I'm surprised it takes so much RAM to read a file into memory, though. Have you constructed the ByteArrayOutputStream with an appropriate capacity? If you haven't, the stream could allocate a new byte array when it's near the end of the 40MB, meaning that you would, for example, have a full buffer of 39MB and a new buffer of twice that size. Whereas if the stream has the appropriate capacity, there won't be any reallocation (faster) and no wasted memory.
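A minimal sketch of the MappedByteBuffer route might look like the following (the path and the read-only mode are my assumptions):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MapFile {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("bigfile.bin", "r");
             FileChannel channel = raf.getChannel()) {
            // Maps the file into memory; pages are loaded lazily by the OS,
            // so no 40MB byte[] has to live on the Java heap.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println("first byte: " + buffer.get(0)); // random access, no full copy
        }
    }
}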
I find this extremely surprising ... to the extent that I have my doubts that you are measuring the heap usage correctly.

Let's assume that your code is something like this:
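(The original snippet isn't reproduced here; a representative reconstruction, assuming a plain copy loop with no size hint, would be something like:)

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class CopyToHeap {
    public static void main(String[] args) throws IOException {
        // No size hint, so the internal buffer grows by (at least) doubling as it fills.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (FileInputStream in = new FileInputStream("bigfile.bin")) { // placeholder path
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                baos.write(chunk, 0, n);
            }
        }
        byte[] data = baos.toByteArray(); // final step: copies the buffer one more time
        System.out.println("Read " + data.length + " bytes");
    }
}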
Now the way that a ByteArrayOutputStream manages its buffer is to allocate an initial size and (at least) double the buffer whenever it fills up. Thus, in the worst case baos might use up to an 80Mb buffer to hold a 40Mb file.

The final step allocates a new array of exactly baos.size() bytes to hold the buffer's contents. That's 40Mb. So the peak amount of memory that is actually in use should be 120Mb.

So where are those extra 40Mb being used? My guess is that they are not, and that you are actually reporting the total heap size, not the amount of memory that is occupied by reachable objects.
So what is the solution?

1. You could use a memory mapped buffer.
2. You could give a size hint when you allocate the ByteArrayOutputStream; e.g. as in the sketch after this list.
3. You could dispense with the ByteArrayOutputStream entirely and read directly into a byte array (also sketched below).

Both options 1 and 2 should have a peak memory usage of 40Mb while reading a 40Mb file; i.e. no wasted space.
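The inline examples for options 2 and 3 aren't reproduced above; rough stand-ins (the File object and the length handling are my own assumptions) might be:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class Options {
    public static void main(String[] args) throws IOException {
        File file = new File("bigfile.bin"); // placeholder path

        // Option 2: size hint, so the stream never has to grow its buffer.
        ByteArrayOutputStream baos = new ByteArrayOutputStream((int) file.length());

        // Option 3: no stream at all; read straight into an exactly-sized array.
        byte[] buffer = new byte[(int) file.length()];
        try (FileInputStream fis = new FileInputStream(file)) {
            int off = 0;
            while (off < buffer.length) {
                int n = fis.read(buffer, off, buffer.length - off);
                if (n < 0) {
                    throw new IOException("Unexpected end of file");
                }
                off += n;
            }
        }
    }
}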
It would be helpful if you posted your code, and described your methodology for measuring memory usage.
The potential danger is that your assumptions are incorrect, or become incorrect due to someone else modifying your code unwittingly ...