Our software decompresses certain byte data through a GZipStream, which reads the data from a MemoryStream. The data are decompressed in blocks of 4 KB and written into another MemoryStream.
We've realized that the memory the process allocates is much higher than the size of the actual decompressed data.

Example: a compressed byte array of 2,425,536 bytes gets decompressed to 23,050,718 bytes. The memory profiler we use shows that the method MemoryStream.set_Capacity(Int32 value) allocated 67,104,936 bytes. That's a factor of 2.9 between reserved and actually written memory.
Note: MemoryStream.set_Capacity is called from MemoryStream.EnsureCapacity, which is itself called from MemoryStream.Write in our function.
Why does the MemoryStream reserve so much capacity, even though it only appends blocks of 4 KB?
Here is the code snippet which decompresses data:
private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;
        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}
Note: If relevant, this is the system configuration:
- Windows XP 32-bit
- .NET 3.5
- Compiled with Visual Studio 2008
Well, increasing the capacity of the streams means creating a whole new array with the new capacity, and copying the old one over. That's very expensive, and if you did it for each Write, your performance would suffer a lot. So instead, the MemoryStream expands more than necessary. If you want to improve that behaviour and you know the total capacity required, simply use the MemoryStream constructor with the capacity parameter :) You can then use MemoryStream.GetBuffer instead of ToArray too.

You're also seeing the discarded old buffers in the memory profiler (e.g. from 8 MiB to 16 MiB etc.).
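If the total size is known up front, that suggestion could look roughly like this (a minimal sketch: DecompressWithCapacityHint and expectedSize are made-up names, and the caller is assumed to know, or to have stored alongside the compressed data, the final decompressed size):

private byte[] DecompressWithCapacityHint(byte[] data, int expectedSize)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    // Pre-sizing the result stream avoids the repeated doubling of its internal buffer.
    using (MemoryStream resultStream = new MemoryStream(expectedSize))
    {
        byte[] buffer = new byte[4096];
        int count;
        while ((count = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            resultStream.Write(buffer, 0, count);
        }

        // GetBuffer returns the internal array without copying, but it is only usable
        // as-is when the estimate was exact; otherwise copy just the written part.
        return resultStream.Length == expectedSize
            ? resultStream.GetBuffer()
            : resultStream.ToArray();
    }
}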
Of course, you don't care about having a single consecutive array, so it might be a better idea for you to simply have a memory stream of your own that uses multiple arrays created as needed, in as big chunks as necessary, and then just copy it all at once to the output byte[] (if you even need the byte[] at all - quite likely, that's a design problem).

MemoryStream doubles its internal buffer when it runs out of space. This can lead to 2x waste. I cannot tell why you are seeing more than that, but this basic behavior is expected. If you don't like this behavior, write your own stream that stores its data in smaller chunks (e.g. a List<byte[]> of 1024 * 64 byte chunks). Such an algorithm would bound its amount of waste to 64 KB.
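Not a full Stream implementation, but a minimal sketch of that chunked idea (DecompressChunked and the 64 KB chunk size are illustrative choices; it needs System, System.Collections.Generic, System.IO and System.IO.Compression):

private byte[] DecompressChunked(byte[] data)
{
    const int ChunkSize = 64 * 1024;
    List<byte[]> chunks = new List<byte[]>();
    int totalBytes = 0;
    int usedInCurrentChunk = 0;

    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    {
        byte[] chunk = new byte[ChunkSize];
        chunks.Add(chunk);
        int read;
        // Fill each 64 KB chunk completely before allocating the next one,
        // so at most one partially used chunk is ever wasted.
        while ((read = zipStream.Read(chunk, usedInCurrentChunk, ChunkSize - usedInCurrentChunk)) > 0)
        {
            usedInCurrentChunk += read;
            totalBytes += read;
            if (usedInCurrentChunk == ChunkSize)
            {
                chunk = new byte[ChunkSize];
                chunks.Add(chunk);
                usedInCurrentChunk = 0;
            }
        }
    }

    // Copy everything into a single array of exactly the right size.
    byte[] result = new byte[totalBytes];
    for (int i = 0; i < chunks.Count; i++)
    {
        int bytesInChunk = Math.Min(ChunkSize, totalBytes - i * ChunkSize);
        if (bytesInChunk > 0)
        {
            Buffer.BlockCopy(chunks[i], 0, result, i * ChunkSize, bytesInChunk);
        }
    }
    return result;
}

If the caller can consume the data chunk by chunk instead of as one byte[], the final copy disappears as well.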
Looks like you are looking at the total amount of allocated memory, not the last call. Since the memory stream doubles its size on reallocation, it will grow about twice each time, so the total allocated memory is approximately a sum of powers of two:

Sum(i = 1..k) 2^i = 2^(k+1) - 1

(where k is the number of re-allocations, roughly k = 1 + log2(StreamSize)), which is about what you see.
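Plugging in the numbers from the question (assuming the capacity starts at the first 4 KB write and simply doubles from there, which is what MemoryStream's growth rule produces for 4,096-byte writes): the capacities allocated over the lifetime of the stream are 4,096, 8,192, ..., 33,554,432 bytes, the first power of two above 23,050,718. Their sum is 4,096 × (2^14 - 1) = 67,104,768 bytes, within a few hundred bytes of the 67,104,936 bytes the profiler reports.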
Because this is, in essence, the algorithm MemoryStream uses to expand its capacity.
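What follows is only a paraphrase of that growth rule (GrowCapacity is an invented name; the real logic lives in MemoryStream.EnsureCapacity), not the framework source verbatim:

// When a Write needs more room, the new capacity is the requested value,
// but never less than 256 bytes and never less than twice the old capacity.
private static int GrowCapacity(int currentCapacity, int requiredCapacity)
{
    int newCapacity = requiredCapacity;
    if (newCapacity < 256)
    {
        newCapacity = 256;
    }
    if (newCapacity < currentCapacity * 2)
    {
        newCapacity = currentCapacity * 2;
    }
    return newCapacity;
}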
So every time you hit the capacity limit, it doubles the size of the capacity. The reason it does this is that the Buffer.InternalBlockCopy operation is slow for large arrays, so if it had to resize on every Write call, performance would drop significantly.

A few things you could do to improve the performance: you could set the initial capacity to be at least the size of your compressed array, and you could then increase the size by a factor smaller than 2.0 to reduce the amount of memory you are using.

If you wanted to, you could do even more fancy algorithms, like resizing based on the current compression ratio.