Compress large file using SharpZipLib causing OutOfMemoryException

Posted 2019-09-08 03:52

I have a 453MB XML file which I'm trying to compress to a ZIP using SharpZipLib.

Below is the code I'm using to create the zip, but it's causing an OutOfMemoryException. This code successfully compresses a file of 428MB.

Any idea why the exception is thrown? I can't see the cause, since my system has plenty of memory available.

public void CompressFiles(List<string> pathnames, string zipPathname)
{
    try
    {
        using (FileStream stream = new FileStream(zipPathname, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            using (ZipOutputStream stream2 = new ZipOutputStream(stream))
            {
                foreach (string str in pathnames)
                {
                    FileStream stream3 = new FileStream(str, FileMode.Open, FileAccess.Read, FileShare.Read);
                    byte[] buffer = new byte[stream3.Length];
                    try
                    {
                        if (stream3.Read(buffer, 0, buffer.Length) != buffer.Length)
                        {
                            throw new Exception(string.Format("Error reading '{0}'.", str));
                        }
                    }
                    finally
                    {
                        stream3.Close();
                    }
                    ZipEntry entry = new ZipEntry(Path.GetFileName(str));
                    stream2.PutNextEntry(entry);
                    stream2.Write(buffer, 0, buffer.Length);
                }
                stream2.Finish();
            }
        }
    }
    catch (Exception)
    {
        File.Delete(zipPathname);
        throw;
    }
}

2 Answers
Juvenile、少年°
Reply #2 · 2019-09-08 04:25

You're trying to create a buffer as big as the file. Instead, make the buffer a fixed size, read some bytes into it, and write the number of read bytes into the zip file.

Here's your code with a buffer of 4096 bytes (and some cleanup):

public static void CompressFiles(List<string> pathnames, string zipPathname)
{
    const int BufferSize = 4096;
    byte[] buffer = new byte[BufferSize];

    try
    {
        using (FileStream stream = new FileStream(zipPathname,
            FileMode.Create, FileAccess.Write, FileShare.None))
        using (ZipOutputStream stream2 = new ZipOutputStream(stream))
        {
            foreach (string str in pathnames)
            {
                using (FileStream stream3 = new FileStream(str,
                    FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    ZipEntry entry = new ZipEntry(Path.GetFileName(str));
                    stream2.PutNextEntry(entry);

                    int read;
                    while ((read = stream3.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        stream2.Write(buffer, 0, read);
                    }
                }
            }
            stream2.Finish();
        }
    }
    catch (Exception)
    {
        File.Delete(zipPathname);
        throw;
    }
}

Especially note this block:

const int BufferSize = 4096;
byte[] buffer = new byte[BufferSize];
// ...
int read;
while ((read = stream3.Read(buffer, 0, buffer.Length)) > 0)
{
    stream2.Write(buffer, 0, read);
}

This reads bytes into buffer. When there are no more bytes, the Read() method returns 0, so that's when we stop. When Read() returns a positive value, we can be sure there is some data in the buffer, but we don't know in advance how many bytes. The whole buffer might be filled, or just a small portion of it. Therefore, we use the number of bytes read to determine how many bytes to write to the ZipOutputStream.

That block of code, by the way, can be replaced by a single statement, available since .NET Framework 4.0, which does the same thing:

stream3.CopyTo(stream2);

So, your code could become:

public static void CompressFiles(List<string> pathnames, string zipPathname)
{
    try
    {
        using (FileStream stream = new FileStream(zipPathname,
            FileMode.Create, FileAccess.Write, FileShare.None))
        using (ZipOutputStream stream2 = new ZipOutputStream(stream))
        {
            foreach (string str in pathnames)
            {
                using (FileStream stream3 = new FileStream(str,
                    FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    ZipEntry entry = new ZipEntry(Path.GetFileName(str));
                    stream2.PutNextEntry(entry);

                    stream3.CopyTo(stream2);
                }
            }
            stream2.Finish();
        }
    }
    catch (Exception)
    {
        File.Delete(zipPathname);
        throw;
    }
}

And now you know why you got the error, and how to use buffers.

Juvenile、少年°
Reply #3 · 2019-09-08 04:36

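You're allocating a lot of memory for no good reason, and I bet you have a 32-bit process. 32-bit processes can only allocate up to 2GB of virtual memory in normal conditions, and the library surely allocates memory too. A byte[] also needs one contiguous block of address space, so an array of several hundred megabytes can fail to allocate even when the total amount of free memory is larger.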
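If you want to check the 32-bit guess, a small sketch like this would report it. The names ProcessBitness and Describe are made up for illustration; IntPtr.Size and Environment.Is64BitProcess are standard .NET members (the latter assumes .NET Framework 4.0 or later).

```csharp
using System;

public static class ProcessBitness
{
    // Hypothetical helper: describes the pointer width of the current process.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        return (IntPtr.Size * 8) + "-bit process";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
        // Environment.Is64BitProcess reports the same information as a bool.
        Console.WriteLine("Is64BitProcess: " + Environment.Is64BitProcess);
    }
}
```

If this reports a 32-bit process, a single allocation the size of the whole file is a likely trigger for the exception, whatever the amount of physical memory installed.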

Anyway, several things are wrong here:

  • byte[] buffer = new byte[stream3.Length];

    Why? You don't need to store the whole thing in memory to process it.

  • if (stream3.Read(buffer, 0, buffer.Length) != buffer.Length)

    This one is nasty. Stream.Read is explicitly allowed to return fewer bytes than you asked for, and this is still a valid result. When reading a stream into a buffer, you have to call Read repeatedly until the buffer is filled or the end of the stream is reached.

  • Your variables should have more meaningful names. You can easily get lost with these stream2, stream3 etc.
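To illustrate the point about Stream.Read, here is a minimal sketch of reading until the buffer is full or the stream ends. StreamHelpers and ReadFully are hypothetical names, not part of .NET or SharpZipLib, and a MemoryStream stands in for the file so the example runs on its own.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: keeps calling Read until 'count' bytes have been
    // read or the stream ends. Returns the number of bytes actually read.
    public static int ReadFully(Stream source, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            // Read may return fewer bytes than requested; 0 means end of stream.
            int read = source.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break;
            }
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5 };
        using (var source = new MemoryStream(data))
        {
            byte[] buffer = new byte[8];
            int got = ReadFully(source, buffer, 0, buffer.Length);
            Console.WriteLine(got); // prints 5: the stream ended before the buffer was full
        }
    }
}
```

For the zip case you don't need a helper like this at all, because copying in chunks never requires a full buffer.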

A simple solution would be:

using (var zipFileStream = new FileStream(zipPathname, FileMode.Create, FileAccess.Write, FileShare.None))
using (ZipOutputStream zipStream = new ZipOutputStream(zipFileStream))
{
    foreach (string str in pathnames)
    {
        using (var itemStream = new FileStream(str, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var entry = new ZipEntry(Path.GetFileName(str));
            zipStream.PutNextEntry(entry);
            itemStream.CopyTo(zipStream);
        }
    }
    zipStream.Finish();
}