Creating a dynamic zip of a bunch of URLs on the fly

Posted 2020-07-13 03:36

Question:

I am trying to create a zip file of any size on the fly. The source of the zip archive is a list of URLs and could be large (for example, 500 JPGs of 4 MB each). I want to do everything inside the request: the download should start right away, and the zip should be streamed as it is built. It should not have to reside in memory or on disk on the server.

The closest I have come is the code below. Note: urls is a collection of KeyValuePair<string, string> items that maps URLs to the file names they should have in the created zip.

Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/zip";
Response.AddHeader("Content-Disposition", "attachment; filename=DynamicZipFile.zip");

using (var memoryStream = new MemoryStream())
{
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
    {
        foreach (KeyValuePair<string, string> fileNamePair in urls)
        {
            var zipEntry = archive.CreateEntry(fileNamePair.Key);

            using (var entryStream = zipEntry.Open())
                using (WebClient wc = new WebClient())
                    wc.OpenRead(GetUrlForEntryName(fileNamePair.Key)).CopyTo(entryStream);

                //this doesn't work either
                //using (var streamWriter = new StreamWriter(entryStream))
                //  using (WebClient wc = new WebClient())
                //      streamWriter.Write(wc.OpenRead(GetUrlForEntryName(fileNamePair.Key)));
        }
    }

    memoryStream.WriteTo(Response.OutputStream);
}
HttpContext.Current.ApplicationInstance.CompleteRequest();

This code gives me a zip file, but each JPG file inside the zip is just a text file that says "System.Net.ConnectStream". I have other attempts that do build a zip file with the proper files inside, but you can tell by the delay at the beginning that the server is building the entire zip in memory and then blasting it down at the end. It doesn't respond at all when the file count gets near 50. The commented-out part gives me the same result. I have tried Ionic.Zip as well.

This is .NET 4.5 on IIS8. I am building with VS2013 and trying to run this on AWS Elastic Beanstalk.
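
A likely explanation for the "System.Net.ConnectStream" text, at least in the commented-out variant: StreamWriter.Write has no overload that takes a Stream, so the call binds to Write(object), which writes the result of ToString(), that is, the type name of the stream rather than its bytes. Stream.CopyTo, by contrast, copies the contents. The sketch below illustrates the difference with a MemoryStream standing in for the stream returned by WebClient.OpenRead; StreamWriteDemo and its method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamWriteDemo
{
    // Passes the stream *object* to a StreamWriter, as in the commented-out attempt.
    public static string WriteObject(Stream source)
    {
        using (var target = new MemoryStream())
        {
            using (var writer = new StreamWriter(target, new UTF8Encoding(false), 1024, true))
            {
                writer.Write(source); // binds to Write(object): writes source.ToString()
            }
            return Encoding.UTF8.GetString(target.ToArray());
        }
    }

    // Copies the stream *contents* into the target.
    public static string CopyBytes(Stream source)
    {
        using (var target = new MemoryStream())
        {
            source.CopyTo(target);
            return Encoding.UTF8.GetString(target.ToArray());
        }
    }

    public static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("pretend JPG bytes");
        Console.WriteLine(WriteObject(new MemoryStream(payload))); // System.IO.MemoryStream
        Console.WriteLine(CopyBytes(new MemoryStream(payload)));   // pretend JPG bytes
    }
}
```

With a real WebClient stream, the type name written in the first case would be the concrete stream class, which matches the text found inside the zip entries.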

Answer 1:

You're trying to create a zip file and have it stream while it's being created. This turns out to be very difficult.

You need to understand the Zip file format. In particular, notice that a local file entry has header fields (CRC, compressed and uncompressed file sizes) that can't be filled in until the entire file has been compressed. So at a minimum you'll have to buffer one entire file before sending it to the response stream.

So at best you could do something like:

open archive
for each file
    create entry
    write file to entry
    read entry raw data and send to the response output stream

The problem you'll run into is that there's no documented way (and no undocumented way that I'm aware of) to read the raw data. The only read method ends up decompressing the data and throwing away the headers.

There might be some other zip library available that can do what you need. I wouldn't suggest trying to do it with ZipArchive.
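
One caveat on the header fields mentioned above: the ZIP format also defines an optional data descriptor. When bit 3 of an entry's general-purpose flags is set, the CRC and sizes are written after the file data instead of in the local header, so a writer does not strictly need to buffer a whole file. The following is a hedged sketch, not a verified fix for the code in the question. It assumes that ZipArchive in Create mode only needs a writable stream that can report its Position (the documentation for ZipArchiveMode.Create states that seeking is not required). ForwardOnlyStream and ForwardOnlyZipDemo are hypothetical names, and the MemoryStream stands in for Response.OutputStream.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical wrapper: forwards writes, refuses seeking, and tracks Position itself.
public sealed class ForwardOnlyStream : Stream
{
    private readonly Stream _inner;
    private long _written;

    public ForwardOnlyStream(Stream inner) { _inner = inner; }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { return _written; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { _inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        _written += count;
    }
}

public static class ForwardOnlyZipDemo
{
    // Builds a two-entry archive through the forward-only wrapper and returns the raw bytes.
    public static byte[] BuildArchive()
    {
        var sink = new MemoryStream(); // stand-in for Response.OutputStream
        using (var archive = new ZipArchive(new ForwardOnlyStream(sink), ZipArchiveMode.Create))
        {
            foreach (string name in new[] { "a.txt", "b.txt" })
            {
                ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.NoCompression);
                using (Stream entryStream = entry.Open())
                {
                    byte[] data = Encoding.UTF8.GetBytes("contents of " + name);
                    entryStream.Write(data, 0, data.Length);
                }
            }
        }
        return sink.ToArray();
    }

    public static void Main()
    {
        byte[] zip = BuildArchive();
        // The general-purpose flags sit at offset 6 of the first local header; bit 3 marks a data descriptor.
        Console.WriteLine("data descriptor flag set: " + ((zip[6] & 0x08) != 0));
        using (var archive = new ZipArchive(new MemoryStream(zip), ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry e in archive.Entries)
                Console.WriteLine(e.FullName + " (" + e.Length + " bytes)");
        }
    }
}
```

Whether this approach behaves well against Response.OutputStream on a given framework version is something to test; the sketch only shows that the archive can be produced without seeking back into already written data.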



Answer 2:

So to answer my own question - here is the solution that works for me:

private void ProcessWithSharpZipLib()
{
    byte[] buffer = new byte[4096];

    // Write the archive straight to the response stream so it is sent as it is built.
    ICSharpCode.SharpZipLib.Zip.ZipOutputStream zipOutputStream = new ICSharpCode.SharpZipLib.Zip.ZipOutputStream(Response.OutputStream);
    zipOutputStream.SetLevel(0); //0-9, 9 being the highest level of compression
    zipOutputStream.UseZip64 = ICSharpCode.SharpZipLib.Zip.UseZip64.Off;

    foreach (KeyValuePair<string, string> fileNamePair in urls)
    {
        // Stop fetching further files once the client has disconnected.
        if (!Response.IsClientConnected)
        {
            break;
        }

        using (WebClient wc = new WebClient())
        using (Stream wcStream = wc.OpenRead(GetUrlForEntryName(fileNamePair.Key)))
        {
            ICSharpCode.SharpZipLib.Zip.ZipEntry entry = new ICSharpCode.SharpZipLib.Zip.ZipEntry(ICSharpCode.SharpZipLib.Zip.ZipEntry.CleanName(fileNamePair.Key));

            zipOutputStream.PutNextEntry(entry);

            // Relay the download in 4 KB chunks, flushing each chunk to the client.
            int count;
            while ((count = wcStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                zipOutputStream.Write(buffer, 0, count);
                if (!Response.IsClientConnected)
                {
                    break;
                }
                Response.Flush();
            }
        }
    }
    zipOutputStream.Close();

    Response.Flush();
    Response.End();
}


Answer 3:

There should be a way in the zip component you are using to add entries lazily, i.e. to register them up front and have their content produced only once zip.Save() is running. I am using Ionic.Zip with this delayed technique. The code to download Flickr albums looks like this:

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsLoggedIn())
        Response.Redirect("/login.aspx");
    else
    {
        // this is dco album id, find out what photosetId it maps to
        string albumId = Request.Params["id"];
        Album album = findAlbum(new Guid(albumId));
        Flickr flickr = FlickrInstance();
        PhotosetPhotoCollection photos = flickr.PhotosetsGetPhotos(album.PhotosetId, PhotoSearchExtras.OriginalUrl | PhotoSearchExtras.Large2048Url | PhotoSearchExtras.Large1600Url);

        Response.Clear();
        Response.BufferOutput = false;

        // ascii only
        //string archiveName = album.Title + ".zip";
        string archiveName = "photos.zip";
        Response.ContentType = "application/zip";
        Response.AddHeader("content-disposition", "attachment; filename=" + archiveName);
        int picCount = 0;
        string picNamePref = album.PhotosetId.Substring(album.PhotosetId.Length - 6);
        using (ZipFile zip = new ZipFile())
        {
            zip.CompressionMethod = CompressionMethod.None;
            zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
            zip.ParallelDeflateThreshold = -1;
            _map = new Dictionary<string, string>();
            foreach (Photo p in photos)
            {
                string pictureUrl = p.Large2048Url;
                if (string.IsNullOrEmpty(pictureUrl))
                    pictureUrl = p.Large1600Url;
                if (string.IsNullOrEmpty(pictureUrl))
                    pictureUrl = p.LargeUrl;

                string pictureName = picNamePref + "_" + (++picCount).ToString("000") + ".jpg";
                _map.Add(pictureName, pictureUrl);
                zip.AddEntry(pictureName, processPicture);
            }
            zip.Save(Response.OutputStream);
        }
        Response.Close();
    }
}
private volatile Dictionary<string, string> _map;
protected void processPicture(string pictureName, Stream output)
{
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(_map[pictureName]);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream input = response.GetResponseStream())
        {
            byte[] buf = new byte[8092];
            int len;
            while ( (len = input.Read(buf, 0, buf.Length)) > 0)
                output.Write(buf, 0, len);
        }
        output.Flush();
    }
}

This way the code in Page_Load gets to zip.Save() immediately and the download starts (the client is presented with the "Save As" box); only then are the images pulled from Flickr.



Answer 4:

The ProcessWithSharpZipLib code from Answer 2 works fine on my local machine, but after I deploy it to Windows Azure as a cloud service it corrupts the zip file when the file is large: the archive is reported as an invalid file.