C#.net identify zip file

Posted 2019-01-14 23:48

Question:

I am currently using the SharpZip API to handle my zip file entries. It works splendidly for zipping and unzipping, but I am having trouble identifying whether a file is a zip or not. I need to know if there is a way to detect whether a file stream can be decompressed. Originally I used

FileStream lFileStreamIn = File.OpenRead(mSourceFile);
lZipFile = new ZipFile(lFileStreamIn);
ZipInputStream lZipStreamTester = new ZipInputStream(lFileStreamIn, mBufferSize);// not working
lZipStreamTester.Read(lBuffer, 0, 0);
if (lZipStreamTester.CanDecompressEntry)
{

The lZipStreamTester becomes null every time and the if statement fails. I tried it with and without a buffer. Can anybody give any insight as to why? I am aware that I can check the file extension, but I need something more definitive than that. I am also aware that zip has a magic number (PK something), but there is no guarantee that it will always be there, because it isn't a requirement of the format.

Also, I read that .NET 4.5 has native zip support, so my project may migrate to that instead of SharpZip, but I still didn't see a method or parameter similar to CanDecompressEntry here: http://msdn.microsoft.com/en-us/library/3z72378a%28v=vs.110%29

My last resort will be to use a try/catch and attempt to unzip the file.

Answer 1:

This is a base class for a component that needs to handle data that is either uncompressed, PKZIP compressed (SharpZipLib), or GZip compressed (built into .NET). It is perhaps a bit more than you need, but it should get you going. It is an example of using @PhonicUK's suggestion to parse the header of the data stream. The derived classes you see in the little factory method handle the specifics of PKZip and GZip decompression.

abstract class Expander
{
    private const int ZIP_LEAD_BYTES = 0x04034b50;
    private const ushort GZIP_LEAD_BYTES = 0x8b1f;

    public abstract MemoryStream Expand(Stream stream); 

    internal static bool IsPkZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 4);
        // if the first 4 bytes of the array are the ZIP signature then it is compressed data
        return (BitConverter.ToInt32(data, 0) == ZIP_LEAD_BYTES);
    }

    internal static bool IsGZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 2);
        // if the first 2 bytes of the array are the GZIP signature then it is compressed data
        return (BitConverter.ToUInt16(data, 0) == GZIP_LEAD_BYTES);
    }

    public static bool IsCompressedData(byte[] data)
    {
        return IsPkZipCompressedData(data) || IsGZipCompressedData(data);
    }

    public static Expander GetExpander(Stream stream)
    {
        Debug.Assert(stream != null);
        Debug.Assert(stream.CanSeek);
        stream.Seek(0, SeekOrigin.Begin);

        try
        {
            byte[] bytes = new byte[4];

            stream.Read(bytes, 0, 4);

            if (IsGZipCompressedData(bytes))
                return new GZipExpander();

            if (IsPkZipCompressedData(bytes))
                return new ZipExpander();

            return new NullExpander();
        }
        finally
        {
            stream.Seek(0, SeekOrigin.Begin);  // set the stream back to the beginning
        }
    }
}


Answer 2:

See https://stackoverflow.com/a/16587134/206730 for reference.

Check the links below:

icsharpcode-sharpziplib-validate-zip-file

How-to-check-if-a-file-is-compressed-in-c#

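ZIP files normally start with the local file header signature 0x04034b50 (4 bytes, stored little-endian, so the bytes on disk are 50 4B 03 04). Empty archives and self-extracting archives start differently, so a match is a strong hint rather than a guarantee.
More details: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers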

Sample usage:

        bool isPKZip = IOHelper.CheckSignature(pkg, 4, IOHelper.SignatureZip);
        Assert.IsTrue(isPKZip, "Not ZIP the package : " + pkg);

// http://blog.somecreativity.com/2008/04/08/how-to-check-if-a-file-is-compressed-in-c/
    public static partial class IOHelper
    {
        public const string SignatureGzip = "1F-8B-08";
        public const string SignatureZip = "50-4B-03-04";

        public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
        {
            if (String.IsNullOrEmpty(filepath)) throw new ArgumentException("Must specify a filepath");
            if (String.IsNullOrEmpty(expectedSignature)) throw new ArgumentException("Must specify a value for the expected file signature");
            using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                if (fs.Length < signatureSize)
                    return false;
                byte[] signature = new byte[signatureSize];
                int bytesRequired = signatureSize;
                int index = 0;
                while (bytesRequired > 0)
                {
                    int bytesRead = fs.Read(signature, index, bytesRequired);
                    bytesRequired -= bytesRead;
                    index += bytesRead;
                }
                string actualSignature = BitConverter.ToString(signature);
                if (actualSignature == expectedSignature) return true;
                return false;
            }
        }

    }


Answer 3:

You can either:

  • Use a try-catch structure and try to read the structure of a potential zip file
  • Parse the file header to see if it is a zip file

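ZIP files normally start with 0x04034b50 as their first 4 bytes (stored little-endian, so the bytes on disk are 50 4B 03 04); see http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers. An empty archive is an exception, since it contains only the end-of-central-directory record.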
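
The try-catch option can be sketched with the built-in System.IO.Compression.ZipArchive class (.NET 4.5 or later, with a reference to the System.IO.Compression assembly). This is only a minimal sketch: the ZipProbe class and the CanOpenAsZip name are hypothetical, and it assumes a seekable stream. It swaps in the framework's ZipArchive rather than SharpZipLib.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of any library.
static class ZipProbe
{
    // Returns true if ZipArchive can parse the stream as a zip archive.
    public static bool CanOpenAsZip(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("A seekable stream is required.", nameof(stream));

        long start = stream.Position;
        try
        {
            // leaveOpen: true so disposing the archive does not close the caller's stream.
            using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Touching Entries forces the central directory to be read.
                return archive.Entries.Count >= 0;
            }
        }
        catch (InvalidDataException)
        {
            // Thrown when the contents cannot be interpreted as a zip archive.
            return false;
        }
        finally
        {
            // Put the stream back where the caller left it.
            stream.Position = start;
        }
    }
}
```

Unlike a header check, this also accepts empty archives, at the cost of seeking to the end of the stream to find the central directory.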



Answer 4:

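If you are programming for the web, you can check the Content-Type of the uploaded file: application/zip. Note that this value is supplied by the client, so treat it as a hint rather than proof.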



Answer 5:

Thanks to dkackman and Kiquenet for the answers above. For completeness, the code below uses the signature to identify compressed (zip) files. You then have the added complexity that the newer MS Office file formats (.docx, .xlsx, etc.) will also match this signature lookup. As remarked elsewhere, these are indeed compressed archives: you can rename the files with a .zip extension and have a look at the XML inside.

The code below first checks for ZIP (compressed) data using the signatures above, and then runs a subsequent check for MS Office packages. Note that to use System.IO.Packaging.Package you need a project reference to "WindowsBase" (a .NET assembly reference).

    private const string SignatureZip = "50-4B-03-04";
    private const string SignatureGzip = "1F-8B-08";

    public static bool IsZip(this Stream stream)
    {
        if (stream.Position > 0)
        {
            stream.Seek(0, SeekOrigin.Begin);
        }

        bool isZip = CheckSignature(stream, 4, SignatureZip);
        bool isGzip = CheckSignature(stream, 3, SignatureGzip);

        bool isSomeKindOfZip = isZip || isGzip;

        if (isSomeKindOfZip && stream.IsPackage()) //Signature matches ZIP, but it's package format (docx etc).
        {
            return false;
        }

        return isSomeKindOfZip;
    }

    /// <summary>
    /// MS .docx, .xlsx and other extensions are (correctly) identified as zip files using signature lookup.
    /// This tests whether System.IO.Packaging is able to open the stream; if the package has parts, it is treated as a package, not a plain zip file.
    /// </summary>
    /// <param name="stream"></param>
    /// <returns></returns>
    private static bool IsPackage(this Stream stream)
    {
        Package package = Package.Open(stream, FileMode.Open, FileAccess.Read);
        return package.GetParts().Any();
    }
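
The IsZip method above calls a CheckSignature overload that takes a Stream, which is not shown in this answer. A minimal sketch of what such an overload might look like follows, adapted from the file-path version in Answer 2. The StreamSignature class name is hypothetical, and rewinding to the start before each read and restoring the position afterwards are assumptions, made so that the two consecutive signature checks both read from offset 0.

```csharp
using System;
using System.IO;

// Hypothetical helper; the original answer does not show this overload.
static class StreamSignature
{
    public static bool CheckSignature(Stream stream, int signatureSize, string expectedSignature)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("A seekable stream is required.", nameof(stream));
        if (string.IsNullOrEmpty(expectedSignature))
            throw new ArgumentException("Must specify a value for the expected file signature");

        long original = stream.Position;
        try
        {
            // Always compare against the first bytes of the stream.
            stream.Seek(0, SeekOrigin.Begin);

            byte[] signature = new byte[signatureSize];
            int index = 0;
            while (index < signatureSize)
            {
                int bytesRead = stream.Read(signature, index, signatureSize - index);
                if (bytesRead == 0) return false; // stream is shorter than the signature
                index += bytesRead;
            }

            // BitConverter.ToString yields hyphen-separated uppercase hex, e.g. "50-4B-03-04".
            return BitConverter.ToString(signature) == expectedSignature;
        }
        finally
        {
            // Leave the stream where the caller had it.
            stream.Seek(original, SeekOrigin.Begin);
        }
    }
}
```

With this shape, a call such as StreamSignature.CheckSignature(stream, 4, "50-4B-03-04") leaves the stream position unchanged, so the later Package.Open call still starts from wherever IsZip positioned the stream.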