Is there a safe way to run a diff on two zip compr

Posted 2020-02-10 03:19

It seems this would not be deterministic. Is there a way to do it reliably?

13 answers
小情绪 Triste *
Reply #2 · 2020-02-10 03:27

zipcmp compares the zip archives zip1 and zip2 and checks if they contain the same files, comparing their names, uncompressed sizes, and CRCs. File order and compressed size differences are ignored.

sudo apt-get install zipcmp

叼着烟拽天下
Reply #3 · 2020-02-10 03:29

If you're using gzip, you can do something like this:

diff <(zcat file1.gz) <(zcat file2.gz)
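
For anyone without bash process substitution, roughly the same comparison can be sketched in Python with the standard gzip and difflib modules. The function name gzip_diff is made up for illustration, and the sketch assumes both files hold text (UTF-8 by default); it is not a replacement for diff on binary data.

```python
import difflib
import gzip


def gzip_diff(path1, path2, encoding="utf-8"):
    """Return unified-diff lines for the decompressed text of two .gz files."""
    # Assumption: both files contain text in the given encoding.
    with gzip.open(path1, "rt", encoding=encoding, errors="replace") as f1:
        lines1 = f1.readlines()
    with gzip.open(path2, "rt", encoding=encoding, errors="replace") as f2:
        lines2 = f2.readlines()
    # An empty list means the decompressed contents are identical.
    return list(difflib.unified_diff(lines1, lines2,
                                     fromfile=path1, tofile=path2))
```

Printing "".join(gzip_diff("file1.gz", "file2.gz")) gives output in the familiar unified format.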
淡お忘
Reply #4 · 2020-02-10 03:29

A python solution for zip files:

import difflib
import zipfile

def diff(filename1, filename2):
    """Return 0 if both archives hold the same text content, 1 otherwise."""
    differs = False

    # ZipFile accepts a path directly and opens it in binary mode itself.
    with zipfile.ZipFile(filename1) as z1, zipfile.ZipFile(filename2) as z2:
        if len(z1.infolist()) != len(z2.infolist()):
            print("number of archive elements differ: {} in {} vs {} in {}".format(
                len(z1.infolist()), z1.filename, len(z2.infolist()), z2.filename))
            return 1
        names2 = set(z2.namelist())
        for zipentry in z1.infolist():
            if zipentry.filename not in names2:
                print("no file named {} found in {}".format(zipentry.filename,
                                                            z2.filename))
                differs = True
            else:
                # difflib needs str lines, so decode each entry (assumes text).
                lines1 = z1.read(zipentry.filename).decode(
                    errors="replace").splitlines(True)
                lines2 = z2.read(zipentry.filename).decode(
                    errors="replace").splitlines(True)
                delta = ''.join(x[2:] for x in difflib.ndiff(lines1, lines2)
                                if x.startswith('- ') or x.startswith('+ '))
                if delta:
                    differs = True
                    print("content for {} differs:\n{}".format(
                        zipentry.filename, delta))
    if not differs:
        print("all files are the same")
        return 0
    return 1

Use as

diff(filename1, filename2)

It compares files line-by-line in memory and shows changes.

够拽才男人
Reply #5 · 2020-02-10 03:32

Beyond Compare has no problem with this.

Lonely孤独者°
Reply #6 · 2020-02-10 03:35

In general, you cannot avoid decompressing and then comparing. Different compressors will result in different DEFLATEd byte streams, which when INFLATEd result in the same original text. You cannot simply compare the DEFLATEd data, one to another. That will FAIL in some cases.
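
A small sketch of that point, using Python's zlib module (DEFLATE with a short zlib header and trailer, so not the exact byte layout found inside a ZIP entry). The sample data and variable names are made up for illustration: the same input is compressed at two levels, and only the decompressed results are guaranteed to match.

```python
import zlib

# Hypothetical sample input; any bytes would do.
data = b"the quick brown fox jumps over the lazy dog\n" * 200

fast = zlib.compress(data, 1)  # level 1: favors speed
best = zlib.compress(data, 9)  # level 9: favors size

# The compressed bytes are not the same...
streams_differ = fast != best
# ...but both streams inflate back to the original input.
same_content = zlib.decompress(fast) == zlib.decompress(best) == data

print("compressed streams differ:", streams_differ)
print("decompressed content matches:", same_content)
```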

But in a ZIP archive, a CRC32 is calculated and stored for each entry. So if you want to check files, you can simply compare the stored CRC32 associated with each DEFLATEd stream, keeping in mind that a CRC32 is not guaranteed to be unique: two different files can share the same value. Comparing the FileName and the CRC may fit your needs.

You would need a ZIP library that reads zip files and exposes those things as properties on the "ZipEntry" object. DotNetZip will do that for .NET apps.
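
The same idea can be sketched outside .NET with Python's standard zipfile module, which exposes the stored CRC-32 as ZipInfo.CRC. The helper names crc_index and compare_by_crc are made up for illustration, and the CRC collision caveat above still applies.

```python
import zipfile


def crc_index(path):
    """Map each entry name in a zip archive to its stored CRC-32."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


def compare_by_crc(path1, path2):
    """Return (only_in_first, only_in_second, changed) as sorted name lists."""
    a, b = crc_index(path1), crc_index(path2)
    only_in_first = sorted(a.keys() - b.keys())
    only_in_second = sorted(b.keys() - a.keys())
    # Same name in both archives but a different stored CRC-32.
    changed = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])
    return only_in_first, only_in_second, changed
```

Nothing is decompressed here: only the archive directory is read, so entry order and compression method do not affect the result.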

爷的心禁止访问
Reply #7 · 2020-02-10 03:36

This isn't particularly elegant, but you can use the FileMerge application that comes with Mac OS X developer tools to compare the contents of zip files using a custom filter.

Create a script ~/bin/zip_filemerge_filter.bash with contents:

#!/bin/bash
##
#  List the size, CRC-32 checksum, and file path of each file in a zip archive,
#  sorted in order by file path.
##
unzip -v -l "${1}" | cut -c 1-9,59-,49-57 | sort -k3
exit $?

Make the script executable (chmod +x ~/bin/zip_filemerge_filter.bash).

Open FileMerge, open the Preferences, and go to the "Filters" tab. Add an item to the list with: Extension:"zip", Filter:"~/bin/zip_filemerge_filter.bash $(FILE)", Display: Filtered, Apply*: No. (I've also added the filter for .jar and .war files.)

Then use FileMerge (or the command line "opendiff" wrapper) to compare two .zip files.

This won't let you diff the contents of files within the zip archives, but it will let you quickly see which files appear in only one archive and which files exist in both but have different content (i.e. a different size and/or checksum).
