Is there any hash function which have following pr

2019-09-21 02:09发布

问题:

I want a hash function which is fast, collision resistant and can give unique output. The primary requirement is - it should be persist-able i.e It's progress(hashing progress) could be saved on a file and then later resumed. You can also provide your own implementation with Python.

Implementations in "other languages" is/are also accepted if it is possible to use that with Python without getting hands dirty going internal.

Thanks in advance :)

回答1:

Because of the pigeonhole principle no hash function can generate hashes which are unique / collision-proof. A good hashing function is collision-resistant, and makes it difficult to generate a file that produces a specified hash. Designing a good hash function is an advanced topic, and I'm certainly no expert in that field. However, since my code is based on sha256 it should be fairly collision-resistant, and hopefully it's also difficult to generate a file that produces a specified hash, but I can make no guarantees in that regard.


Here's a resumable hash function based on sha256 which is fairly fast. It takes about 44 seconds to hash a 1.4GB file on my 2GHz machine with 2GB of RAM.

persistent_hash.py

#! /usr/bin/env python

''' Use SHA-256 to make a resumable hash function

    The file is divided into fixed-sized chunks, which are hashed separately.
    The hash of each chunk is combined into a hash for the whole file.

    The hashing process may be interrupted by Control-C (SIGINT) or SIGTERM.
    When a signal is received, hashing continues until the end of the 
    current chunk, then the file position and current hex digest is saved
    to a file. The name of this file is formed by appending '.hash' to the 
    name of the file being hashed.

    Just re-run the program to resume hashing. The '.hash' file will be deleted 
    once hashing is completed.

    Written by PM 2Ring 2014.11.11
'''

import sys
import os
import hashlib
import signal

quit = False

blocksize = 1<<16   # 64kB
blocksperchunk = 1<<10

chunksize = blocksize * blocksperchunk

def handler(signum, frame):
    global quit
    print "\nGot signal %d, cleaning up." % signum
    quit = True


def do_hash(fname):
    hashname = fname + '.hash'
    if os.path.exists(hashname):
        with open(hashname, 'rt') as f:
            data = f.read().split()
        pos = int(data[0])
        current = data[1].decode('hex')
    else:
        pos = 0
        current = ''

    finished = False
    with open(fname, 'rb') as f:
        f.seek(pos)
        while not (quit or finished):
            full = hashlib.sha256(current)
            part = hashlib.sha256()
            for _ in xrange(blocksperchunk):
                block = f.read(blocksize)
                if block == '':
                    finished = True
                    break
                part.update(block)

            full.update(part.digest())
            current = full.digest()
            pos += chunksize
            print pos
            if finished or quit:
                break

    hexdigest = full.hexdigest()
    if quit:
        with open(hashname, 'wt') as f:
            f.write("%d %s\n" % (pos, hexdigest))
    elif os.path.exists(hashname):
        os.remove(hashname)    

    return (not quit), pos, hexdigest


def main():
    if len(sys.argv) != 2:
        print "Calculate resumable hash of a file."
        print "Usage:\npython %s filename\n" % sys.argv[0]
        exit(1)

    fname = sys.argv[1]

    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)

    print do_hash(fname)


if __name__ == '__main__':
    main()