Python, get base64-encoded MD5 hash of an image ob

Posted 2019-04-27 11:30

Question:

I need to get a base64-encoded MD5 hash of an object, where the object is an image stored as a file, fname.

I've tried this:

def get_md5(fname):
    hash = hashlib.md5()
    with open(fname) as f:
        for chunk in iter(lambda: f.read(4096), ""):
            hash.update(chunk)
    return hash.hexdigest().encode('base64').strip()

However, I don't think this is right because it returns a string with too many characters. My understanding is that it needs to be 24 characters long. I get

NjJiM2RlOWMzOTYxYmM3MDI5Y2Q1NzdjOTQ5YWRlYTQ=

I've tried a few other similar ways as well, for example, one that does not do the chunk loop thing. They all return the same string.

(My later actions that need the base64-encoded MD5 hash fail, and I'm thinking this could be why.)

Answer 1:

I was able to make it work by using digest() instead of hexdigest(). Then the last line becomes:

return hash.digest().encode('base64').strip()

The result was then 24 characters long, and it was accepted by Google Cloud Storage transfer, which required a base64-encoded MD5 hash.
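
The `.encode('base64')` call above only exists on Python 2 strings. As a rough sketch of the same fix on Python 3, assuming the `base64` module is an acceptable substitute, something like the following should work. The function name `get_md5_base64` and the `chunk_size` parameter are illustrative choices and do not come from the original post.

```python
import base64
import hashlib


def get_md5_base64(fname, chunk_size=4096):
    """Return the base64-encoded MD5 digest of a file as a str."""
    md5 = hashlib.md5()
    # Open in binary mode so the raw image bytes are hashed unchanged.
    with open(fname, "rb") as f:
        # The sentinel is b"" because a binary read returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # Encode the 16 raw digest bytes, not the 32-character hex string.
    return base64.b64encode(md5.digest()).decode("ascii")
```

Called as `get_md5_base64(fname)`, this returns a 24-character string, since the 16-byte digest is encoded directly.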



Answer 2:

First, base64 encoding makes strings longer. Example using IPython with Python 3:

In [1]: s = '123456789012345678901234'

In [2]: len(s)
Out[2]: 24

In [3]: import base64

In [4]: e = base64.b64encode(s.encode('utf8'))

In [5]: len(e)
Out[5]: 32

In [6]: e
Out[6]: b'MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0'

With base64 encoding you get 8 bits of output for every 6 bits of input, so the output is 4/3 the size of the input (before padding).

In [7]: 32/24
Out[7]: 1.3333333333333333

In [8]: 8/6
Out[8]: 1.3333333333333333
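
The ratio can also be turned into an exact length. Below is a small sketch that assumes standard padded base64, where every group of up to 3 input bytes becomes 4 output characters. The helper name `b64_length` is made up for illustration.

```python
import base64
import math


def b64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * math.ceil(n_bytes / 3)


# Compare the formula with what the base64 module actually produces.
for n in (16, 24, 32):
    actual = len(base64.b64encode(b"x" * n))
    print(n, b64_length(n), actual)
```

A 16-byte input gives 24 characters, while a 32-byte input gives 44, which matches the length of the string shown in the question.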

The base64 alphabet uses 64 (or 2**6) different symbols. Generally these include the lowercase and uppercase letters and the digits 0-9, which leaves two more required symbols and a padding character. Often + and / are used as those symbols, but there are variations, especially since / is not allowed in UNIX or MS-Windows filenames.
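
To illustrate one such variation, here is a short sketch comparing the standard alphabet with the URL- and filename-safe one that Python's `base64` module also provides. The input bytes are an arbitrary choice, picked only so that both special symbols show up.

```python
import base64

data = b"\xfb\xff"  # bytes chosen so both special symbols appear

standard = base64.b64encode(data)          # uses + and /
url_safe = base64.urlsafe_b64encode(data)  # uses - and _ instead

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='
```

Whichever alphabet is used for encoding must also be used for decoding, so it is worth checking which one the receiving service expects.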

Second, using a hexadecimal representation doubles the length of a byte string; the hex representation of one byte can vary between 00 and FF. Example (again using IPython and Python 3):

In [1]: import hashlib

In [2]: s = b'this is a simple test'

In [3]: len(hashlib.md5(s).digest())
Out[3]: 16

In [4]: len(hashlib.md5(s).hexdigest())
Out[4]: 32

If you are going to use base64 encoding anyway, it makes no sense to use hexdigest().
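
As a sanity check on this explanation, the string from the question can be taken apart. The sketch below assumes that string really is the base64 encoding of a hex digest, which its length suggests. The variable names are illustrative.

```python
import base64

# The 44-character string from the question.
too_long = "NjJiM2RlOWMzOTYxYmM3MDI5Y2Q1NzdjOTQ5YWRlYTQ="

# Decoding it yields the 32-character hex digest, not the raw digest.
hex_digest = base64.b64decode(too_long).decode("ascii")

# Convert the hex text back to the 16 raw bytes and encode those instead.
raw_digest = bytes.fromhex(hex_digest)
fixed = base64.b64encode(raw_digest).decode("ascii")

print(len(hex_digest), len(raw_digest), len(fixed))  # 32 16 24
```

So the over-long value still contains the right hash. It was just encoded twice, once as hex and once as base64.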



Tags: python hash md5