I need to get a base64-encoded MD5 hash of an object, where the object is an image stored as a file, fname.
I've tried this:
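import hashlib

def get_md5(fname):
    hash = hashlib.md5()
    # binary mode so the image bytes are hashed unchanged
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), ""):
            hash.update(chunk)
    # Python 2 only: str.encode('base64')
    return hash.hexdigest().encode('base64').strip()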
However, I don't think this is right, because it returns a string with too many characters; my understanding is that it should be 24 characters long. I get:
NjJiM2RlOWMzOTYxYmM3MDI5Y2Q1NzdjOTQ5YWRlYTQ=
I've tried a few other similar ways as well, for example one that reads the whole file at once instead of looping over chunks. They all return the same string.
(My later actions that need the base64-encoded MD5 hash fail, and I'm thinking this could be why.)
First, base64 encoding makes strings longer (example using IPython with Python 3):
With base64 encoding, each output character (8 bits) carries only 6 bits of input, so every 3 input bytes become 4 output characters.
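A minimal Python 3 sketch of that length arithmetic, using only the standard library. The helper b64_len is hypothetical (it is not part of the base64 module); it computes the padded output length and checks it against base64.b64encode.

```python
import base64
import math


def b64_len(n_bytes: int) -> int:
    """Length of standard, padded base64 output for n_bytes of input."""
    # every 3 input bytes (24 bits) become 4 output characters of 6 bits each;
    # a partial final group is padded with "=" up to 4 characters
    return 4 * math.ceil(n_bytes / 3)


for n in (1, 2, 3, 16, 32):
    data = bytes(n)  # n zero bytes; the content does not affect the length
    encoded = base64.b64encode(data)
    assert len(encoded) == b64_len(n)
    print(n, len(encoded))
```

A 16-byte MD5 digest therefore encodes to 24 characters, while 32 bytes of input (such as a 32-character hex string) encode to 44, which is the length of the string in the question.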
The base64 alphabet uses 64 (or 2**6) different symbols. Generally these include the lower- and uppercase letters and the digits 0-9, which leaves two more required symbols plus a padding character (=). Often + and / are used as the two extra symbols, but there are variations, especially since / is not allowed in UNIX or MS-Windows filenames.
Second, using a hexadecimal representation doubles the length of a byte string: each byte is written as two hex characters, ranging from 00 to FF. Example (again using IPython and Python 3):
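A short Python 3 sketch comparing the raw digest with its hex form, and the base64 lengths of each. The input bytes are a placeholder, not the asker's image; the last lines show the standard library's URL-safe alphabet variant, which replaces + and / with - and _.

```python
import base64
import hashlib

data = b"example image bytes"  # placeholder input, not a real image file

raw = hashlib.md5(data).digest()       # 16 raw bytes
hexed = hashlib.md5(data).hexdigest()  # 32 hex characters, two per byte

print(len(raw), len(hexed))

# base64 of the raw digest: the 24-character form the question expects
print(len(base64.b64encode(raw)))
# base64 of the hex text: the 44-character form the question's code produced
print(len(base64.b64encode(hexed.encode("ascii"))))

# the hex form is just each byte written as two characters 00..ff
assert hexed == raw.hex()

# variant alphabet: "-" and "_" instead of "+" and "/"
url_safe = base64.urlsafe_b64encode(raw)
assert b"+" not in url_safe and b"/" not in url_safe
```

Encoding the hex string means base64 is applied to 32 ASCII characters rather than 16 bytes, so the two expansions stack.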
If you are going to use base64 encoding anyway, it makes no sense to use hexdigest().
I was able to make it work by using digest() instead of hexdigest(). Then the last line becomes:
    return hash.digest().encode('base64').strip()
The result was then 24 characters long, and it was accepted by Google Cloud Storage transfer, which required a base64-encoded MD5 hash.
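In Python 3, str.encode('base64') no longer exists, so a roughly equivalent sketch uses the base64 module instead. The function name get_md5_b64 and the chunk_size parameter are illustrative assumptions, not from the original posts; the approach is the same one described above (hash in chunks, then base64-encode digest() rather than hexdigest()).

```python
import base64
import hashlib


def get_md5_b64(fname: str, chunk_size: int = 4096) -> str:
    """Return the base64-encoded MD5 digest of a file (24 characters)."""
    md5 = hashlib.md5()
    # open in binary mode so the exact bytes on disk are hashed
    with open(fname, "rb") as f:
        # read fixed-size chunks until f.read returns the empty bytes object b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # base64-encode the raw 16-byte digest, not the 32-character hex string
    return base64.b64encode(md5.digest()).decode("ascii")
```

Usage would look like get_md5_b64("image.jpg"), where the file name is a placeholder. Note the sentinel is b"" rather than "": in Python 3 a binary file returns bytes, and comparing bytes to "" is never equal, so the loop would not terminate.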