How to transform phrases and words into MD5 hash?

Posted 2019-04-02 08:21

Question:

Can anyone, please, explain to me how to transform a phrase like "I want to buy some milk" into MD5? I read the Wikipedia article on MD5, but the explanation given there is beyond my comprehension:

"MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit little endian integers)"

"sixteen 32-bit little endian integers" is already hard for me. I checked the Wikipedia article on endianness and didn't understand a bit of it.

However, the examples of some phrases and their MD5 hashes in that Wiki article are very nice:

MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6

MD5("The quick brown fox jumps over the lazy dog.") = e4d909c290d0fb1ca068ffaddf22cbd0

Can anyone, please, explain to me how this MD5 algorithm works using a very simple example?

Also, if you know of any software or code that would transform phrases into their MD5 hashes, please let me know.

Answer 1:

Forget about endianness: it's just the name for the order in which the bytes of a multi-byte number are stored. "Little-endian" means the least significant byte comes first.
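A short Python sketch, using only the standard struct module, makes the byte order visible. It packs one 32-bit number both ways and then reads four bytes of text back as a single little-endian word, which is how MD5 looks at its input. The value 0x01020304 and the text "abcd" are arbitrary choices for illustration.

```python
import struct

value = 0x01020304
little = struct.pack("<I", value)  # "<" = little-endian: least significant byte first
big = struct.pack(">I", value)     # ">" = big-endian: most significant byte first
print(little.hex())  # 04030201
print(big.hex())     # 01020304

# Four bytes of text read as one little-endian 32-bit word, the way MD5 sees its input.
# 'a' = 0x61 is the first byte, so it ends up as the *lowest* byte of the number.
word = struct.unpack("<I", b"abcd")[0]
print(hex(word))     # 0x64636261
```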

Let's follow the Wikipedia MD5 article. You start with an input message. It can be arbitrarily long: MD5 hashes are routinely computed for 2 GB ISO files, just as they are for strings a dozen characters long (e.g. passwords).
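Because the input is processed piece by piece, even a huge file never has to fit in memory. The sketch below uses Python's standard hashlib module; md5_of_file is a hypothetical helper name and the 1 MiB read size is an arbitrary choice. The second half shows that feeding the data to update() in pieces gives the same hash as hashing it all at once.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: hex MD5 of a file, read in 1 MiB pieces."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hashing in pieces gives the same result as hashing everything at once.
h = hashlib.md5()
for piece in (b"The quick brown fox ", b"jumps over ", b"the lazy dog"):
    h.update(piece)
print(h.hexdigest())  # 9e107d9d372bb6826bd81d3542a419d6
```

For a real file you would call something like md5_of_file("image.iso"), where the file name is only a placeholder.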

The hash will be held in four 32-bit registers a, b, c and d. These registers are initialized with special constant values (h0 to h3).
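The "special values" are four fixed constants defined in RFC 1321 (called h0 to h3 here, following the Wikipedia article). A quick Python check with the standard struct module shows they are not mysterious: stored little-endian, they simply spell out a counting pattern of bytes.

```python
import struct

# Initial register values from RFC 1321 (h0..h3 in the Wikipedia pseudocode).
h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

# Written out as four little-endian 32-bit words, one after another.
layout = struct.pack("<4I", h0, h1, h2, h3).hex()
print(layout)  # 0123456789abcdeffedcba9876543210
```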

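The algorithm first pads the input so that its length is a multiple of 64 bytes (512 bits), then processes it one 64-byte block at a time, treating each block as 16 4-byte chunks ("sixteen 32-bit little-endian words"). For each block it applies specific logical operations (functions F, G, H and I) to parts of the input and the current state of registers a, b, c and d: 64 steps per block, in four rounds of 16. After each block, the resulting a, b, c and d are added to the running values h0 to h3.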
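To make those steps concrete, here is a minimal, deliberately unoptimized Python sketch that follows the Wikipedia pseudocode and RFC 1321: padding, splitting each 64-byte block into sixteen little-endian words, 64 steps using F, G, H and I, and adding the results back into h0 to h3. The names md5_hex and left_rotate are illustrative, not part of any library. It is meant for learning only; real code should use hashlib.

```python
import math
import struct

# Per-step left-rotation amounts: four rounds of 16 steps (RFC 1321).
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Per-step constants: floor(abs(sin(i + 1)) * 2**32), as in RFC 1321.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

def left_rotate(x, amount):
    """Rotate a 32-bit value left (illustrative helper)."""
    x &= 0xFFFFFFFF
    return ((x << amount) | (x >> (32 - amount))) & 0xFFFFFFFF

def md5_hex(message: bytes) -> str:
    """Educational MD5 sketch; md5_hex is a hypothetical name, not a library API."""
    # Padding: a 0x80 byte, zeros until length % 64 == 56,
    # then the original length in bits as a 64-bit little-endian number.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    message += b"\x80"
    message += b"\x00" * ((56 - len(message)) % 64)
    message += struct.pack("<Q", bit_len)

    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]  # h0..h3
    for offset in range(0, len(message), 64):
        # One 512-bit block viewed as sixteen 32-bit little-endian words.
        M = struct.unpack("<16I", message[offset:offset + 64])
        a, b, c, d = h
        for i in range(64):
            if i < 16:    # round 1: F
                f, g = (b & c) | (~b & d), i
            elif i < 32:  # round 2: G
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:  # round 3: H
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:         # round 4: I
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + M[g]) & 0xFFFFFFFF
            a, d, c = d, c, b  # shift the registers along
            b = (b + left_rotate(f, S[i])) & 0xFFFFFFFF
        # Add this block's result into the running hash.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d))]
    # Final hash: h0..h3 written out as little-endian bytes.
    return struct.pack("<4I", *h).hex()

print(md5_hex(b"The quick brown fox jumps over the lazy dog"))
```

With the question's first example as input, the sketch should print 9e107d9d372bb6826bd81d3542a419d6, the same value hashlib.md5 gives.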

When all of the input has been processed, the four registers, written out one after another as little-endian bytes, form the final 128-bit hash: the one you would get by invoking md5sum testfile.txt.
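You can see this register layout directly in Python's standard hashlib output: the 16-byte digest splits cleanly into four little-endian 32-bit words, and its hex form is the familiar 32-character hash. The fox sentence from the question is used here because its MD5 is published.

```python
import hashlib
import struct

digest = hashlib.md5(b"The quick brown fox jumps over the lazy dog").digest()  # 16 raw bytes
a, b, c, d = struct.unpack("<4I", digest)  # the four final 32-bit register values
print([hex(x) for x in (a, b, c, d)])
print(digest.hex())  # 9e107d9d372bb6826bd81d3542a419d6
```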

Update:

If you just want to calculate a hash, implementing MD5 yourself makes no sense: it has already been implemented and tested in probably every significant language out there:

Python:

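import hashlib  # replaces the old md5 module, which exists only in Python 2
# hashlib works on bytes, so encode the text; hexdigest() returns the usual 32 hex digits
hashlib.md5("Nobody inspects the spammish repetition".encode("utf-8")).hexdigest()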

SQL (MySQL):

SELECT MD5('Nobody inspects the spammish repetition')

Java:

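import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
String s = "Nobody inspects the spammish repetition";
MessageDigest m = MessageDigest.getInstance("MD5"); // throws NoSuchAlgorithmException (checked)
byte[] digest = m.digest(s.getBytes(StandardCharsets.UTF_8)); // hash all UTF-8 bytes, not s.length() chars
// %032x keeps the leading zeros that BigInteger.toString(16) would drop
System.out.println(String.format("%032x", new BigInteger(1, digest)));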

etc.



Answer 2:

MD5 is a hash algorithm: it produces a signature of the input text such that changing any letter in the input has a significant, unpredictable impact on the signature.

For instance:

The md5 signature of the text 'This is a quite short text which looks quite normal' is '2bb1a5a5204aba95c886b3eb598c9d41'

The md5 signature of the same text with an added period, 'This is a quite short text which looks quite normal.' is '870df12558aae47b40bf738290ba8554'

As you can see, the signatures differ significantly. This property makes MD5 suitable as a type of "fingerprinting": two books that differ by only one letter have completely different MD5s. Furthermore, two different books will almost never share an MD5 by accident: accidental collisions are extremely rare.
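One way to quantify "differs significantly" is to count how many of the 128 output bits change. The sketch below uses Python's standard hashlib and the two fox sentences from the question, whose MD5 values are published; md5_bits_changed is a hypothetical helper name. For a well-behaved hash, roughly half of the bits are expected to flip on average.

```python
import hashlib

def md5_bits_changed(a: str, b: str) -> int:
    """Hypothetical helper: how many of the 128 MD5 output bits differ."""
    x = int(hashlib.md5(a.encode("utf-8")).hexdigest(), 16)
    y = int(hashlib.md5(b.encode("utf-8")).hexdigest(), 16)
    return bin(x ^ y).count("1")  # XOR leaves a 1 wherever the bits differ

fox = "The quick brown fox jumps over the lazy dog"
print(hashlib.md5(fox.encode("utf-8")).hexdigest())          # 9e107d9d372bb6826bd81d3542a419d6
print(hashlib.md5((fox + ".").encode("utf-8")).hexdigest())  # e4d909c290d0fb1ca068ffaddf22cbd0
print(md5_bits_changed(fox, fox + "."), "of 128 bits differ")
```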

There are numerous implementations of md5, including several online versions (here is one). If you want one in a specific language, please specify which.



Answer 3:

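MD5 is horribly broken and has been for years: practical collision attacks (two different inputs with the same MD5) have been known since 2004. Do not use it for any purpose if you can possibly help it. In new applications, use a SHA-2 hash function such as SHA-256.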
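In Python, switching is a one-word change, because the standard hashlib module exposes SHA-256 through the same interface as MD5. A minimal sketch:

```python
import hashlib

text = "The quick brown fox jumps over the lazy dog"
# Same interface as hashlib.md5, but a 256-bit digest (64 hex characters).
digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(digest)
```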