Can anyone please explain to me how to transform a phrase like "I want to buy some milk" into MD5? I read the Wikipedia article on MD5, but the explanation given there is beyond my comprehension:
"MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit little endian integers)"
"sixteen 32-bit little endian integers" is already hard for me. I checked the Wiki article on little endians and didn't understand a bit.
However, the examples of some phrases and their MD5 hashes in that Wiki article are very nice:
MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
MD5("The quick brown fox jumps over the lazy dog.") = e4d909c290d0fb1ca068ffaddf22cbd0
Can anyone, please, explain to me how this MD5 algorithm works using some very simple example?
Also, perhaps you know of some software or code that would transform phrases into their MD5 hashes. If so, please let me know.
MD5 is a hash algorithm: it produces a signature of the input text such that changing any letter in the input will have a significant, unpredictable impact on the signature.
For instance:
The md5 signature of the text 'This is a quite short text which looks quite normal' is '2bb1a5a5204aba95c886b3eb598c9d41'
The md5 signature of the same text with an added period, 'This is a quite short text which looks quite normal.' is '870df12558aae47b40bf738290ba8554'
As you see, the signatures differ significantly. This property makes md5 suitable as a type of 'fingerprinting': two books that differ by only one letter have completely different md5s. Furthermore, two md5s are almost never the same for any pair of different books: accidental collisions are extremely rare.
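To see this effect for yourself, here is a minimal Python sketch using the standard hashlib module. It hashes the two Wikipedia example phrases and counts how many of the 128 output bits differ. The helper names md5_hex and differing_bits are made up for this illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a text (encoded as UTF-8) as 32 hex digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex strings differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = md5_hex("The quick brown fox jumps over the lazy dog")
second = md5_hex("The quick brown fox jumps over the lazy dog.")

print(first)   # 9e107d9d372bb6826bd81d3542a419d6
print(second)  # e4d909c290d0fb1ca068ffaddf22cbd0
print(differing_bits(first, second), "of 128 bits differ")
```

For a good hash function you would expect roughly half of the bits to flip, even though only one character was added.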
There are numerous implementations of md5, including several online versions (here is one). If you want one in a specific language, please specify which.
Forget about the endians: it's just a name for a way to encode information.
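If you are curious anyway, the idea fits in a few lines. "Little-endian" only means that when four bytes are read as one 32-bit number, the first byte is the least significant one. The sketch below uses Python's struct module on a made-up 64-byte block (the byte values are arbitrary and chosen only so the order is easy to see).

```python
import struct

# A hypothetical 64-byte (512-bit) block: bytes 0x00, 0x01, ..., 0x3f.
block = bytes(range(64))

# "<16I" means: sixteen unsigned 32-bit integers, little-endian.
words = struct.unpack("<16I", block)

print(len(words))     # 16
print(hex(words[0]))  # 0x3020100 (bytes 00 01 02 03, first byte least significant)

# The same four bytes read big-endian give a different number.
print(hex(int.from_bytes(block[:4], "big")))  # 0x10203
```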
Let's follow the wikipedia MD5 article. You start with an input message. It can be arbitrarily long: MD5 hashes for 2GB ISO files are routinely created, just like hashes for strings a dozen characters long (e.g. for passwords).
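Because the input is consumed piece by piece, a large file never has to fit in memory. A rough sketch with Python's hashlib follows; the function names and the 1 MiB chunk size are arbitrary choices for this example, not anything the algorithm prescribes.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream by feeding it to MD5 one chunk at a time."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Hash a file on disk; should match what md5sum reports for it."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


# An in-memory stream stands in for a real file here.
demo = io.BytesIO(b"The quick brown fox jumps over the lazy dog")
print(md5_of_stream(demo))  # 9e107d9d372bb6826bd81d3542a419d6
```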
The hash will be contained in registers a, b, c and d. These registers are initialized with special values (h0-h3).

The algorithm pads the input, breaks it into 512-bit blocks, and treats each block as 16 4-byte chunks ("sixteen 32-bit little-endian words"). It then applies specific logical operations (functions F, G, H and I) on parts of the input and the current state of registers a, b, c and d. It does this 64 times for each set of 16 4-byte chunks, and adds the result back into h0-h3 before moving on to the next block.

When all of the chunks are processed, what remains in h0-h3 (the accumulated a, b, c and d) is the final hash, the one you might get by invoking md5sum testfile.txt.
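For the curious, the steps above can be sketched in plain Python, following the pseudocode in the Wikipedia article. This is meant for reading, not for real use: the function name md5_hex and the variable names are choices made for this sketch, and it is far slower than a library implementation.

```python
import math
import struct

# Per-step left-rotation amounts, four values repeated within each round.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# 64 constants: the integer part of abs(sin(i + 1)) * 2**32.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF  # keeps values within 32 bits


def left_rotate(x, n):
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_hex(message: bytes) -> str:
    h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (64-bit LE).
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_length)

    for offset in range(0, len(padded), 64):
        # One 512-bit block = sixteen 32-bit little-endian words.
        words = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = h0, h1, h2, h3
        for i in range(64):
            if i < 16:    # function F
                f, g = (b & c) | (~b & d), i
            elif i < 32:  # function G
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:  # function H
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:         # function I
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + left_rotate(f, SHIFTS[i])) & MASK
        # Add this block's result back into the running state.
        h0, h1 = (h0 + a) & MASK, (h1 + b) & MASK
        h2, h3 = (h2 + c) & MASK, (h3 + d) & MASK

    # The digest is the four state words written out little-endian.
    return struct.pack("<4I", h0, h1, h2, h3).hex()


print(md5_hex(b"The quick brown fox jumps over the lazy dog"))
# 9e107d9d372bb6826bd81d3542a419d6
```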
Update:
If you just want to be able to calculate a hash, implementing it yourself makes no sense because it's been done and tested for probably every significant language out there:
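Python: the standard hashlib module (hashlib.md5).
SQL (MySQL): the built-in MD5() function.
Java: java.security.MessageDigest.getInstance("MD5").
etc.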
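As a concrete case, here is a short Python sketch for the phrase from the question, using the standard hashlib module. The function name md5_of_text is just a label for this example, and UTF-8 is an assumed encoding: MD5 works on bytes, so the text has to be encoded first, and a different encoding can give a different hash.

```python
import hashlib


def md5_of_text(text: str, encoding: str = "utf-8") -> str:
    """Encode the text to bytes and return its MD5 hash as 32 hex digits."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


print(md5_of_text("I want to buy some milk"))
```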
MD5 is horribly broken and has been for years: practical collision attacks exist. Do not use it for any purpose if you can possibly help it. In new applications, use a SHA-2 hash function such as SHA-256.
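Switching is usually a one-line change. A minimal sketch in Python, again with hashlib (the function name sha256_of_text and the UTF-8 encoding are assumptions made for this example):

```python
import hashlib


def sha256_of_text(text: str, encoding: str = "utf-8") -> str:
    """Return the SHA-256 hash of a text as 64 hex digits (256 bits)."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


print(sha256_of_text("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```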