I decided to write a short article about MD5 that covers the main aspects, both for others to review and learn from and to solidify the knowledge in my own memory.
MD5, short for Message Digest algorithm 5, is a one-way cryptographic hash function. It was designed in 1991 by Professor Ronald Rivest to replace the older MD4 algorithm.
MD5 takes an input and performs a series of bitwise operations on it, producing a 128-bit (16-byte) hash value. The output is usually returned as text, in the form of a 32-digit hexadecimal number.
So the MD5 output is made up of a combination of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F.
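As a quick illustration, here is a minimal sketch using Python's standard hashlib module. The sample message is just an arbitrary input I picked for the example.

```python
import hashlib

message = b"The quick brown fox jumps over the lazy dog"  # arbitrary sample input

digest = hashlib.md5(message)
raw = digest.digest()        # the 128-bit hash as 16 raw bytes
text = digest.hexdigest()    # the same hash as 32 hexadecimal characters

print(len(raw))   # 16
print(len(text))  # 32
print(text)       # 9e107d9d372bb6826bd81d3542a419d6
```

Note that hashlib prints the hexadecimal letters in lowercase (a to f); the value is the same either way.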
Even though the odds of getting a duplicate hash by chance are very slim, it is not impossible.
The reason is that, as mentioned, an MD5 hash has 32 hexadecimal digits, and each digit can be anything from 0 to F, giving 16 possibilities for each position. That leaves us with 16^32 (the same as 2^128), or about 3.402823669209387e+38, possible hashes. Since there are far more possible inputs than possible hashes, duplicates must exist, but the number is big enough to make the chances of running into one by accident very slim.
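The arithmetic above is easy to check. This small sketch only uses Python's built-in integers, and the variable names are mine:

```python
hex_outputs = 16 ** 32   # 32 positions, 16 possible digits in each
bit_outputs = 2 ** 128   # 128 bits, 2 possible values in each

print(hex_outputs == bit_outputs)  # True
print(hex_outputs)                 # 340282366920938463463374607431768211456
print(f"{hex_outputs:.3e}")        # 3.403e+38
```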
The question might arise: how is a hash function different from a cryptographic hash function? That is a very good question and one worth asking.
In very simple terms, a cryptographic hash function is designed to provide a number of security properties, such as making it very difficult to find collisions, making outputs look random (they are generated in such a way that even a one-bit difference between two inputs makes the outputs completely different), and more.
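To see the one-bit effect in practice, here is a rough sketch that flips a single input bit and counts how many of the 128 output bits change. The helper md5_bits and the sample input are my own choices for the example; for a well-mixed hash you would expect roughly half of the bits to differ.

```python
import hashlib

def md5_bits(data: bytes) -> int:
    """Return the MD5 digest of data as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")

original = b"hello world"
# Flip the lowest bit of the first byte: 'h' (0x68) becomes 'i' (0x69).
modified = bytes([original[0] ^ 0x01]) + original[1:]

# XOR the two digests and count the 1 bits, i.e. the positions that differ.
differing_bits = bin(md5_bits(original) ^ md5_bits(modified)).count("1")

print(hashlib.md5(original).hexdigest())
print(hashlib.md5(modified).hexdigest())
print(f"{differing_bits} of 128 output bits differ")
```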
One more property of cryptographic hashing is that it takes input of any type and any length and returns a random-looking, yet fixed-size hash value. The output is not actually random: the same input always produces the same hash.
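A small sketch of the fixed output size, again with hashlib. The three inputs are arbitrary sizes I chose (empty, one byte, and about 1 MB):

```python
import hashlib

inputs = [b"", b"a", b"a" * 1_000_000]  # 0 bytes, 1 byte, about 1 MB

for data in inputs:
    digest = hashlib.md5(data).hexdigest()
    print(f"{len(data):>9} bytes in -> {len(digest)} hex digits out")

# Hashing the same input again gives the same digest every time.
print(hashlib.md5(b"a").hexdigest() == hashlib.md5(b"a").hexdigest())  # True
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
```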
On the other hand, a non-cryptographic hash function, or just a hash function, gives much weaker guarantees about avoiding collisions. This, however, makes such hash functions a lot faster than cryptographic ones, since there is less work involved in computing each hash.
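To get a feel for how weak those guarantees can be, here is a sketch that searches for a collision in CRC-32. CRC-32 (via Python's zlib module) is my own pick of a non-cryptographic checksum for this example, and find_crc32_collision is a hypothetical helper. Because CRC-32 has only 32 bits of output, a brute-force search over random inputs usually finds two colliding inputs within about a hundred thousand tries.

```python
import random
import zlib

def find_crc32_collision(seed: int = 0, max_tries: int = 1_000_000):
    """Search random 8-byte inputs for two that share a CRC-32 value."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    seen = {}                   # CRC-32 value -> input that produced it
    for _ in range(max_tries):
        data = rng.getrandbits(64).to_bytes(8, "big")
        checksum = zlib.crc32(data)
        if checksum in seen and seen[checksum] != data:
            return seen[checksum], data
        seen[checksum] = data
    return None                 # no collision found within max_tries

collision = find_crc32_collision()
if collision is not None:
    first, second = collision
    print(first.hex(), second.hex(), hex(zlib.crc32(first)))
```

The same brute-force approach against a 128-bit hash would need vastly more attempts, which is why the practical attacks on MD5 exploit weaknesses in the algorithm itself rather than raw guessing.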
Now, MD5 is great, but there is a huge downside to using it as a security feature: it is no longer secure. In 2004 it was demonstrated that MD5 is not collision resistant. Other algorithms such as SHA-1 were recommended as replacements, though SHA-1 has since been broken as well (a practical collision was published in 2017). As a side note, the SHA-2 family of hash functions is considered safe as of now.
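In Python, moving to a SHA-2 function is mostly a one-word change. A minimal sketch, using SHA-256 as one member of the SHA-2 family and the classic test input "abc":

```python
import hashlib

data = b"abc"

md5_hex = hashlib.md5(data).hexdigest()        # 128-bit digest, 32 hex digits
sha256_hex = hashlib.sha256(data).hexdigest()  # 256-bit digest, 64 hex digits

print(md5_hex)     # 900150983cd24fb0d6963f7d28e17f72
print(sha256_hex)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```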
So this leaves us with a question: what is MD5 currently used for?
As we have found out, it is not used for applications such as SSL certificates or digital signatures, since it no longer provides the security properties expected of a cryptographic hash function.
Currently, MD5 is used mostly for checking the integrity of a downloaded package: if the package is exactly the same as the one offered by the source, the MD5 hash computed from the download will be identical to the hash published by that source.
If the package gets corrupted, because of network-related issues or anything else for that matter, the MD5 hashes will not match and will in fact be completely different (even if only one bit of the input differs, the output is dramatically affected).
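Such a check could look roughly like the sketch below. The function names md5_of_file and matches_published_md5 are hypothetical helpers I made up for this example, and the file is read in chunks so that large downloads do not have to fit in memory.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def matches_published_md5(path: str, published_hex: str) -> bool:
    """Compare a file's MD5 with the checksum published by the source."""
    # Normalize whitespace and letter case before comparing.
    return md5_of_file(path) == published_hex.strip().lower()
```

You would call it with the path of the downloaded file and the checksum copied from the download page, for example matches_published_md5("package.tar.gz", checksum), where both values are placeholders. Keep in mind that this only catches accidental corruption; since MD5 is not collision resistant, it should not be relied on to detect deliberate tampering.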