when people say a file has a checked md5 hash, what exactly does that mean?
Just to be clear, the article from your link mentions digital signatures, and has a section showing a figure with an MD5 value.
Digital signatures and MD5 hashes are different things.
MD5 is an algorithm which generates a cryptographic hash value. MD5, like other cryptographic hash functions, takes as input a sequence of bits and produces a fixed-size output (128 bits for MD5) regardless of the size of the input. The sequence of bits can be a file. For simplicity, from now on I will just say file instead of sequence of bits.
When you want to check whether you have the same file another person has, you can generate an MD5 hash of your copy and compare it to the MD5 hash the other person generated from theirs.
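To illustrate the fixed-size property, here is a minimal sketch using Python's standard hashlib module. The two inputs are arbitrary example data, not anything from the article.

```python
import hashlib

# Two inputs of very different sizes (arbitrary example data).
short_input = b"a"
long_input = b"a" * 1_000_000

short_digest = hashlib.md5(short_input).hexdigest()
long_digest = hashlib.md5(long_input).hexdigest()

# MD5 always yields 128 bits, which is 32 hexadecimal characters.
print(len(short_digest), short_digest)
print(len(long_digest), long_digest)
```

Both digests come out 32 hex characters long, even though one input is a million times larger than the other.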
Warning: The following example is insecure and is just for illustration!
Alice sends Bob a file:
- Alice calculates an MD5 hash hash_alice for file_a
- Bob asks Alice to send him file_a
- Alice sends file_a to Bob
- Bob receives file_a
- Bob calculates an MD5 hash hash_bob for file_a
If hash_bob is the same as hash_alice, then the file Bob received is the same file that Alice sent. Bob has checked the MD5 hash to verify that he received the correct file.
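The steps above can be sketched in Python. This is only an illustration, and just as insecure as the example itself: the helper md5_of_file, the temporary file names, and the file contents are all made up for the sketch, and the "transfer" is simulated with a local copy.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    alice_copy = os.path.join(tmp, "file_a")
    bob_copy = os.path.join(tmp, "file_a_received")

    # Alice's file (placeholder contents).
    with open(alice_copy, "wb") as f:
        f.write(b"contents of file_a")

    # Simulate an unmodified transfer from Alice to Bob.
    with open(alice_copy, "rb") as src, open(bob_copy, "wb") as dst:
        dst.write(src.read())

    hash_alice = md5_of_file(alice_copy)
    hash_bob = md5_of_file(bob_copy)

print("hashes match:", hash_alice == hash_bob)
```

Reading in chunks keeps memory use constant, which matters when the file is large.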
Now let's assume Mallory is an attacker and wants to give Bob a virus. She has the ability to monitor exchanges and intercept files.
- Alice calculates an MD5 hash hash_alice for file_a
- Bob asks Alice to send him file_a
- Alice sends file_a to Bob
- Mallory intercepts file_a from Alice
- Mallory copies her virus file file_v and renames it file_a
- Mallory sends her virus file file_a to Bob
- Bob receives file_a
- Bob calculates an MD5 hash hash_bob for file_a
Now hash_bob should not be the same as hash_alice, and Bob should realize that someone has sent the wrong file.
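The same check in the Mallory scenario, again as a rough sketch. The byte strings stand in for the real files, and the sketch simply assumes Bob already knows hash_alice.

```python
import hashlib

# Placeholder contents standing in for the real files.
file_a = b"the file Alice meant to send"
file_v = b"Mallory's virus, renamed to file_a"

# Alice hashes the genuine file.
hash_alice = hashlib.md5(file_a).hexdigest()

# Mallory swaps the file in transit, so Bob receives file_v instead.
received = file_v
hash_bob = hashlib.md5(received).hexdigest()

if hash_bob != hash_alice:
    print("Hash mismatch: this is not the file Alice sent.")
else:
    print("Hashes match.")
```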
if the program has multiple files, how do we go about computing the single md5 hash for that program?
For each file in the program you calculate a hash value.
If I have: main.exe libabc.dll release.txt and iconabc.gif
I calculate hash_main.exe, hash_libabc.dll, hash_release.txt, and hash_iconabc.gif
Each hash value should be different from the others, as long as the files themselves differ.
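As a sketch of hashing every file in a program, assuming the four files sit together in one directory. The helper md5_per_file and the placeholder contents are invented for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_per_file(directory):
    """Map each regular file name in `directory` to its hex MD5 digest."""
    hashes = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            hashes[path.name] = hashlib.md5(path.read_bytes()).hexdigest()
    return hashes

with tempfile.TemporaryDirectory() as tmp:
    # Create stand-ins for the program's files with placeholder contents.
    for name in ("main.exe", "libabc.dll", "release.txt", "iconabc.gif"):
        (Path(tmp) / name).write_bytes(b"placeholder contents of " + name.encode())
    hashes = md5_per_file(tmp)

# One line per file: digest followed by file name.
for name, value in hashes.items():
    print(value, name)
```

The result is one hash per file rather than a single hash for the whole program.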
Intermediate section:
The problem with the first example is that it does not show how Bob gets hash_alice so he can compare it with hash_bob. If hash_alice is sent the same way as file_a, an attacker could replace it, much as Mallory replaced the file in the second example.
There are two basic solutions to the problem: send the hash over a secure or out-of-band channel, or have the hash signed with a trusted certificate (@nealmcb credit here). Out of band means using a different physical medium of transmission. One example of out of band would be to print out the hash value and send it via postal mail. Secure channel means using something like a Virtual Private Network (VPN) or IPsec.
The problem with the signed hash is that Bob needs Alice's certificate in order to verify the signature. If Alice sends the certificate to Bob the same way she sends the file, then the certificate could get intercepted just like the file (@nealmcb credit here).
Reflections on transmission:
If you think about the two solutions for a minute you may come up with a question.
If I have a secure channel to send the hash, why don't I use the same channel to send the file?
The reasons you would use the normal internet to send the file and a secure channel to send the hash:
- The secure channel is very slow (e.g. dial-up). The hash is short, so it transfers quickly, but the file is large and would take too long.
- The secure channel is expensive. Either you get charged for time used or for bytes transferred.
- The channel owner limits your use of the secure channel to only sending or receiving hashes.