
OK, I was just reading this article: http://www.zdnet.com/blog/bott/stay-safe-online-5-secrets-every-pc-and-mac-owner-should-know/3542?pg=4&tag=mantle_skin;content

and thought of something I always wanted to ask but never did.

When people say a file has a checked MD5 hash, what exactly does that mean? Did they seriously dissolve the program into bits and do a hash on the bits?

Chris Dale
Pacerier
  • As Rory notes, this is closely related to (perhaps even a duplicate of) this question, which has more background: [Does hashing a file from an unsigned website give a false sense of security? - IT Security](http://security.stackexchange.com/questions/1687/does-hashing-a-file-from-an-unsigned-website-give-a-false-sense-of-security) – nealmcb Jul 11 '11 at 20:41
  • They're inherently two different questions altogether. – Pacerier Jul 12 '11 at 22:09

4 Answers


The idea of the md5 hash is that it is a hash of the entire file - which will be different if any bit is changed. Not sure what you mean by 'dissolve it into bits' because that is all a file is - a series of bits.
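To see the "different if any bit is changed" property in action, here is a minimal sketch using Python's standard `hashlib` module (the choice of Python and the helper name `md5_hex` are assumptions for illustration). It flips a single bit of some sample data and hashes both versions.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()

original = b"Hello, world!"
# Flip the lowest bit of the first byte: 'H' (0x48) becomes 'I' (0x49).
modified = bytes([original[0] ^ 0x01]) + original[1:]

print(md5_hex(original))
print(md5_hex(modified))
print(md5_hex(original) == md5_hex(modified))  # False
```

The two digests share no obvious relationship, even though the inputs differ by one bit.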

When you download a file, you can compute its MD5 hash on your own machine. By comparing that to the MD5 provided online, you can reassure yourself that you have the correct, unmodified file.
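A rough sketch of that download check in Python (the function names `file_md5` and `matches_published` are hypothetical). The file is read in chunks, so every bit is hashed but a large download never has to fit in memory at once. The demo uses "abc", whose MD5 is a published test vector in RFC 1321.

```python
import hashlib
import os
import tempfile

def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute a file's MD5 by reading it in chunks, so even huge files use little memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """True if the local MD5 equals the hex digest published by the download site."""
    return file_md5(path) == published_hex.strip().lower()

# Demo: a temporary stand-in for a downloaded file, containing b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
try:
    print(matches_published(tmp.name, "900150983CD24FB0D6963F7D28E17F72"))  # True
    print(matches_published(tmp.name, "d41d8cd98f00b204e9800998ecf8427e"))  # False
finally:
    os.remove(tmp.name)
```

The second check fails because `d41d8cd9...` is the MD5 of an empty file, not of "abc".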

Have a look at this question on the value of a hash of a file.

Rory Alsop
  • The link to that question provides a lot of good information as well :) +1! – Chris Dale Jul 11 '11 at 17:28
  • Well, I mean, do they calculate the hash on every single bit (wouldn't that be quite insane if the program is very big), or do they calculate the hash on only certain parts of the program? – Pacerier Jul 12 '11 at 21:58
  • @Pacerier - The entire program, every single bit, otherwise it would be pointless. It is very quick. – Rory Alsop Jul 13 '11 at 00:14

When people say a file has a checked MD5 hash, they mean they have verified that the file's MD5 hash is the same as the hash the file's host says it should be. If the two match, you know that nothing has altered the file in between.

In practice, you just run a tool like md5sum (Wikipedia) on the file you downloaded and compare its output against the MD5 sum the host publishes. File providers usually post the MD5 sum on their download page, easily accessible from where you downloaded the file.
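md5sum prints one line per file in the form `<32 hex digits>  <filename>` (two spaces, or ` *` in binary mode), and `md5sum -c` reads such lines back and re-checks each file. The following is a simplified Python sketch of that check, not md5sum's actual implementation. The function names and the `setup.exe` file are hypothetical, filenames are resolved relative to the checksum file's folder (md5sum -c uses the current directory), and md5sum's escaping of unusual filenames is not handled.

```python
import hashlib
import tempfile
from pathlib import Path

def parse_md5sum_line(line: str) -> tuple[str, str]:
    """Split '<hex digest>  <filename>' into (digest, filename).
    md5sum separates the two with two spaces, or with ' *' in binary mode."""
    digest, rest = line.rstrip("\n").split(" ", 1)
    name = rest[1:] if rest[:1] in (" ", "*") else rest
    return digest.lower(), name

def verify_checksums(checksum_file: Path) -> dict[str, bool]:
    """Rough equivalent of `md5sum -c`: recompute each listed file's MD5 and compare."""
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = parse_md5sum_line(line)
        actual = hashlib.md5((checksum_file.parent / name).read_bytes()).hexdigest()
        results[name] = actual == expected
    return results

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "setup.exe").write_bytes(b"abc")  # hypothetical download
    (folder / "setup.exe.md5").write_text("900150983cd24fb0d6963f7d28e17f72  setup.exe\n")
    print(verify_checksums(folder / "setup.exe.md5"))  # {'setup.exe': True}
```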

This increases security: you know that no one has tampered with the file over the network and that the repository where the file is stored has not been tampered with. However, the sum is of no use if the website showing you the MD5 has been compromised.

Chris Dale

when people say a file has a checked md5 hash, what exactly does that mean?

Just to be clear, the article from your link mentions digital signatures, and has a section showing a figure with an MD5 value.

Digital signatures and MD5 hashes are different things.

MD5 is an algorithm which generates a cryptographic hash value. MD5, like other cryptographic hash functions, takes as input a sequence of bits and produces a fixed size output regardless of the size of the input. The sequence of bits can be a file. For simplicity, from now on I will just use file instead of sequence of bits.
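The fixed-size output is easy to check with a short Python sketch (Python's `hashlib` is an assumption here). Inputs of 0 bytes, 1 byte and 10 MB all produce a 16-byte digest.

```python
import hashlib

inputs = [b"", b"a", b"x" * 10_000_000]  # empty, 1 byte, 10 MB of data

for data in inputs:
    digest = hashlib.md5(data).digest()
    # MD5 always yields 16 bytes (128 bits), shown as 32 hex characters.
    print(len(data), len(digest), digest.hex())
```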

When you want to check whether you have the same file as another person, you can generate an MD5 hash of the file and compare it to an MD5 hash the other person has created.

Warning: The following example is insecure and is just for illustration!

Alice sends Bob a file:

  • Alice calculates an MD5 hash hash_alice for file_a
  • Bob asks Alice to send him file_a
  • Alice sends file_a to Bob
  • Bob receives file_a
  • Bob calculates an MD5 hash hash_bob for file_a

If hash_bob is the same as hash_alice, then the file Bob received is the same file that Alice sent. Bob has checked the MD5 hash to verify that he received the correct file.

Now let's assume Mallory is an attacker and wants to give Bob a virus. She has the ability to monitor exchanges and intercept files.

  • Alice calculates an MD5 hash hash_alice for file_a
  • Bob asks Alice to send him file_a
  • Alice sends file_a to Bob
  • Mallory intercepts file_a from Alice
  • Mallory copies her virus file file_v and renames it file_a
  • Mallory sends her virus file file_a to Bob
  • Bob receives file_a
  • Bob calculates an MD5 hash hash_bob for file_a

Now hash_bob should not be the same as hash_alice, and Bob should realize that someone has sent the wrong file.
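Both exchanges can be simulated in a few lines of Python. This is only a sketch. The byte strings standing in for file_a and file_v are made up, and, like the example above, it assumes Bob already holds hash_alice from somewhere trustworthy.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Hypothetical file contents for illustration only.
file_a = b"Alice's genuine program"
file_v = b"Mallory's virus"

hash_alice = md5_hex(file_a)  # Alice hashes the real file before sending it.

# First exchange: the file arrives untouched.
hash_bob = md5_hex(file_a)
ok_untouched = hash_bob == hash_alice
print("untouched file matches:", ok_untouched)  # True

# Second exchange: Mallory substitutes file_v under the name file_a.
hash_bob = md5_hex(file_v)
ok_swapped = hash_bob == hash_alice
print("swapped file matches:", ok_swapped)  # False
```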

If the program has multiple files, how do we go about computing the single MD5 hash for that program?

For each file in the program you calculate a hash value.

If I have main.exe, libabc.dll, release.txt, and iconabc.gif,

I calculate hash_main.exe, hash_libabc.dll, hash_release.txt, and hash_iconabc.gif

Each hash value should be unique: different files should produce different hashes.
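One hash per file can be produced like this in Python. The helper `hash_files` is hypothetical, and the four files are dummy stand-ins created in a temporary folder using the names from the answer. The output is printed in the same `<digest>  <filename>` layout that md5sum uses.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_files(directory: Path) -> dict[str, str]:
    """Return {filename: MD5 hex digest} for every regular file directly in directory."""
    return {
        p.name: hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(directory.iterdir())
        if p.is_file()
    }

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Dummy contents; real program files would hold binary data.
    for name in ["main.exe", "libabc.dll", "release.txt", "iconabc.gif"]:
        (folder / name).write_bytes(f"contents of {name}".encode())
    manifest = hash_files(folder)

for name, digest in manifest.items():
    print(f"{digest}  {name}")
```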

Intermediate section:

The problem with the first example is that it does not show how Bob gets hash_alice so he can compare it with hash_bob. If hash_alice is sent the same way as file_a, an attacker could modify it in the same way Mallory modified the file in the second example.
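A small Python sketch shows why a hash sent over the same channel does not help. If Mallory controls the channel, she replaces both the file and the hash, and Bob's comparison still succeeds. The file contents and the dictionary "messages" are hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

file_a = b"Alice's genuine program"  # hypothetical contents
file_v = b"Mallory's virus"          # hypothetical contents

# What Alice sends: the file and its hash, over the same channel.
sent = {"file": file_a, "hash": md5_hex(file_a)}
# What Mallory substitutes: her file plus a freshly computed matching hash.
received = {"file": file_v, "hash": md5_hex(file_v)}

check_passes = md5_hex(received["file"]) == received["hash"]
print(check_passes)  # True: Bob accepts the virus
```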

There are two basic solutions to the problem: use a secure (or out-of-band) channel to send the hash, or have the hash signed by a trusted certificate (@nealmcb credit here). Out of band means using a different physical medium of transmission. One example of out of band would be to print out the hash value and send it via postal mail. Secure channel means using something like a Virtual Private Network (VPN) or IPsec.

The problem with the signed hash is that Bob needs Alice's certificate in order to verify the signature. If Alice sends the certificate to Bob the same way she sends the file, then the certificate could get intercepted just like the file (@nealmcb credit here).

Reflections on transmission:

If you think about the two solutions for a minute you may come up with a question.

If I have a secure channel to send the hash, why don't I use the same channel to send the file?

Reasons you might use the normal internet to send the file and a secure channel to send the hash:

  • The secure channel is very slow (e.g. dial-up) and the hash is short, so it transfers quickly, but the file is large and would take too long.
  • The secure channel is expensive. You get charged for either time used or bytes transferred.
  • The channel owner limits your use of the secure channel to only sending or receiving hashes.
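The first reason is easy to quantify. The Python sketch below assumes a nominal 56 kbit/s dial-up link (ignoring protocol overhead) and a 100 MiB download; both numbers are illustrative assumptions. The hash is sent as 32 hex characters.

```python
DIALUP_BITS_PER_SECOND = 56_000  # assumed nominal 56k modem speed, no overhead
FILE_BYTES = 100 * 1024 * 1024   # assumed 100 MiB download
HASH_HEX_BYTES = 32              # an MD5 digest written as 32 hex characters

def transfer_seconds(num_bytes: int, bits_per_second: int = DIALUP_BITS_PER_SECOND) -> float:
    """Time to send num_bytes at the given raw bit rate."""
    return num_bytes * 8 / bits_per_second

print(f"hash: {transfer_seconds(HASH_HEX_BYTES):.4f} s")     # hash: 0.0046 s
print(f"file: {transfer_seconds(FILE_BYTES) / 3600:.1f} h")  # file: 4.2 h
```

Under these assumptions the hash arrives in a few milliseconds, while the file would tie up the slow channel for hours.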
this.josh
  • But if the program has multiple files, how do we go about computing the single MD5 hash for that program? – Pacerier Jul 12 '11 at 22:11
  • @pacerier - the hash is generally computed on whatever file contains the installation package for the program, and that is typically a single file. It is easy to calculate a hash even for a huge file. – nealmcb Jul 12 '11 at 22:20
  • @this.josh - Good point about the difference between a signature and a hash. I think it's important to further clarify that a raw MD5 hash doesn't help in the situation you provide, since Mallory can presumably also give Bob a hash that matches the file she gives Bob. Without asymmetric (public key) crypto, you need an out-of-band method to get the hash to Bob. And even with it, you need an out-of-band method at least once to get e.g. a CA's key to Bob. – nealmcb Jul 12 '11 at 22:23
  • @nealmcb Quite right. I was trying to keep it simple, as the OP seemed to indicate confusion. I suppose I should at least put in a warning. Notice I didn't show how Bob got the hash to compare with; that was intentional. – this.josh Jul 12 '11 at 22:53

If I understand your question, yes: it is possible to check a file with MD5 (or any hashing algorithm).

In fact, every file processed by the hashing algorithm is just bits, so it does not depend on the type of input data. Also, a hash function accepts input of any length but outputs a fixed-size value.

M'vy