7

Wikipedia states that md5's collision resistance is 2^18 (https://en.wikipedia.org/wiki/MD5).

I just found out that openssl enc uses md5 to hash the password and the salt. Let's assume that I want to use a 27 char random password (62 char alphabet). That would yield 62^27 = 10^48 = 2^160 possible passwords which will take an unfeasibly long time to crack.

Now, how does md5 change this assumption that brute forcing is not feasible? Will it take only 2^18 tries to find a suitable hash because finding collisions is so "easy"?

If yes, why is openssl enc still using md5?

randor
  • 145
  • 2
  • 8

2 Answers2

12

The OpenSSL apps need to be used carefully, they often have old and possibly insecure defaults, at least by contemporary standards.

In this case openssl enc defaults to using a digest of MD5 (apps/enc.c)

    if (md && (dgst=EVP_get_digestbyname(md)) == NULL)
            {
            BIO_printf(bio_err,"%s is an unsupported message digest type\n",md);
            goto end;
            }

    if (dgst == NULL)
            {
            dgst = EVP_md5();
            }

The -md "message digest" option can be used to set this explicitly, you can use any supported digest such as sha256:

openssl enc -md sha256 -e -aes256 -in infile.txt -out outfile.enc -pass pass:foo

Some of your assertions are correct, but they do not quite apply here, specifically since an attacker does not know the digest output a collision attack does not apply.

Finding a collision is only useful sometimes, e.g. when MD5 output is used as a signature or integrity check. A collision attack requires that you have a hash output to work with.

In this case MD5 is being used as a key-derivation function. For that, what is usually more interesting is pre-image resistance: given a digest you could work out an input (e.g. password) that would generate it. MD5 has good but imperfect pre-image resistance, ~123 bits instead of the intended 128, see RFC6151.

Given only an encrypted file, if the attacker's task is to recover the encrypted data, then he must discover the symmetric key (key+IV) used to encrypt. This key is also a digest he can compute from a pass-phrase which he can discover instead (a salt, if used, is available in the file).

The attacker does not know the password or the key (digest of salt/password), either one will do: which is cheaper to find? What might help an attacker is if the MD5 output is biased, but it's not, as far as I know.

(Strictly there may be two scenarios: recover the password in order to prove something, not necessarily access to the encrypted data; or only recover the key in order to access the encrypted data.)

The two obvious attacks are:

  1. brute force the entire (symmetric cipher) key space, if the KDF (hash/digest) output is biased then this will be smaller than it should be, which may help the attacker
  2. brute-force the password space, if the input was chosen by a human then this will be smaller than the key space (the salt will eliminate pre-computed attacks though, enc uses a random salt by default)

Ideally, we want a good KDF with no bias, an output at least as large as the expected input entropy, matched to the symmetric cipher keysize, and expensive so that attack #2 doesn't offer a short-cut.

Using a digest like MD5 with a short output (128-bit output) as a KDF means that the key used to perform the encryption is only 128 bits, and in this case not as long as your required pass-phrase (~160 bits, 62^26). Also consider the symmetric cipher key size that will be used, ideally these should be matched.

The biggest problem here is not the potentially problematic use of MD5, it's that the KDF that openssl enc uses is "cheap", specifically it only uses one round, making brute force attacks on passwords the best attack (apps/enc.c, line #569):

             EVP_BytesToKey(cipher,dgst,sptr,
                            (unsigned char *)str,
                            strlen(str),1,key,iv);

The sixth parameter to EVP_BytesToKey (hard-coded to 1) is the count, designed to slow down an attacker (or not, in this case). What should be used instead is a robust, expensive key-stretching function like bcrypt or scrypt.

mr.spuratic
  • 7,937
  • 25
  • 37
  • 1
    +1. See https://security.stackexchange.com/questions/29106/openssl-recover-key-and-iv-by-passphrase/242567#242567 for details on the key derivation method used in older versions of `openssl enc` or newer versions of `openssl enc` with the `-md md5` option. – mti2935 Dec 27 '20 at 20:39
1

Well, it depends on how the salt is used, if a different salt is used per password. That means you have to recreate a rainbow table per salt so you need to do 2^18 password tries per password.

If you also have some rate limiter and/or intrusion detection this is not a viable way to crack an offsite password system.

Since SSL is used in a Server/Client situation this means on a properly enforced system its not feasible to 'crack' a password through brute force.

So the protection comes from not being able to use 'standard' rainbow tables and IDS.

To why it still uses MD5, this is due to legacy and compatibility (not all systems can utilize SHA2 in all lengths yet, think embedded devices).

peterh
  • 2,938
  • 6
  • 25
  • 31
LvB
  • 8,217
  • 1
  • 26
  • 43
  • OK, to clarify, one can use "openssl aes-256-cbc -e -a < infile.txt > outfile.aes" to encrypt data locally. If an adversary gets access to the encrypted file he can brute force as much as he likes, i.e. there is no way to rate limit etc. as you suggest as it is all offline. – randor Jan 23 '15 at 12:57
  • 1
    The point of a rainbow table it to trade the cost of computation (ideally someone else's cost) with the ability to re-use. It makes no sense here (with a salt). – mr.spuratic Jan 23 '15 at 22:05