The OpenSSL apps need to be used carefully: they often have old and possibly insecure defaults, at least by contemporary standards.
In this case openssl enc defaults to using a digest of MD5 (apps/enc.c; this is the behaviour before OpenSSL 1.1.0, which changed the default to SHA-256):

    if (md && (dgst=EVP_get_digestbyname(md)) == NULL)
    {
        BIO_printf(bio_err,"%s is an unsupported message digest type\n",md);
        goto end;
    }
    if (dgst == NULL)
    {
        dgst = EVP_md5();
    }

The -md "message digest" option can be used to set this explicitly; you can use any supported digest, such as sha256:

    openssl enc -md sha256 -e -aes256 -in infile.txt -out outfile.enc -pass pass:foo

Some of your assertions are correct, but they do not quite apply here. Specifically, since an attacker does not know the digest output, a collision attack does not apply.
Finding a collision is only useful sometimes, e.g. when MD5 output is used as a signature or integrity check. To exploit one, the attacker needs the hash output to be something that is visible and relied upon, which is not the case here.
In this case MD5 is being used as a key-derivation function. For that, what is usually more interesting is pre-image resistance: how hard it is, given a digest, to work out an input (e.g. a password) that would generate it. MD5 has good but imperfect pre-image resistance, ~123 bits instead of the intended 128; see RFC 6151.
Given only an encrypted file, if the attacker's task is to recover the encrypted data, then he must discover the symmetric key (key+IV) used to encrypt. That key material is itself a digest computed from the pass-phrase, so he can instead try to discover the pass-phrase and derive the key from it (the salt, if used, is available in the file).
The attacker does not know the password or the key (digest of salt/password). Either one will do, so the question is which is cheaper to find. What might help an attacker is if the MD5 output is biased, but it's not, as far as I know.
(Strictly there may be two scenarios: recover the password in order to prove something, not necessarily access to the encrypted data; or only recover the key in order to access the encrypted data.)
The two obvious attacks are:
- brute-force the entire (symmetric cipher) key space; if the KDF (hash/digest) output is biased then this will be smaller than it should be, which may help the attacker
- brute-force the password space; if the input was chosen by a human then this will be smaller than the key space (the salt will eliminate pre-computed attacks though, and enc uses a random salt by default)
Ideally, we want a good KDF with no bias, an output at least as large as the expected input entropy, matched to the symmetric cipher key size, and expensive enough that the second attack (brute-forcing the password) doesn't offer a short-cut.
Using a digest like MD5 with a short (128-bit) output as a KDF means that each block of derived key material is only 128 bits wide, which in this case is shorter than your required pass-phrase (~155 bits, 62^26); for a longer key, EVP_BytesToKey simply chains further digest blocks over the same input. Also consider the symmetric cipher key size that will be used; ideally these should be matched.
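The size comparison above is easy to check by hand. The following is a minimal sketch, assuming the pass-phrase is 26 characters drawn uniformly at random from a 62-symbol alphabet (the 62^26 space mentioned above); the helper name `entropy_bits` and the 8-letter example are illustrative choices, not anything from OpenSSL.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy, in bits, of a string of `length` symbols chosen
    uniformly at random from an alphabet of `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)


# The 62^26 pass-phrase space: 26 characters from [A-Za-z0-9].
passphrase_bits = entropy_bits(62, 26)

# Width of a single MD5 output block and of an AES-256 key, for comparison.
MD5_OUTPUT_BITS = 128
AES256_KEY_BITS = 256

# Illustrative assumption: a human-chosen password of 8 lowercase letters.
weak_bits = entropy_bits(26, 8)

print(f"random 26-char pass-phrase: {passphrase_bits:.1f} bits")
print(f"MD5 output block:           {MD5_OUTPUT_BITS} bits")
print(f"AES-256 key:                {AES256_KEY_BITS} bits")
print(f"8 lowercase letters:        {weak_bits:.1f} bits")
```

The last figure is the reason the password space is usually the cheaper target: a typical human-chosen password is far smaller than either the digest output or the cipher key space.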
The biggest problem here is not the potentially problematic use of MD5; it's that the KDF that openssl enc uses is "cheap". Specifically, it only uses one round, making brute-force attacks on passwords the best attack (apps/enc.c, line #569):

    EVP_BytesToKey(cipher,dgst,sptr,
                   (unsigned char *)str,
                   strlen(str),1,key,iv);

The sixth parameter to EVP_BytesToKey (hard-coded to 1) is the count, designed to slow down an attacker (or not, in this case). What should be used instead is a robust, expensive key-stretching function like bcrypt or scrypt.
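To make the cost difference concrete, here is a rough Python sketch of the derivation as described in the EVP_BytesToKey documentation (each block is D_i = H^count(D_(i-1) || password || salt)), next to a scrypt-based alternative from the standard library. This is a re-implementation for illustration, not OpenSSL's code; the function names, the scrypt parameters (n=2**14, r=8, p=1) and the example password are assumptions.

```python
import hashlib
import os
from typing import Tuple


def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int, iv_len: int,
                     md: str = "md5", count: int = 1) -> Tuple[bytes, bytes]:
    """Sketch of the EVP_BytesToKey scheme: chain digest blocks until
    enough bytes exist for the key and IV."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # First hash of this block: H(previous block || password || salt)
        block = hashlib.new(md, block + password + salt).digest()
        # Remaining (count - 1) iterations re-hash the block; with count=1
        # this loop does not run, so each block costs one digest call.
        for _ in range(count - 1):
            block = hashlib.new(md, block).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


def scrypt_key(password: bytes, salt: bytes, key_len: int = 32) -> bytes:
    """Memory-hard alternative; the cost parameters are illustrative."""
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=key_len)


salt = os.urandom(8)      # enc stores an 8-byte salt in the file
password = b"foo"         # example pass-phrase, as in the command above

# AES-256-CBC needs a 32-byte key and a 16-byte IV.
key, iv = evp_bytes_to_key(password, salt, 32, 16, md="md5", count=1)
strong_key = scrypt_key(password, salt)

print("EVP_BytesToKey key:", key.hex())
print("EVP_BytesToKey iv: ", iv.hex())
print("scrypt key:        ", strong_key.hex())
```

With count=1, testing one candidate password for an AES-256 key costs the attacker only a handful of MD5 calls, whereas each scrypt guess at these settings needs on the order of 16 MiB of memory and far more computation.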