Is ccrypt 'AES 256' file encryption fundamentally flawed

Question

The unix/linux utility ccrypt claims to be a more secure replacement for the old unix crypt. However it seems to me that it has a fundamental flaw.

When you encrypt a file using ccrypt it uses the password you provide to create a random 256-bit key to use with AES. To decrypt the file you provide the password again, ccrypt uses the password to get the AES key and decrypt the file.

OK, the file is securely encrypted using 256-bit AES but its security is wholly dependent on the key being secure - and the key isn't secure.

When decrypting ccrypt actually tells you if the password used to decrypt the key is correct and doesn't try to decrypt if you give the wrong key. Thus it would be trivial to brute-force the key as it tells you when you guessed right.

This problem applies to any file encryption which uses this approach, it's the security of the key that matters, the algorithm used for encoding the actual data is almost irrelevant.

Where messages and such are encrypted to protect them while in transit key based encryption makes sense but 'stationary' file encryption is different. Even the GPG file encryption where the key is kept separately doesn't help much, it's usually obvious that GPG has been used and the keys are kept in 'well known' places. Again it's the key security that matters and the file encryption is irrelevant.

OK, one could separate the key completely by putting it on a USB stick or something but for most file encryption this is a bit OTT and anyway losing the stick would mean losing one's data. Files are things that usually need to be kept long term, keys need backing up with the files.

I'm thinking that for simple, in place, file encryption the old DES based crypt is better. Brute forcing it is much more difficult.

Have I got this completely wrong or not? :-)

http://crypto.stackexchange.com/questions/24163/ccrypt-and-its-security — Stephane, Oct 08 '15 at 08:39
Why do you think that brute-forcing a 256-bit AES key is easier than brute-forcing 56-bits of DES? 2^256 possible keys is nearly impossible to brute-force if generated by a cryptographically secure source. — RoraΖ, Oct 08 '15 at 16:32
Because you're *NOT* brute-forcing the 256-bit AES key, you're brute-forcing the rather weak algorithm used to *create* the 256-bit AES key. That's the whole point, the 256-bit AES encryption is strong but you don't have to break it to decode the file. — Chris Green, Oct 08 '15 at 20:06
What block mode of operation does ccrypt use? CBC? (EDIT: looks like it uses CFB mode) — forest, Mar 03 '18 at 04:19

score 19 · Answer 1 · edited Mar 17 '17 at 13:14

Password-based encryption is inherently vulnerable to dictionary attacks. Indeed, when you have the encrypted file, you can always try to decrypt it (or parts of it) with a given password and see if the result makes sense. Thus, an attacker who could get his hands on the encrypted file (and we assume that this is possible, since otherwise there would be no point in encrypting), could "try passwords" on his own machines until a match is found. It shall be noted that the attacker does not need a 100% reliable test for the "result makes sense" part: if the attacker's test is sufficient to reject 99.9999% of wrong passwords, then he still reduced his list of passwords to double-check by a factor of one million.

That a tool such as ccrypt provides a test for rejecting wrong passwords does not change this fact. Sure, the attacker can then use the test as well, but it won't give him much of an additional advantage. The attacker could already work comfortably without that test, because real-life data has a lot of structure (most file formats have very recognizable headers) and if the attacker is interested in breaking the password, then the data must be valuable, hence "real-life".

There are two methods to make password-based encryption stronger; they are cumulative, and you should be using both. The first method is to use a strong password, which means a password with a lot of randomness in it (as usual, see this question for discussions on that subject). The second method is to convert the password into an encryption key using a good, properly configured password hashing function; the "proper configuration" there means that a salt will be used to prevent parallel attacks (when there are several encrypted files with distinct passwords, and the attacker would like to break one or several of them), and that the password hashing function is made deliberately expensive through many iterations to make dictionary attacks harder.

On the latter point, ccrypt appears to be defective. It hashes the password with what the FAQ describes as:

Ccrypt uses a hashing algorithm based on AES (i.e., on the Rijndael block cipher). It is not one of the standard hashing algorithms such as SHA1. Thus, for your application, you should be fine. From the viewpoint of security, it does not matter much which hashing algorithm is used, as long as it is collision-free.

which does not bode well. First it is, by the author's own word, a custom homemade function, which is never a good sign. Moreover, the author talks about the function needing to be "collision-free", which is completely irrelevant for password-based key derivation, and thus indicates a relatively poor grasp of the requirements of such things; this is not a good sign either, especially when combined with a homemade function.

Looking at the source code (that's the nice thing with opensource: you can see for yourself what the author did not see fit to document), the custom hash appears to work in the following way:

Set K and H to both be sequences of 32 bytes of value 0x00.
For each password character c (a single byte; we are using an ASCII-compatible encoding such as UTF-8 here), do:
- XOR every byte of K with c.
- Replace H with the encryption of H using Rijndael with key K and block size 256 bits.
The final value of H is the "hashed password".

This calls for the following comments:

The FAQ says "AES" but it is not AES. AES is a family of three functions, which is a subset of the family of functions known as Rijndael. AES always uses 128-bit blocks; here, Rijndael is used with 256-bit blocks, so it is not AES stricto sensu.
As a hash function, it sucks. The process basically boils down to repeatedly encrypting a single block, using each time one in 256 possible keys (the 32 bytes of each encryption key are always identical to each other). Since this is symmetric encryption, symmetric decryption also works, which allows a meet-in-the-middle attack. Basically, the output size is 256 bits, but the preimage resistance is at best 2¹²⁸, not 2²⁵⁶; while this is not an immediate concern for password hashing, it still demonstrates the fundamental flaw with custom schemes.
As a password hashing function, it stinks. Not only it is unsalted, but it is also extremely fast. An attacker with a basic PC will use the AES-NI instructions to compute a Rijndael encryption in a matter of a few dozens of clock cycles (the AES-NI instructions are meant to support AES specifically, but they can still be used to efficiently support 256-bit blocks; see section 2.1 of this paper). Moreover, since the password bytes are processed one-by-one, a lot of the computation cost can be shared between successive password trials: from hashed password 'foo', the hashes for 'fooA', 'fooB', 'fooC'... can be computed with a single Rijndael-256 invocation for each.

It can be estimated that a cheap, off-the-shelf PC would be able to hash at least 100 millions of potential passwords per second. Very few human-chosen passwords will resist such an onslaught for more than a few minutes at best. As far as password hashing is concerned, this is atrocious.

Summary: ccrypt is unduly weak against dictionary attacks, but not because it has a test for password correctness. It is weak because its password hashing procedure is a homemade construction that accumulates flaws making dictionary attacks extremely efficient. With a properly configured password hashing function (e.g. bcrypt), it could literally be made a million times stronger (the iteration count would be adjusted so that a PC could hash only 100 passwords per second, instead of 100 millions; it would still be highly usage because 1/100th of a second is not a large latency in human terms).

Thanks, you have confirmed much of what I was thinking, with much more in depth knowledge than I have. It's a common defect of just about all the 'encrypt a file with a simple command' utilities I can find at the moment. They rarely seem to care much about the security of the password hashing algorithm. I've asked about command-line scrypt in another thread. — Chris Green, Oct 08 '15 at 16:21
Re: your first paragraph doesn't the attacker need to know how the file was encrypted to attack it easily? E.g. if a file is encrypted directly (no intermediate key) with some sort of hashing algorithm don't you end up with what looks like random data and no clue as to how to unscramble it? — Chris Green, Oct 08 '15 at 16:26
Usually the attacker has some good knowledge of the _context_. After all, he wants to crack your files because he is interested in their contents, which means that he knows you. Moreover, tools leak traces of their existence everywhere (file extensions, headers, usage scripts...). It is definitely unsafe to assume that the attacker does not know exactly what tools you are using and how they work. — Thomas Pornin, Oct 08 '15 at 17:52
OK, that's why I was asking (elsewhere here) about hiding the tools used to do the encryption. Thanks. — Chris Green, Oct 08 '15 at 20:07
@ThomasPornin Perhaps you could update your answer to account for the ccrypt author's reply below? — forest, Mar 03 '18 at 03:21
Thomas - I have deleted that part of Peter's answer, but you will still be able to read it. — Rory Alsop, Mar 04 '18 at 13:59
Using salt means that file corruption that affects bytes storing salt makes the whole file unrecoverable. With saltless hash derivation corruption affects only the block with corrupted bytes (64 bytes for ccrypt). If the goal of the encryption is long-term archival storage, one may consider ability to partially restore from a corrupted file as a big plus of the encryption algorithm. — Igor Bukanov, Jun 06 '18 at 20:09
@IgorBukanov The same is also often true of the IV (depending on mode), which is mandatory. — forest, Jun 18 '18 at 09:11

score 0 · Answer 2 · edited Jun 18 '18 at 01:39

Original answer explaining that ccrypt does not use a salt, does not use true AES, and does not use a standard KDF: https://www.mathstat.dal.ca/~selinger/downloads/ccrypt-answer.txt

I'm the author of ccrypt. Sorry for responding so late to this thread, but I was only made aware of it recently. As always, questions about ccrypt can also be directed to me (and are likely to receive a faster response) in the ccrypt forums on SourceForge.

To address the original question: let me state unequivocally that ccrypt, along with every other password-based encryption program, is subject to brute-force attacks including dictionary attacks. As Thomas Pornin aptly put it, "Password-based encryption is inherently vulnerable to dictionary attacks". Nothing prevents an attacker from trying every possible password. This is not a flaw, but a basic fact of life, and one that every user of cryptography must understand. If you use a weak password such as "poodle", an attacker will be able to guess your password and decrypt your data. If you use a strong password such as "hVztmdz28fNemDZnxj5YLjXz", your data will remain secure for the foreseeable future, to the best of current knowledge.

I would like to clear up a few other points raised by the original poster and in the answers and comments.

In the original post, there seems to be a misunderstanding about what ccrypt is a replacement for. This is probably due to the fact that I haven't updated that part of the documentation in a long time. When I wrote (in the year 2001) that ccrypt is a replacement for the old unix crypt program, I was referring to the crypt(1) program, not the crypt(3) library function. Crypt(1) was a password-based file encryption program that has existed in Unix at least since 1979 (it was in Version 7: http://man.cat-v.org/unix_7th/1/crypt), and was well-known throughout the 1980s and 1990s. It used a notoriously weak algorithm based on the Enigma cipher, which had already been broken in World War II. On the other hand, crypt(3) is a library function that hashes passwords to the format used in /etc/password. As far as I know, there is nothing specifically wrong with crypt(3), and it is still in use today. Ccrypt(1) is a replacement for crypt(1), not for crypt(3). It does not make sense to ask whether ccrypt is "better" than crypt(3) since they do completely different things. Probably I should update that part of the documentation, given that thankfully nobody remembers the crypt(1) program today.

The original post also raises a question about the fact that ccrypt will inform the user when they have typed a wrong password, and whether this makes dictionary attacks faster. The answer is that this does not make dictionary attacks faster, except perhaps by a very small constant factor. Thomas already explained this well in his answer, but just to emphasize the point, consider this: suppose that the encrypted file is a JPEG file. Every JPEG file starts with the same 3 bytes, namely ff d8 ff. So there is already a very simple test to check the likely correctness of your key: just decrypt the first block and check whether the file starts with ff d8 ff. Most other file formats also contain this type of known plaintext, for example, every PDF file starts with "%PDF". It is also easy to check whether a file is a plain text file. So the fact that ccrypt provides the password matching feature does not make attacks any easier than they are anyway. As a matter of fact, ccrypt's check for a "matching" key is also not 100% accurate. There is a 1 in 4 billion chance that any random key will pass the test, even if it is not the correct key. The feature exists to prevent people from accidentally decrypting a file with the wrong key (and thereby overwriting all the data in it with garbage).

Summary: Like all password-based encryption software, ccrypt is subject to dictionary attacks. Users must be aware of this and choose strong passwords. However, it would be counterproductive to make insecure passwords only slightly less insecure by increasing the cost of these attacks by a constant factor as has been suggested. Instead, the only correct response it to use secure passwords in the first place. Much of the remaining analysis presented in Thomas Pornin's answer is flawed, and no actual weaknesses in ccrypt have been uncovered.

I'm sorry but your conclusions are showing that you do not have a solid grasp of cryptography. Salts and such _do_ improve security, as does a strong hashing algorithm. Not adding them makes it possible to do precomputation attacks (e.g. rainbow table attacks) that would otherwise be impossible. Even the NSA is not a computationally unbound adversary. Not adding salting and using a fast KDF because you don't know any better is one thing. _Intentionally_ doing it for the reasons you state is a little scary. — forest, Mar 03 '18 at 03:13
`If a hash function contains predictable collisions, it always decreases the cost of brute-force attacks, because there are fewer keys that need to be tested.` This is incorrect, as are the examples that follow. A function vulnerable to collisions is one where you can find two different messages such that _f(m1) = f(m2)_, compared to preimage where given _m_ you can compute _f(m) = f(m')_. `So there would be obvious collisions, as "password" and "PaSSwOrD" would hash to the same internal key.` **This is not what a collision attack is.** — forest, Mar 03 '18 at 03:16
Finally instead of just being critical, here's some advice: 1) Use a standard cipher. There's no reason to use Rijndael-256 as AES with the 128 bit blocksize is fine. 2) **Salt the damn thing!** It's simple and you can just concatenate the salt with the password to be hashed. 3) **Use a slow KDF** like bcrypt, scrypt, or PBKDF2. If you look at leaked databases, ones using unsalted hashes have a huge percentage cracked, whereas ones using salted bcrypt often have only a few dozen broken passwords. Thomas Pornin is completely correct. He is a professional cryptographer. Listen to him. — forest, Mar 03 '18 at 03:26
You misunderstand Thomas Pornin. Preimage attacks _are_ relevant to password hashing. If there is _f(m) = h_ and you only have _h_, then a first preimage would mean you can find _f(m') = h_, totally breaking the password. What he was saying is that 2^128 preimage resistance caused by the meet-in-the-middle attack you introduced to your hashing scheme is not bad enough to be of immediate concern, but preimage as a class of attacks _are_ a concern for hashing. You seem to misunderstand what a preimage is. Also, random IVs do not protect against rainbow tables... — forest, Mar 03 '18 at 03:48
If your encryption tool is not secure unless users pick a password like "hVztmdz28fNemDZnxj5YLjXz", then you shouldn't have let users pick their own password. Most users won't and can't memorize a 256-bit password, and users that do will use a password manager or write the password down somewhere. In that scenario, there's no point in allowing users to pick their own passwords, except to allow users to shoot their own foot. If you're going to allow users to pick their own passwords, you must expect the need for key strengthening, i.e. use a proper, deliberately slow KDF. — Lie Ryan, Mar 03 '18 at 04:35
Hi Peter, your answer post is not the place to discuss another answer post. Please do that in [chat] - I'll delete that section now. — Rory Alsop, Mar 04 '18 at 13:57
Peter, in the (now linked) answer, you state that you use a custom hash because `ccrypt has existed since before AES was standardized, and certainly long before any hash functions were standardized to use with AES`. Leaving aside the AES/Rijndael distinction, one need not "standardize" a hash to use with a cipher. The SHA-2 family (including SHA-256) was only published as a draft in Jan 2001, but RIPEMD-256 is from **1996**, and GOST was declassified in 1994. All produce 256-bit digests (there's many more if you accept other lengths) suitable for use as symmetric encryption keys. — CBHacking, Jun 18 '18 at 08:09
@forest As I understand it, the output of the password hash is never stored anywhere, so there's no value to look up in a rainbow table (and it would be pointless anyhow, as what you care about is the key, not the password that produced it). This is a difference between *key derivation functions* (which this is supposed to be), and *password hashing functions* (for authentication systems, which this is not). A typical PHF used to verify a password keeps the hashes secret only to prevent offline attacks. Here, the hash is the key; the password it derives from has value *only* by way of the KDF. — CBHacking, Jun 18 '18 at 08:19
@CBHacking You can create a rainbow table for a cipher [when there is known plaintext](https://crypto.stackexchange.com/q/18307/54184) (e.g. the header / magic value). You simply treat it as if the cipher itself is a hash, i.e. _H(K) = E_K(C)_ for hash function _H_, block cipher _E_, key _K_, and constant _C_. With that, you can build a rainbow table for _K_. — forest, Jun 18 '18 at 08:22
@forest Yes, but in that case the random IV *does* protect against rainbow tables, assuming it's of remotely appropriate length (just like a salt). — CBHacking, Jun 18 '18 at 08:48
@CBHacking For CBC mode maybe. I forget what mode `ccrypt` uses. — forest, Jun 18 '18 at 09:03

Is ccrypt 'AES 256' file encryption fundamentally flawed

2 Answers2

Linked