
I'm working on a project to encrypt many files with a single password.

The steps I will employ to encrypt the files are:

  1. the user will execute a command similar to `tool --encrypt --recurse directories/to/recurse and-other-files.txt`
  2. the user will be prompted for a password
  3. two 64 byte crypto random salts and a 16 byte crypto random IV will be generated
  4. no two files will ever use the same salts or IV
  5. each salt will be combined with the password to create two separate Argon2id keys
  6. one key will be 32 bytes long and will be used for the AES-256 cipher in CTR mode
  7. the other will be 64 bytes long and will be used as the key for an HMAC-SHA-512
  8. the resulting encrypted file will be written as `2ByteVersion:64ByteHMACSalt:64ByteCipherBlockSalt:16ByteIV:EncryptedData:64ByteHMACSignature`
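Steps 6 to 8 can be sketched with Go's standard library roughly as follows. This is a minimal sketch, not the asker's code: it assumes the 32-byte AES key and the 64-byte HMAC key were already derived with Argon2id (that part needs `golang.org/x/crypto/argon2`, which is outside the standard library), it assumes the 2-byte version is stored big-endian, and `sealFile` and `randBytes` are hypothetical helper names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha512"
	"errors"
	"fmt"
)

// Field sizes from the layout in step 8.
const (
	versionLen = 2
	saltLen    = 64
	ivLen      = aes.BlockSize // 16
	tagLen     = sha512.Size   // 64
)

// sealFile produces Version:HMACSalt:CipherSalt:IV:EncryptedData:HMAC.
// Assumption: encKey (32 bytes) and macKey (64 bytes) were already derived
// with Argon2id from the password and the two salts.
func sealFile(version uint16, hmacSalt, cipherSalt, encKey, macKey, plaintext []byte) ([]byte, error) {
	if len(hmacSalt) != saltLen || len(cipherSalt) != saltLen {
		return nil, errors.New("salts must be 64 bytes")
	}
	block, err := aes.NewCipher(encKey) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	iv := make([]byte, ivLen)
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}

	out := make([]byte, 0, versionLen+2*saltLen+ivLen+len(plaintext)+tagLen)
	out = append(out, byte(version>>8), byte(version)) // big-endian (assumed)
	out = append(out, hmacSalt...)
	out = append(out, cipherSalt...)
	out = append(out, iv...)

	// CTR XORs a keystream into the data, so the ciphertext is exactly
	// as long as the plaintext.
	ct := make([]byte, len(plaintext))
	cipher.NewCTR(block, iv).XORKeyStream(ct, plaintext)
	out = append(out, ct...)

	// Encrypt-then-MAC: the tag covers the header and the ciphertext.
	mac := hmac.New(sha512.New, macKey)
	mac.Write(out)
	return mac.Sum(out), nil
}

// randBytes returns n cryptographically random bytes (demo helper).
func randBytes(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

func main() {
	file, err := sealFile(1, randBytes(saltLen), randBytes(saltLen), randBytes(32), randBytes(64), []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(file)) // 2+64+64+16+5+64 = 215
}
```

Because the MAC runs over the version, both salts, and the IV as well as the ciphertext, none of the header can be altered without the tag check failing. The whole file is held in memory here only for brevity.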

I believe this would result in a reasonably secure set of encrypted files. My main concern, though, is that because of the way users will use this tool, there is a good chance they will accidentally encrypt small, easily guessed files.

And since CTR mode doesn't require padding, anyone with access to the encrypted file will know the length of the plaintext file. It seems that CTR mode is considered secure for files, provided the IV is unique for each encryption run and the file is authenticated.
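The length leak is easy to see concretely. In this minimal sketch (`ctrEncrypt` is a hypothetical helper that uses a throwaway random key and IV), a file containing `yes` and a file containing `no` encrypt to 3 and 2 bytes of ciphertext respectively, so once the fixed header and tag are subtracted, the file size alone tells them apart:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// ctrEncrypt encrypts msg with AES-256-CTR under a fresh random key and IV.
func ctrEncrypt(msg []byte) []byte {
	key := make([]byte, 32)
	iv := make([]byte, aes.BlockSize)
	for _, b := range [][]byte{key, iv} {
		if _, err := rand.Read(b); err != nil {
			panic(err)
		}
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	ct := make([]byte, len(msg))
	cipher.NewCTR(block, iv).XORKeyStream(ct, msg)
	return ct
}

func main() {
	for _, answer := range []string{"yes", "no"} {
		fmt.Printf("%q -> %d ciphertext bytes\n", answer, len(ctrEncrypt([]byte(answer))))
	}
}
```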

Is there a chance that the cipher key, HMAC key, or password could be derived through a known plaintext attack from enough small guessable files? Are there any other glaring flaws in my methodology that could leak data?

Raz Varren
  • Don't run Argon2 twice. The more time you force a brute-force cracker to use, the better. If it takes 0.5 seconds to derive the encryption key and 0.5 additional seconds to derive the MAC key, then the user will have to run Argon2 for 1 second. A password cracker only needs to check one of those keys, however; it's not necessary to brute-force the other key. If you split up the time 50-50 (and that's the best-case scenario, because anyone will just **attack the weaker of the two hashes**), then you're halving the amount of work a cracker needs to do. – Future Security Oct 22 '19 at 18:30
  • Instead, request 512 bits of output from one call to one of the Argon2 functions. You can double the time-cost parameter, thus doubling the amount of work a cracker needs to perform, *without* requiring any more work from a legitimate user. You can request as many bytes as you want from one call. – Future Security Oct 22 '19 at 18:35
  • Is it considered secure to use different parts of a single derived Argon2 key for two different crypto schemes? Specifically the cipher block and the hmac. It's considered [bad practice](https://security.stackexchange.com/questions/37880/why-cant-i-use-the-same-key-for-encryption-and-mac) to use the same key for different purposes but does that extend to using different parts of the same key? – Raz Varren Nov 09 '19 at 17:54
  • It is safe. Most symmetric algorithms can be modeled as having statistically uniform and independent output bits. – Future Security Nov 09 '19 at 18:31
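Following the comments, a single Argon2id call can be asked for enough output for both keys (32 bytes for AES-256 plus 64 bytes for HMAC-SHA-512), and the result split into two non-overlapping pieces; per the last comment, this is safe because the output bits can be treated as uniform and independent. A minimal sketch of the split only: the derivation itself would be a call such as `argon2.IDKey(password, salt, time, memory, threads, 96)` from `golang.org/x/crypto/argon2`, which is not in the standard library, so a fixed byte pattern stands in for its output here. `splitKeys` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	encKeyLen = 32 // AES-256 key
	macKeyLen = 64 // HMAC-SHA-512 key
)

// splitKeys carves one Argon2id output into two non-overlapping keys.
// Assumption: master came from a single derivation call that requested
// encKeyLen+macKeyLen bytes.
func splitKeys(master []byte) (encKey, macKey []byte, err error) {
	if len(master) != encKeyLen+macKeyLen {
		return nil, nil, errors.New("master key has wrong length")
	}
	// The three-index slice caps encKey so an append to it can never
	// overwrite the bytes that make up macKey.
	encKey = master[:encKeyLen:encKeyLen]
	macKey = master[encKeyLen:]
	return encKey, macKey, nil
}

func main() {
	// Stand-in for Argon2id output: bytes 0, 1, ..., 95.
	master := make([]byte, encKeyLen+macKeyLen)
	for i := range master {
		master[i] = byte(i)
	}
	enc, mac, err := splitKeys(master)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(enc), len(mac), enc[0], mac[0]) // 32 64 0 32
}
```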


No. Using a different IV for each encryption under the same key doesn't leak information. The problem with CTR mode is IV reuse under the same key: since CTR mode turns a block cipher into a stream cipher, reusing an IV with the same key produces the same keystream, which is equivalent to reusing a One-Time Pad (OTP) key. As with OTP key reuse, the plaintexts can then be extracted with crib-dragging techniques.

There are better alternatives to what you are trying: Authenticated Encryption (AE) modes such as AES-GCM and ChaCha20-Poly1305. AE ciphers provide confidentiality, integrity, and authentication all in one. TLS 1.3 includes these algorithms in its standard.
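For comparison, Go's standard library exposes AES-GCM through `cipher.NewGCM`, and `Open` checks the tag before returning any plaintext. This is a minimal sketch; prepending the random nonce to the output and passing a header as additional authenticated data are choices made for this example, not something the answer prescribes, and `sealGCM`/`openGCM` are hypothetical names. As with CTR, a (key, nonce) pair must never be reused.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealGCM encrypts and authenticates pt, prepending a random nonce.
func sealGCM(key, pt, aad []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext||tag to its first argument (the nonce).
	return aead.Seal(nonce, nonce, pt, aad), nil
}

// openGCM verifies the tag and returns the plaintext only if it matches.
func openGCM(key, sealed, aad []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	ns := aead.NonceSize()
	if len(sealed) < ns+aead.Overhead() {
		return nil, errors.New("ciphertext too short")
	}
	return aead.Open(nil, sealed[:ns], sealed[ns:], aad)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	header := []byte("v1") // authenticated but not encrypted
	sealed, err := sealGCM(key, []byte("small guessable file"), header)
	if err != nil {
		panic(err)
	}
	pt, err := openGCM(key, sealed, header)
	fmt.Println(string(pt), err) // small guessable file <nil>

	sealed[len(sealed)-1] ^= 1 // flip one bit of the tag
	_, err = openGCM(key, sealed, header)
	fmt.Println(err != nil) // true
}
```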

With AE you don't need two keys. If you insist on using HMAC, you can use a good key derivation function (KDF) such as Argon2id: by providing different salts, you can derive two different keys.

Encryption, in general, doesn't hide the plaintext size. (If padding is applied, a block cipher's output size is always a multiple of the block length; this is not the case for CTR mode.) The output size equals the input size plus the sizes of the IV and tag, all of which are known to the attacker. One mitigation is to append some random bytes to the end of each file before encryption; the padding must be uniquely removable, like block-cipher padding. This helps most if you cluster files into groups, i.e., pad each file up to one of a set of fixed lengths at regular intervals.
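A sketch of the size-bucketing idea, applied to the plaintext before encryption: random bytes pad the file up to a multiple of a bucket size, and an 8-byte big-endian length trailer makes the padding uniquely removable. The trailer format and the 4096-byte bucket in the demo are arbitrary assumptions, and `padToBucket` and `unpad` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
)

const lenTrailer = 8 // bytes used to record the original length

// padToBucket appends random bytes plus a length trailer so that the
// result is a multiple of bucket bytes. Run this before encrypting.
func padToBucket(plaintext []byte, bucket int) ([]byte, error) {
	if bucket <= lenTrailer {
		return nil, errors.New("bucket too small")
	}
	need := len(plaintext) + lenTrailer
	total := ((need + bucket - 1) / bucket) * bucket // round up
	out := make([]byte, total)
	copy(out, plaintext)
	if _, err := rand.Read(out[len(plaintext) : total-lenTrailer]); err != nil {
		return nil, err
	}
	binary.BigEndian.PutUint64(out[total-lenTrailer:], uint64(len(plaintext)))
	return out, nil
}

// unpad reads the trailer and strips the padding after decryption.
func unpad(padded []byte) ([]byte, error) {
	if len(padded) < lenTrailer {
		return nil, errors.New("input too short")
	}
	n := binary.BigEndian.Uint64(padded[len(padded)-lenTrailer:])
	if n > uint64(len(padded)-lenTrailer) {
		return nil, errors.New("invalid length trailer")
	}
	return padded[:n], nil
}

func main() {
	p, err := padToBucket([]byte("secret: 1234"), 4096)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(p)) // 4096
	u, err := unpad(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(u)) // secret: 1234
}
```

Larger buckets hide more but waste more space: with 4096-byte buckets every small file costs a full 4 KiB, and the ciphertext size still reveals which bucket a file fell into.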

Breaking CTR mode with a known-plaintext attack (KPA) is equivalent to breaking AES. Apart from side-channel attacks such as cache-timing and power-analysis attacks, we believe AES is KPA-secure. Since the attacker cannot recover the key this way, they cannot find the password this way either.

The password, on the other hand, must be strong against brute-force attacks. This may be the weakest point of your system.

If you only want to secure your files, there is already a good alternative: VeraCrypt. With VeraCrypt, you create an encrypted volume and store your files in it. Everything from handling your password to encrypting and decrypting the volume is done for you. It is cross-platform, so you can access your files from Windows, Linux, and macOS. There are also some third-party apps for mobile devices.

kelalaka
  • Thanks, this is very helpful. The biggest reason why I didn't choose AES-GCM was because in the language I'm using ([go](https://golang.org/)), the plaintext/ciphertext message needs to fit into memory for [GCM](https://golang.org/pkg/crypto/cipher/#NewGCM) to work. Since I figured there could be files much larger than the available memory, I went with a streaming cipher and HMAC. – Raz Varren Oct 20 '19 at 17:24
  • Yes, they load everything into memory. – kelalaka Oct 20 '19 at 17:28
  • @RazVarren The limitation you cite in Go was actually an _explicit design decision_ to discourage you from doing exactly what you're about to do. You should split large files into chunks of bounded size so that (a) you're not even _tempted_ to act on unauthenticated data, and (b) there's a limit to the amount of memory an adversary can waste in a denial of service attack. See the [discussion on crypto.SE](https://crypto.stackexchange.com/a/51439) and the [rationale by Adam Langley](https://github.com/golang/go/issues/17673#issuecomment-275732868), who was responsible for the decision in Go. – Squeamish Ossifrage Nov 10 '19 at 14:50
  • I should probably clarify how I intend to verify encrypted payloads. Unauthenticated data will not be acted on. First, the file size is checked to make sure it is at least large enough to contain the headers and signature. Next, every byte from the start of the file to the beginning of the signature will be fed into the HMAC. Afterwards, decryption will begin only if the signatures match. As far as memory that an adversary could waste, Go's io package seems to be very memory efficient for files of any size. – Raz Varren Nov 10 '19 at 19:23