
UPDATE: Upon further research, I discovered a library that appears to meet my needs, especially with regard to the chunked aspect. Rather than "roll my own", I would be better served by using this well-established library:

https://github.com/defuse/php-encryption
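
For reference, my (unverified) understanding of the library's streaming file API is that usage would look roughly like the following; the class and method names reflect my reading of the library's documentation, and the file paths are placeholders:

    <?php
    // Rough sketch of defuse/php-encryption's File API (v2.x), based on my
    // reading of its documentation -- verify against the docs before relying on it.
    require __DIR__ . '/vendor/autoload.php';

    use Defuse\Crypto\File;
    use Defuse\Crypto\Key;

    // Generate a random key once and persist its ASCII-safe form somewhere safe.
    $key = Key::createNewRandomKey();
    file_put_contents('key.txt', $key->saveToAsciiSafeString());

    // The File methods appear to stream the input in chunks, so memory use
    // should stay constant regardless of file size, and the output is
    // authenticated -- which would address my HMAC question below as well.
    $key = Key::loadFromAsciiSafeString(file_get_contents('key.txt'));
    File::encryptFile('input.bin', 'input.bin.enc', $key);
    File::decryptFile('input.bin.enc', 'input-restored.bin', $key);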


I need to encrypt large files (up to 2 GB) at rest, using an amount of memory that does not depend on the input file size.

Accordingly, I intend to employ a "chunked" approach whereby I read n bytes of the input file, encrypt that chunk, append the ciphertext to an output file handle, and repeat until the end of the input file is reached. To decrypt, the process would essentially be reversed.
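
To make that concrete, here is a minimal sketch of the encryption side of the loop I have in mind, with an illustrative 8192-byte chunk size, a fresh random IV prepended to each chunk, and error handling omitted (file names and parameters are placeholders, not a finished implementation):

    <?php
    // Minimal sketch of the chunked encryption loop described above.
    $cipher    = 'aes-256-cbc';
    $ivLen     = openssl_cipher_iv_length($cipher);   // 16 for AES-CBC
    $chunkSize = 8192;                                // plaintext bytes per chunk
    $key       = random_bytes(32);                    // 256-bit key; persist/derive it properly

    $in  = fopen('plain.bin', 'rb');
    $out = fopen('cipher.bin', 'wb');

    while (!feof($in)) {
        $plain = fread($in, $chunkSize);
        if ($plain === false || $plain === '') {
            break;
        }
        // A fresh random IV for every chunk, prepended so decryption can recover it.
        // Whether this is better than the IV chaining in the code linked below is
        // exactly what I ask about in point 2.
        $iv         = random_bytes($ivLen);
        $ciphertext = openssl_encrypt($plain, $cipher, $key, OPENSSL_RAW_DATA, $iv);
        fwrite($out, $iv . $ciphertext);
    }

    fclose($in);
    fclose($out);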

I have found what looks to be a fairly reasonable attempt at this:

https://www.php.net/manual/en/function.openssl-encrypt.php#120141

But I have several questions/concerns about the author's code:

  1. Why does the author hash the key and then keep only the first 16 bytes of the hash?
$key = substr(sha1($key, true), 0, 16);

I thought that perhaps there is a limit on the key length, but passing a key much longer than 16 bytes does not seem to cause an encryption/decryption failure, in which case this step seems pointless at best, if not detrimental to the viability of this function.

Doesn't this alteration weaken the key considerably by reducing it to just 16 bytes of a SHA-1 digest? (A key-derivation sketch of what I would do instead appears after this list.)

  2. Why does the author use the first 16 bytes of the ciphertext as the next initialization vector inside the while loop?
$iv = substr($ciphertext, 0, 16);

From what I gather, this is strictly necessary for the chunked approach to work because the IV for each chunk must be known while decrypting, and in this implementation, it is obtained from the previous chunk.

My understanding is that, where CBC is concerned, best practice is for every call to openssl_encrypt() to use a fresh, unpredictable IV. To that end, would it be better to call openssl_random_pseudo_bytes(16) within each iteration, as the author does initially (outside the loop), and prepend the freshly generated IV to the chunk, roughly as in my sketch near the top of this question? If so, it seems that would affect the chunk sizing/handling such that I would need to make other changes.

In any case, is the author's approach to generating the IV for each chunk sane, or should I rework this aspect? (A sketch of the decryption side of the random-IV-per-chunk alternative appears after this list.)

  3. How problematic is it that this approach does not implement an HMAC?

How important is an HMAC, given that these files are to be uploaded to a server, stored encrypted at rest, and then downloaded from the same server? The files are protected in transit by TLS (HTTPS), so I'm not concerned about an adversary compromising their integrity in transit. The server on which the files reside is "trusted" in that I control it, but of course that does not mean it couldn't be compromised in some capacity. What are the risks of forgoing an HMAC in my use case, and is it feasible to implement one with a chunked approach? (A sketch of an incremental HMAC over the chunks appears after this list.)
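
Regarding point 1, my working assumption is that the hashing exists only to force the key to exactly 16 bytes for aes-128-cbc, and that the cleaner fix is to supply a full-length random key, or to derive one from a passphrase with a real KDF rather than a truncated SHA-1 digest. A sketch of what I have in mind (the passphrase, salt handling, and iteration count are placeholders):

    <?php
    // Option A: a purely random 256-bit key for aes-256-cbc, stored securely.
    $key = random_bytes(32);

    // Option B: if the key must come from a passphrase, derive it with PBKDF2
    // instead of substr(sha1($key, true), 0, 16).
    $passphrase = 'correct horse battery staple';   // placeholder
    $salt       = random_bytes(16);                 // store alongside the ciphertext
    $iterations = 100000;                           // illustrative; tune for your hardware
    $key        = hash_pbkdf2('sha256', $passphrase, $salt, $iterations, 32, true);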
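
Regarding point 2, if I go with the random-IV-per-chunk scheme sketched near the top of this question, the on-disk record size changes: each full 8192-byte plaintext chunk becomes 16 bytes of IV plus 8192 bytes of ciphertext plus 16 bytes of PKCS#7 padding. Under that assumption, the matching decryption loop might look like this (again a sketch, not a finished implementation):

    <?php
    // Decryption counterpart to the random-IV-per-chunk sketch above.
    // Assumes every encryption-side fread() returned a full 8192-byte chunk
    // (true in practice for regular local files); robust code should enforce it.
    $cipher     = 'aes-256-cbc';
    $ivLen      = 16;
    $chunkSize  = 8192;
    $recordSize = $ivLen + $chunkSize + 16;       // IV + ciphertext + one padding block
    $key        = file_get_contents('key.bin');  // the same 32-byte key used to encrypt

    $in  = fopen('cipher.bin', 'rb');
    $out = fopen('plain-restored.bin', 'wb');

    while (!feof($in)) {
        $record = fread($in, $recordSize);
        if ($record === false || $record === '') {
            break;
        }
        $iv         = substr($record, 0, $ivLen);
        $ciphertext = substr($record, $ivLen);
        $plain      = openssl_decrypt($ciphertext, $cipher, $key, OPENSSL_RAW_DATA, $iv);
        fwrite($out, $plain);
    }

    fclose($in);
    fclose($out);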
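
Regarding point 3, an HMAC does look feasible with a chunked approach, because PHP's incremental hashing API can consume the ciphertext as it is written (or read) and produce a single tag at the end. Here is a sketch of verifying an encrypt-then-MAC tag appended to the end of the file, assuming a separate MAC key and a 32-byte HMAC-SHA256 tag (file names and key storage are placeholders):

    <?php
    // Sketch: incrementally verify an HMAC-SHA256 tag appended to the end of
    // the encrypted file, without loading the whole file into memory.
    $macKey = file_get_contents('mac-key.bin');   // separate from the encryption key
    $path   = 'cipher.bin';
    $tagLen = 32;
    $macLen = filesize($path) - $tagLen;          // bytes covered by the MAC

    $in   = fopen($path, 'rb');
    $hmac = hash_init('sha256', HASH_HMAC, $macKey);

    $remaining = $macLen;
    while ($remaining > 0) {
        $chunk = fread($in, min(8192, $remaining));
        if ($chunk === false || $chunk === '') {
            break;
        }
        hash_update($hmac, $chunk);
        $remaining -= strlen($chunk);
    }

    $storedTag = fread($in, $tagLen);
    fclose($in);

    $valid = hash_equals($storedTag, hash_final($hmac, true));
    // Only decrypt the chunks if $valid is true; on the encryption side, the
    // same hash_init()/hash_update() calls would run inside the loop and the
    // tag from hash_final() would be appended after the last chunk.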

Thanks in advance for any feedback!
