0

My application encrypts a file with AES. The data is read, encrypted, and written through a buffer whose size is defined by a constant BUF_SIZE value.

I will try to explain my question with an example.

E.g. the file size is 1.73 GB and the buffer is 16 KB. The application calculates (fsize % BUF_SIZE) and finds that 14 KB of data will remain.

For now, it does the following:

1) Reads this 14 KB of data into the buffer

2) Fills the other 2 KB with random data

3) Encrypts and writes the whole buffer.

The problem is that after such encryption, even a 310-byte plain-text file becomes a 16 KB monster!


The idea is to change the algorithm to encrypt ONLY this 14 KB and write it to the resulting file.
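Here is a minimal sketch of what I mean, assuming Crypto++ (which my project already uses) and that the key and IV are prepared elsewhere; the function name and parameters are only for illustration:

```cpp
#include <cryptopp/aes.h>
#include <cryptopp/modes.h>
#include <fstream>
#include <string>
#include <vector>

// Sketch: encrypt only the bytes actually read, so the last partial buffer
// is not padded out to BUF_SIZE. Key/IV generation is assumed to happen
// elsewhere and is not shown here.
void EncryptFileCFB(const std::string& inPath, const std::string& outPath,
                    const unsigned char* key, size_t keyLen,
                    const unsigned char* iv)
{
    const size_t BUF_SIZE = 16 * 1024;

    CryptoPP::CFB_Mode<CryptoPP::AES>::Encryption enc;
    enc.SetKeyWithIV(key, keyLen, iv);   // IV assumed to be AES::BLOCKSIZE bytes

    std::ifstream in(inPath, std::ios::binary);
    std::ofstream out(outPath, std::ios::binary);
    std::vector<unsigned char> buf(BUF_SIZE);

    for (;;) {
        in.read(reinterpret_cast<char*>(buf.data()),
                static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();          // < BUF_SIZE on the last read
        if (got <= 0)
            break;
        // CFB is a stream-like mode: ciphertext length == plaintext length,
        // so the final partial buffer can be encrypted and written as-is.
        enc.ProcessData(buf.data(), buf.data(), static_cast<size_t>(got));
        out.write(reinterpret_cast<const char*>(buf.data()), got);
    }
}
```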


When I was writing that algorithm, for some reason I considered the second way unacceptable; now I cannot remember why.

Is it safe to encrypt files like so?

I am mostly interested in whether doing it one way or the other as described above makes full key recovery attacks easier. I'm a student doing this in my spare time, not a professional.

EDIT 1: My application encrypts the header with AES-128 in GCM mode, and the rest of the data with AES-256 in CFB mode. So, as far as I understand, it does not matter to CFB how much data is left, right?

EDIT 2: Added this approach to my application. Thanks to everyone who helped! (^_^)

Ilya
  • 145
  • 1
  • 5
  • Safe against what? (What is your threat model?) – user May 13 '15 at 14:28
  • I am writing a file-encryption application, which is supposed to protect users' private data. So, it should be safe against decryption by anyone else. – Ilya May 13 '15 at 14:33
  • Full key-recovery (total break) ciphertext-only attacks only? (Key recovery leading to plaintext recovery.) There are [many cryptanalytic attacks](https://en.wikipedia.org/wiki/Cryptanalysis), and even more attacks against a whole [cryptosystem](https://en.wikipedia.org/wiki/Cryptosystem) (including [random number generators](https://en.wikipedia.org/wiki/Random_number_generation) and [key derivation functions](https://en.wikipedia.org/wiki/Key_derivation_function)). What about [data remanence](https://en.wikipedia.org/wiki/Data_remanence)? And so on. "Safe" is an overly broad term. – user May 13 '15 at 14:45
  • I know. Currently I'm just asking whether **this very** approach leads to a vulnerability or not. – Ilya May 13 '15 at 14:49
  • 1
    There is not necessarily anything *wrong* with simply stating that the only threat your application attempts to protect from is total break ciphertext-only attacks. For some use cases, that's sufficient. It's just good to be specific about *what threat* you are looking to protect against, so that we can properly evaluate your scheme. – user May 13 '15 at 14:49
  • I'm just a 15-year-old school student from Russia; I practice computer programming and cryptography on my own, so I'm not a professional. I have neither superb cryptography skills nor an excellent level of English to explain which threats I want my application to resist. I try to make it as safe as possible for my level. Currently, this application is far from being safe, but it is already able to perform quite good encryption. My priorities are usability and simplicity. – Ilya May 13 '15 at 15:22
  • [Here is my project](https://github.com/IlyaBizyaev/Entangle) – Ilya May 13 '15 at 15:23
  • What kind of [encryption mode](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation) are you using? AES-CBC, AES-CTR, AES-ECB, AES-OFB? That will affect the outcome, and the whole topic should provide you pointers to what you want. – Ángel May 13 '15 at 21:42
  • There is the known-plaintext attack: if the attacker has a pretty good idea of what the message is about, they may be able to infer what the message contains by looking at the encrypted size. Encrypting with a larger block size makes this kind of attack more difficult. – Lie Ryan May 14 '15 at 23:58
  • I am going to make this attack more difficult by adding a random number of random bytes to the end of the encrypted file. – Ilya May 15 '15 at 13:32

2 Answers

2

According to NIST, the AES algorithm is capable of using cryptographic keys of 128, 192, and 256 bits to encrypt and decrypt data in blocks of 128 bits (16 bytes). This means each block that goes into the AES algorithm is 16 bytes, so even if your BUF_SIZE is 16 KB, the buffer will be split into 16-byte blocks before it reaches AES, and if fewer than 16 bytes of data are left at the end, they will be padded. (For example, if you want to encrypt 35 bytes, (2*16)+3, the remaining 3 bytes will be padded with 13 padding bytes.)

However, your AES mode is important. If your mode is CFB, OFB or CTR, for example, it does not require any padding (and CTR can additionally be parallelized). CFB, OFB and CTR modes do not require any special measures to handle messages whose lengths are not multiples of the block size, since these modes work by XORing the plaintext with the output of the block cipher. The last partial block of plaintext is XORed with the first few bytes of the last keystream block, producing a final ciphertext block that is the same size as the final partial plaintext block. This stream-cipher-like behaviour makes these modes suitable for applications that require the encrypted ciphertext to be the same size as the original plaintext.
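For illustration, here is a rough Crypto++ sketch (the library the asker mentions; the key and IV handling here are assumptions for the demo, not taken from their code) showing that CFB output is exactly as long as the input, even when the length is not a multiple of 16:

```cpp
#include <cryptopp/aes.h>
#include <cryptopp/modes.h>
#include <cryptopp/osrng.h>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    CryptoPP::AutoSeededRandomPool prng;
    unsigned char key[32], iv[CryptoPP::AES::BLOCKSIZE];
    prng.GenerateBlock(key, sizeof(key));   // 256-bit key
    prng.GenerateBlock(iv, sizeof(iv));     // 16-byte IV

    const std::string plain = "partial block.";      // 14 bytes, not a multiple of 16
    std::vector<unsigned char> cipher(plain.size());

    CryptoPP::CFB_Mode<CryptoPP::AES>::Encryption enc;
    enc.SetKeyWithIV(key, sizeof(key), iv);
    enc.ProcessData(cipher.data(),
                    reinterpret_cast<const unsigned char*>(plain.data()),
                    plain.size());

    std::cout << "plaintext bytes:  " << plain.size()  << "\n"   // 14
              << "ciphertext bytes: " << cipher.size() << "\n";  // 14, no padding
    return 0;
}
```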

But some modes (namely ECB and CBC) require that the final block be padded before encryption.

You also need an IV (initialization vector) to produce distinct ciphertexts even if the same plaintext is encrypted multiple times.

So, applying this to your example:

BUF_SIZE = 16 KB ==> 16*1024 = 16,384 bytes ==> 16,384 mod 16 = 0, which means a full buffer never needs padding. The remaining piece (14 KB) ==> 14*1024 = 14,336 bytes ==> 14,336 mod 16 = 0, so the remaining slice does not need padding either; it will be split into 896 blocks as input to the AES algorithm, which produces output of the same size as its input.

Or, for example, for 310 bytes you do not need to pad up to 16 KB; you only need 19 full 16-byte blocks, and the remaining 6 bytes are padded with 10 bytes, which together makes 20 blocks.

310 = (19*16)+6.
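To make that arithmetic concrete, here is a small hypothetical helper in the PKCS#7 style (one common padding scheme; nothing above prescribes this particular one):

```cpp
#include <cstddef>
#include <vector>

// PKCS#7-style padding: append N bytes, each with value N, where N is the
// number of bytes needed to reach the next 16-byte block boundary.
std::vector<unsigned char> Pkcs7Pad(std::vector<unsigned char> data,
                                    size_t blockSize = 16)
{
    // For 310 bytes: 310 % 16 == 6, so 16 - 6 == 10 padding bytes are added,
    // giving 320 bytes == 20 full blocks.
    const size_t padLen = blockSize - (data.size() % blockSize);
    data.insert(data.end(), padLen, static_cast<unsigned char>(padLen));
    return data;
}
```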

Therefore, what you should be concerned about is the strength of the key, its size, and the way you protect it, and you should use standard AES (with its standard padding) rather than a scheme of your own, because even if only 1 byte remains it will be padded with enough padding bytes.

So in this situation there is no difference between encrypting 1 byte or 1,024,000 bytes: the same attacks apply either way, and the size of the data is not what matters.

Ali
  • 2,694
  • 1
  • 14
  • 23
0

AES is a block cipher, meaning that the core operation processes only a block (of exactly 16 bytes with AES). When you want to "encrypt a file", you need to decide on how you split and shuffle your data, and what you actually send to the AES engine; this is called a mode of operation.

Not all modes of operation are equal. Moreover, some modes also offer an integrity check, while others do not. In most situations where encryption is needed, integrity is also needed (because most passive attackers can become active attackers as well), so using an encryption mode that ensures integrity is, on a general basis, a good idea. You use GCM for the header; this is good. You don't use GCM for the rest of the file; this is less good. You should use GCM throughout.

Thinking about integrity clarifies things. Integrity is about detecting alterations. This means that, upon decryption, you cannot use the data until you have completed the integrity checks (until then, you don't know if it is correct or not). For instance, if you use a simple GCM-the-whole-file process, you get a very low space overhead (for a source file of n bytes, the GCM-encrypted version will have length n+16 bytes, including the "authentication tag" that incarnates the integrity check); however, this means that you must decrypt the whole file before beginning to use it. If you have, say, a 2 GB video file, then you might want to decrypt it "on the fly" (if only to avoid having to save the decrypted file somewhere, or keep it whole in RAM), i.e. use it before having completed decryption.
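As an illustration, assuming Crypto++ (which the asker mentions using) and leaving key and nonce management aside, the whole-file GCM variant could look roughly like this:

```cpp
#include <cryptopp/aes.h>
#include <cryptopp/gcm.h>
#include <cryptopp/filters.h>
#include <cryptopp/files.h>
#include <string>

// Encrypt-and-authenticate an entire file as one GCM message.
// Output length = input length + 16 bytes of authentication tag
// (plus whatever header/nonce your file format stores separately).
void EncryptWholeFileGCM(const std::string& inPath, const std::string& outPath,
                         const unsigned char* key, size_t keyLen,
                         const unsigned char* iv, size_t ivLen)
{
    CryptoPP::GCM<CryptoPP::AES>::Encryption gcm;
    gcm.SetKeyWithIV(key, keyLen, iv, ivLen);

    CryptoPP::FileSource(inPath.c_str(), true /* pump all */,
        new CryptoPP::AuthenticatedEncryptionFilter(gcm,
            new CryptoPP::FileSink(outPath.c_str()),
            false /* putAAD */, 16 /* tag size in bytes */));
}
```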

To support "streamed" access, you must then encrypt and decrypt things by blocks. Simply split the input file into blocks of k bytes for some value of k (all blocks need not have the same size) and encrypt them separately (with GCM, each encryption will yield k+16 bytes). To prevent an attacker from maliciously shuffling blocks, you would have to include some sort of sequencing. GCM is an authenticated encryption with associated data mode, meaning that it can integrity-protect both the data which is encrypted and some additional data. A secure file encryption format would then use the block sequence number as "associated data".
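A per-chunk sketch along those lines (again with Crypto++; the function name, nonce handling, and framing are my assumptions, not a prescribed format):

```cpp
#include <cryptopp/aes.h>
#include <cryptopp/gcm.h>
#include <cryptopp/filters.h>
#include <cryptopp/osrng.h>
#include <cstdint>
#include <string>

// Encrypt one chunk; the chunk's sequence number is bound in as associated
// data, so a reordered or substituted chunk fails authentication on decryption.
std::string EncryptChunkGCM(const unsigned char* key, size_t keyLen,
                            const unsigned char* chunk, size_t chunkLen,
                            std::uint64_t seqNo)
{
    // Each chunk needs its own unique nonce; here it is random and prepended
    // to the chunk's ciphertext (deriving it from seqNo is another option).
    CryptoPP::AutoSeededRandomPool prng;
    unsigned char iv[12];
    prng.GenerateBlock(iv, sizeof(iv));

    CryptoPP::GCM<CryptoPP::AES>::Encryption gcm;
    gcm.SetKeyWithIV(key, keyLen, iv, sizeof(iv));

    std::string out(reinterpret_cast<const char*>(iv), sizeof(iv));
    CryptoPP::AuthenticatedEncryptionFilter ef(gcm,
        new CryptoPP::StringSink(out), false, 16 /* tag bytes */);

    // Associated data: authenticated but not encrypted. Both sides must agree
    // on the byte encoding of seqNo (raw native byte order here, as a simplification).
    ef.ChannelPut(CryptoPP::AAD_CHANNEL,
                  reinterpret_cast<const unsigned char*>(&seqNo), sizeof(seqNo));
    ef.ChannelMessageEnd(CryptoPP::AAD_CHANNEL);

    // The chunk payload itself: k bytes in, k + 16 bytes (ciphertext + tag) out.
    ef.ChannelPut(CryptoPP::DEFAULT_CHANNEL, chunk, chunkLen);
    ef.ChannelMessageEnd(CryptoPP::DEFAULT_CHANNEL);

    return out; // 12-byte nonce || ciphertext || 16-byte tag
}
```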

By using CFB, you get no integrity, so you have the corresponding vulnerabilities. Besides, CTR mode is arguably better than CFB, and GCM uses CTR mode internally, so GCM is definitely preferable.

Tom Leek
  • 168,808
  • 28
  • 337
  • 475
  • Yes, I am going to use GCM everywhere, but currently my application experiences some platform-dependent problems with Crypto++. – Ilya May 14 '15 at 13:06