
I'm using openssl enc -aes-256-cbc -a -salt for automated differential backups to Amazon Glacier, but I noticed that this command increases the file size by almost exactly 35%.
As I understand it, a block cipher shouldn't change the file size this much: as far as I know, it adds at most 16 bytes of padding at the end. That doesn't account for the 17MB+ increase on my backups.

What is causing this increase in size?

Log lines:

09:09:16 Created tarbal of 165 files of size            106M
09:09:50 Created /archief/2014-05-10.encrypted of size  143M

09:09:11 Created tarbal of 186 files of size            132M
09:09:52 Created /archief/2014-05-17.encrypted of size  179M
gnur

1 Answer


The main increase comes from the -a flag, which base64-encodes your ciphertext.

From man enc:

NAME
   enc - symmetric cipher routines

SYNOPSIS
   openssl enc -ciphername [-in filename] [-out filename] [-pass arg] [-e] [-d] [-a] [-A] [-k password]
   [-kfile filename] [-K key] [-iv IV] [-p] [-P] [-bufsize number] [-nopad] [-debug]

   [...]

   -a  base64 process the data. This means that if encryption is taking place the data is base64 encoded
       after encryption. If decryption is set then the input data is base64 decoded before being
       decrypted.

Base64 encoding means that every three bytes of binary data (a byte is an 8-bit number, so it has a value from 0 to 2^8-1=255) are encoded as four bytes, each carrying 6 bits of data (a value from 0 to 2^6-1=63, represented as a printable ASCII symbol). Base64 is convenient because the symbols for the 64 values can be chosen to be printable ASCII characters (typically 0='A', 1='B', ..., 25='Z', 26='a', ..., 51='z', 52='0', ..., 61='9', 62='+', 63='/', though the last two are often defined differently in different variants). Note that three bytes hold 3*8 = 24 bits, as do four base64 groups of 6 bits (4*6 = 24).

For example, if your ciphertext were the three bytes f0 bb 5c in hexadecimal (240, 187, 92 in decimal), the bits grouped into three bytes would be:

 11110000 10111011 01011100

in base64 it would be the same bits, except grouped into four groups of 6 bits:

 111100 001011 101101 011100

which map to the values 60, 11, 45, 28, which on a typical base64 table would map to the printable ASCII characters 8Ltc which will take four bytes on the disk (instead of the three bytes it would have taken without base64 encoding).
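A few lines of Python can reproduce the regrouping above and check it against the standard library's encoder. This is only an illustrative sketch; the alphabet string is the standard RFC 4648 table (the variant using '+' and '/'), not something specific to openssl.

```python
import base64

# The three example ciphertext bytes from the text: f0 bb 5c
data = bytes([0xF0, 0xBB, 0x5C])

# Concatenate the 24 bits, then slice them into four 6-bit groups
bits = "".join(f"{b:08b}" for b in data)
groups = [bits[i:i + 6] for i in range(0, 24, 6)]
values = [int(g, 2) for g in groups]

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
manual = "".join(ALPHABET[v] for v in values)

print(groups)  # ['111100', '001011', '101101', '011100']
print(values)  # [60, 11, 45, 28]
print(manual)  # 8Ltc
print(base64.b64encode(data).decode("ascii"))  # 8Ltc
```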

Thus base64 encoding should account for roughly a 33% file size increase. It's slightly more than that because openssl also adds a newline character every 64 characters of base64-encoded ASCII (so the text wraps at 64 bytes). These two features together account for an overall file size increase of (4/3 * 65/64 - 1) = 35.4%.
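The 35.4% figure can be sanity-checked with a small helper. This is a sketch that assumes every output line, including the last partial one, ends in a single newline (which is what base64 --wrap=64 produces); the function name is made up for illustration.

```python
def base64_wrapped_size(n_bytes: int, line_len: int = 64) -> int:
    """Bytes of base64 text produced from n_bytes of binary input,
    assuming a newline after every line of up to line_len characters."""
    chars = 4 * ((n_bytes + 2) // 3)               # 4 chars per 3 input bytes, '='-padded
    newlines = (chars + line_len - 1) // line_len  # one '\n' per (possibly partial) line
    return chars + newlines

# For large inputs the ratio approaches 4/3 * 65/64
n = 10**6
print(f"{base64_wrapped_size(n) / n - 1:.1%}")  # 35.4%
print(f"{4/3 * 65/64 - 1:.1%}")                 # 35.4%

# The same factor applied to the tarball sizes in the question's log (MB)
for tar_mb in (106, 132):
    print(tar_mb, "->", tar_mb * 4/3 * 65/64)
```

That gives about 143.5 MB and 178.75 MB, close to the 143M and 179M in the log (whose sizes are themselves rounded).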

There's also a bit of overhead from your scheme. With -salt, openssl combines your password with a random eight-byte salt when deriving the key, and stores that salt at the start of the output, right after an eight-byte header Salted__ indicating that a salt was used; both are base64-encoded along with the rest. (The purpose of the salt is to make it less cost-effective for an attacker to precompute rainbow tables for common passwords.) If I encrypt a random file with your scheme, fixing the salt to DEADBEEFDEADBEEF with openssl enc -aes-256-cbc -a -salt -S DEADBEEFDEADBEEF, the first row of my encrypted file is

U2FsdGVkX1/erb7v3q2+7ybJfdPaLlVzOp7lKpOljvNK8ONCrgFrQpaJHQ8EqO1X

which decodes to (using Python 3):

>>> import base64
>>> base64.b64decode("U2FsdGVkX1/erb7v3q2+7ybJfdPaLlVzOp7lKpOljvNK8ONCrgFrQpaJHQ8EqO1X")
b'Salted__\xde\xad\xbe\xef\xde\xad\xbe\xef&\xc9}\xd3\xda.Us:\x9e\xe5*\x93\xa5\x8e\xf3J\xf0\xe3B\xae\x01kB\x96\x89\x1d\x0f\x04\xa8\xedW'
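Splitting the decoded bytes at fixed offsets makes the layout explicit. A minimal sketch, assuming the layout visible in the output above: the 8-byte Salted__ magic, then the 8-byte salt, then the ciphertext.

```python
import base64

# First line of the -a output quoted above (salt fixed with -S DEADBEEFDEADBEEF)
line = "U2FsdGVkX1/erb7v3q2+7ybJfdPaLlVzOp7lKpOljvNK8ONCrgFrQpaJHQ8EqO1X"
raw = base64.b64decode(line)

# 8-byte magic, 8-byte salt, then the start of the AES-CBC ciphertext
magic, salt, body = raw[:8], raw[8:16], raw[16:]

print(magic)                # b'Salted__'
print(salt.hex())           # deadbeefdeadbeef
print(len(raw), len(body))  # 48 32
```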

So combining the base64 encoding, the line breaks, the salt, the initialization vector (for CBC mode), and the padding (to make the length a multiple of the 128-bit AES block size), an overhead of ~35% seems perfectly reasonable.

EDIT: Actually, openssl doesn't store an initialization vector when deriving the key from a password with a salt. From man enc: "When a password is being specified using one of the other options, the IV is generated from this password." With this in mind, I tested a couple of files and the sizes match up perfectly: the 8-byte salt plus the 8-byte Salted__ header add 16 bytes, and the file is padded to a multiple of 16 bytes (adding between 1 and 16 bytes). That exactly accounts for the size of the output without base64 encoding, and applying base64 --wrap=64 to that binary output gives a file that exactly matches the -a version.
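The sizes described in the edit can be turned into a small predictor. This is a sketch under the assumptions stated there: salted password-based encryption (a 16-byte Salted__ plus salt header, no stored IV), default padding to a multiple of 16 bytes, and base64 wrapped at 64 characters with a newline ending every line. The function name is hypothetical.

```python
def openssl_enc_size(plain_len: int, base64_out: bool = True) -> int:
    """Predicted size of `openssl enc -aes-256-cbc [-a] -salt` output
    (hypothetical helper, based on the layout described in the edit)."""
    header = 8 + 8                       # b"Salted__" magic + 8-byte salt
    padded = (plain_len // 16 + 1) * 16  # padding always adds 1 to 16 bytes
    binary = header + padded
    if not base64_out:
        return binary
    chars = 4 * ((binary + 2) // 3)      # base64: 4 chars per 3 bytes
    return chars + (chars + 63) // 64    # plus one newline per 64-char line

print(openssl_enc_size(100, base64_out=False))  # 128
print(openssl_enc_size(100))                    # 175
```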

dr jimbob
  • Nitpick: you don't need to *specify* `-salt`, it's been the default since at latest 0.9.6 in 2000 (I don't have archives before that). You only need to refrain from specifying `-nosalt`. Also it wasn't the question but remember the PBE used by `enc` is essentially PBKDF1 with ONE iteration which adds no strength, so the **password must have sufficient entropy** to resist dictionary attack, about 80-100 bits for the near future at least. (Other PBE operations in OpenSSL are better.) – dave_thompson_085 Dec 22 '15 at 02:19
  • @dave_thompson_085 Nitpick to your nitpick: read the question. I didn't bring up the `-salt` flag from nowhere; the question asker used it in his command (sure, I could have said it's the default, but the flag was explicitly specified). I agree about the entropy part (though in reality 70 bits is probably fine too); the electricity cost to break a [70 bit passphrase is about $10 million plus ~50,000 GPU years](http://security.stackexchange.com/a/13016/2568). When an attacker could instead use a [$5 wrench](https://xkcd.com/538/) or covertly install a keylogger, or the info isn't worth millions, then it's safe. – dr jimbob Dec 22 '15 at 21:41
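The one-iteration derivation mentioned in the comments (and the "IV is generated from this password" behaviour quoted in the answer) corresponds to OpenSSL's EVP_BytesToKey with an iteration count of one. A rough Python re-implementation, for illustration only: it assumes MD5 as the digest, which was the enc default before OpenSSL 1.1.0 (1.1.0 switched the default to SHA-256, selectable with -md). The password and function name below are hypothetical.

```python
import hashlib

def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int = 32,
                     iv_len: int = 16, digest: str = "md5"):
    """Sketch of EVP_BytesToKey with count=1: each block is a single hash of
    (previous block || password || salt), with no iteration count to slow
    down a dictionary attack."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        block = hashlib.new(digest, block + password + salt).digest()
        derived += block
    # AES-256-CBC needs a 32-byte key followed by a 16-byte IV
    return derived[:key_len], derived[key_len:key_len + iv_len]

key, iv = evp_bytes_to_key(b"correct horse", bytes.fromhex("deadbeefdeadbeef"))
print(key.hex(), iv.hex())
```

With MD5 each password guess costs an attacker only three hash computations, which is why the comment insists on a high-entropy password.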