I want to archive several GB of sensitive data. It will be stored on an external drive that also holds non-sensitive data, so I don't want to encrypt the whole drive. For that purpose I want to use 7-Zip and the 7z file format with AES-256 encryption and a long (16+ character) password.

Since most of the data to be encrypted is already compressed, or compression would not gain much (videos, for example), and disk space is not a problem, I want to choose "Store" as the compression level to speed up archive creation.

I don't know much about the technical side of encryption, but from what I found, compression does not influence encryption, so choosing a higher compression level would have no influence (positive or negative) on file security.

My question is whether this is correct, or does it indeed have an influence?

3 Answers

1

Compression can affect the encryption if the attacker controls some of the plaintext that gets encrypted, as mentioned in this post.

It may not have an influence in your use case, since the data is archived and not accessed by many people. But if the data is sensitive, skip compression and spend the extra GB for more security.
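To make the risk concrete, here is a minimal sketch of why compression can leak information when an attacker controls part of the input. The secret value and the guesses are made up for illustration, and `zlib` stands in for whatever compressor is used. Encryption hides the content but not the length, so a shorter output hints that the attacker's text matched the secret.

```python
import zlib

# Hypothetical secret that sits in the data next to attacker-chosen text.
SECRET = b"session_token=7f3a9c1e5b2d"


def compressed_len(attacker_text: bytes) -> int:
    """Size of the compressed stream; encrypting it would not hide this size."""
    return len(zlib.compress(SECRET + b"\n" + attacker_text, 9))


# A correct guess repeats the secret, so the compressor replaces it with a
# short back-reference; a wrong guess has to be stored mostly as literals.
right_guess = compressed_len(b"session_token=7f3a9c1e5b2d")
wrong_guess = compressed_len(b"session_token=Q0xZkW8uVnJt")

print("right guess:", right_guess, "bytes")
print("wrong guess:", wrong_guess, "bytes")
```

The attack only works if the attacker can repeat this many times against the same secret, which is not the situation when you build one archive by hand.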

Khalid
    The attacker must be able to control a part of the compressed data, **and** be able to trigger as many compressions as wanted while the unknown data stays the same. That's "easy" with HTTPS, but not so much when a human is tasked with the compression. – A. Hersean Oct 20 '20 at 08:00
0

Your archive is considered data at rest. Data at rest is not susceptible to compression attacks such as CRIME and BREACH, which apply in the TLS case.

Your main problem is accessing the files. 7-Zip is not security-focused software, and it can leave information about your files lying around; see 7zip temp files. A better solution is to use a file as an encrypted volume, as VeraCrypt does with its file-hosted containers.

Advice:

  • Keep a backup of the volume
  • Use a good password, for example a Diceware-based one.
  • Use more than one password in case one is lost.
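
A Diceware-style passphrase can be sketched in a few lines. The word list below is a tiny made-up stand-in so the example stays self-contained; a real Diceware list has 7776 words, and the strength of the result depends entirely on using such a full list. `secrets` is used rather than `random` because it draws from the operating system's cryptographic random source.

```python
import secrets

# Hypothetical stand-in list for illustration only; a real Diceware list
# has 7776 entries, and a list this small gives far too little entropy.
WORDS = ["anchor", "breeze", "copper", "dynamo", "ember", "fjord", "garnet", "harbor"]


def passphrase(n_words: int = 6, words=WORDS) -> str:
    """Pick n_words independently and uniformly at random from the list."""
    return " ".join(secrets.choice(words) for _ in range(n_words))


print(passphrase())
```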
kelalaka
0

Short answer: compressing ahead of time can very slightly increase the security of an encrypted archive. If you want more security, use a longer password.



Longer answer: some information theory may be helpful. Formally, information is interchangeable with entropy. Purely random data has the highest concentration of information, while structured data like human-readable text has a lower concentration of information.

Compression is concentrating the entropy of the data by removing repetition. Random data is generally incompressible because it doesn't repeat precisely.
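
A quick way to see this for yourself, as a rough sketch: compress some highly repetitive text and the same amount of random bytes, then compare the ratios. The sample sentence and the use of `zlib` are my choices for illustration; 7-Zip's own compressors behave similarly in spirit.

```python
import os
import zlib

# Highly repetitive, structured input versus the same amount of random bytes.
structured = b"the quick brown fox jumps over the lazy dog. " * 2000
random_data = os.urandom(len(structured))


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


print(f"structured text: {ratio(structured):.3f}")
print(f"random bytes:    {ratio(random_data):.3f}")
```

The repetitive text shrinks to a small fraction of its size, while the random bytes do not shrink at all. Already-compressed files such as videos behave much like the random case.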

Block-based encryption artificially increases the entropy of data by sending it through repetitive operations which mix what entropy it already has with that of the key. It then scatters this entropy across the output. The idea is for the output to look like random data.

Perhaps the most famous cryptanalytic attack was against the German Enigma machines in the Second World War. It was possible in part because there was repeated information in the input to the encryption process. Compressed input slightly reduces the potential for this type of attack. These attacks aren't generally feasible today, though.

Most file-based encryption for archival purposes is symmetric, and most symmetric cryptographic algorithms use fairly long keys. You mentioned AES-256, which uses 256 bits for the key. Remember how I said above that block encryption mixes the entropy of the data with the entropy of the key? If your key has low entropy, the end result is low entropy in the encrypted data.

Password strength is a complicated topic, but can be summarized as the number of possible symbols in a position raised to the power of the number of positions. For example:

  • 16 lowercase latin letters: 26^16, or 4.3e22 possibilities.
  • Six diceware words: 7776^6, or 2.2e23.
  • 16 characters on a US keyboard: (26+26+10+32)^16, or 94^16, or 3.7e31.
  • 128-bit key: 2^128, or 3.4e38.
  • 256-bit key: 2^256, or 1.2e77.
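
The figures in this list can be reproduced with a short script. It also prints the equivalent number of bits, which is simply the base-2 logarithm of the count and assumes every symbol is chosen uniformly at random. The dictionary name and labels are mine.

```python
import math

# Size of the search space for each scheme: symbols ** positions.
schemes = {
    "16 lowercase Latin letters": 26 ** 16,
    "six Diceware words": 7776 ** 6,
    "16 US-keyboard characters": 94 ** 16,
    "128-bit key": 2 ** 128,
    "256-bit key": 2 ** 256,
}

for name, count in schemes.items():
    # log2 of the count is the entropy in bits for a uniformly random choice.
    print(f"{name:28s} {count:.1e}  ~{math.log2(count):.1f} bits")
```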

As you can see, 16 letters isn't enough to get close to the maximum entropy AES-256 can use. 16 US keyboard characters is getting close to good enough for AES-128, but that still uses only a third of a billionth of a billionth of a billionth of a billionth of a billionth of the key space AES-256 can use.

256 bits is 32 bytes, 64 random hex digits, or 40 random keyboard characters.
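
Those lengths follow from dividing the target number of bits by the bits each random symbol contributes, then rounding up. A small sketch, with a helper name of my own choosing:

```python
import math


def symbols_needed(target_bits: int, alphabet_size: int) -> int:
    """Smallest number of uniformly random symbols that reaches target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


print(symbols_needed(256, 16))    # hex digits
print(symbols_needed(256, 94))    # printable US-keyboard characters
print(symbols_needed(256, 7776))  # Diceware words
```

This prints 64, 40 and 20, so matching a 256-bit key with Diceware would take 20 words.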

Using a longer password will get you more security than compression or the lack thereof could ever get or cost you.

And neither will protect you against an attacker who has knowledge of or control over a significant part of the input to the encryption. Knowledge of one byte? Not a threat. Knowledge of a gigabyte? Big threat. That level of knowledge or control can allow someone to get the key regardless of its length.

  • Almost. "That level of knowledge or control can allow someone to get the key regardless of its length." That is called a "chosen-plaintext attack", and it won't surely get the key. – ThoriumBR Oct 21 '20 at 22:28