Do encrypted compression containers like ZIP and 7-Zip compress or encrypt first?

The discussion of "compress and then encrypt, or vice versa" led me to ponder the following question: many compression container formats, such as ZIP, 7z, and RAR, support encryption. For example, when creating a 7z file in 7-Zip, the program lets you enter an encryption password.

For these file types, are the files compressed and then encrypted, as recommended in the aforementioned question, or the reverse? Or, is there some way that these can compress and encrypt the data at the same time?

When I create an encrypted 7z file, I can view the filenames inside of the encrypted archive, but I cannot view the contents of those files without entering the passphrase. How is this possible? As an aside, is there any way to encrypt a 7z or similar archive such that the file names and directory structure within are not visible without using the passphrase?

I would prefer answers with definitive sources/references, not just speculation. We can all make guesses about this, but if somebody can show me documentation proving that it works one way or another, that would be ideal.

nhinkle

Posted 2011-03-22T04:01:24.510

From what I remember of an OS concepts video, nothing in the system is processed 'at the same time'. I think the whole entity (string, data structure) would need to be complete before applying a hash function; a hash applied to 123456789 would be vastly different from the same hash applied to 123, then 456, then 789, with the results concatenated. And I would think the reason you can view file names unencrypted is simply that they point to the encrypted part but are not encrypted themselves. Perhaps 7-Zip stores the name at a separate location from the rest of the file. – fightermagethief – 2011-03-22T04:36:40.243
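To illustrate the hashing point with Python's standard `hashlib` (SHA-256 is an arbitrary choice here): concatenating separately computed digests does give a different result from hashing the whole input, but a single hash object can also be fed the input piece by piece and still produce the same digest as hashing it all at once. So hashing, at least, does not require the whole input to be available before processing starts.

```python
import hashlib

data_pieces = (b"123", b"456", b"789")
whole = hashlib.sha256(b"123456789").hexdigest()

# Hashing each piece separately and concatenating the digests gives a
# completely different result from hashing the whole input.
concatenated = "".join(hashlib.sha256(p).hexdigest() for p in data_pieces)

# A single hash object fed the same pieces incrementally (streaming)
# produces exactly the digest of the whole input.
streaming = hashlib.sha256()
for piece in data_pieces:
    streaming.update(piece)

print(concatenated == whole)            # False
print(streaming.hexdigest() == whole)   # True
```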

The answers to the linked question pretty much conclude that compressing, then encrypting, is the only way to go. – user1686 – 2011-03-22T05:52:39.110

Answers

I would assume that 7-Zip and other archiving tools compress before they encrypt, for the reasons stated in the linked blog post. But I was unable to find any documentation that confirms that, nor could I immediately ascertain it from looking at the 7-Zip source code.
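Compressing first does not mean the whole file has to be compressed before encryption can begin. A minimal sketch, assuming a stream cipher: compressed output can be encrypted chunk by chunk as zlib emits it, so both steps run in a single pass while the order is still compress, then encrypt. The `keystream` helper is a hypothetical toy built from SHA-256, because Python's standard library has no AES (the cipher the 7z format uses); it is for illustration only and is not how 7-Zip itself is implemented.

```python
import hashlib
import os
import zlib


def keystream(key: bytes, nonce: bytes):
    """Hypothetical toy keystream: SHA-256(key || nonce || counter) blocks.
    Stands in for a real cipher such as AES; not for protecting real data."""
    counter = 0
    while True:
        yield from hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1


def compress_then_encrypt(chunks, key: bytes, nonce: bytes):
    """Single pass: each input chunk is compressed, and whatever compressed
    bytes zlib emits are encrypted immediately."""
    compressor = zlib.compressobj(9)
    ks = keystream(key, nonce)
    for chunk in chunks:
        yield bytes(b ^ next(ks) for b in compressor.compress(chunk))
    yield bytes(b ^ next(ks) for b in compressor.flush())


def decrypt_then_decompress(chunks, key: bytes, nonce: bytes):
    """Inverse pipeline: XOR with the same keystream, then inflate."""
    decompressor = zlib.decompressobj()
    ks = keystream(key, nonce)
    for chunk in chunks:
        yield decompressor.decompress(bytes(b ^ next(ks) for b in chunk))
    yield decompressor.flush()


text = b"the quick brown fox jumps over the lazy dog\n" * 2000
chunks = (text[i:i + 4096] for i in range(0, len(text), 4096))
key, nonce = b"demo key", os.urandom(16)

ciphertext = b"".join(compress_then_encrypt(chunks, key, nonce))
restored = b"".join(decrypt_then_decompress([ciphertext], key, nonce))
print(len(text), len(ciphertext), restored == text)
```

Because the keystream position advances continuously across chunks, the receiver can decrypt and decompress in any chunking, including all at once as shown.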

However, I can explain why filenames aren't encrypted. As you might be aware, the 7z format contains a header with the file information and other metadata. 7-Zip will not encrypt this header unless you explicitly enable it. You can do this by checking the Encrypt file names box at the bottom of the Encryption segment of the archive creation screen on Windows, highlighted in red below.

[Screenshot: 7-Zip archive creation screen with the Encryption section highlighted]

On Linux and other Unix-like operating systems (and presumably the command line 7-Zip tool on Windows), you can enable header encryption by adding a -mhe=on switch to the 7z command.
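For scripting this, a minimal Python sketch that assembles the command line; `build_7z_command` and `create_archive` are hypothetical helpers, and actually running the command assumes the `7z` executable is on the PATH. `a` adds files to an archive and `-t7z` selects the 7z format (header encryption is a 7z-format feature, not available for ZIP). As far as I know, a bare `-p` with no value makes 7z prompt for the password instead of leaving it visible on the command line.

```python
import shlex
import subprocess


def build_7z_command(archive, files, encrypt_headers=True):
    """Assemble a 7z command that creates a password-protected .7z archive.

    -t7z      use the 7z format (needed for header encryption)
    -p        no value given: 7z asks for the password interactively
    -mhe=on   also encrypt the header (file names, directory structure)
    """
    cmd = ["7z", "a", "-t7z", "-p"]
    if encrypt_headers:
        cmd.append("-mhe=on")
    cmd.append(archive)
    cmd.extend(files)
    return cmd


def create_archive(archive, files, encrypt_headers=True):
    """Run the command; raises CalledProcessError if 7z exits with an error."""
    subprocess.run(build_7z_command(archive, files, encrypt_headers), check=True)


cmd = build_7z_command("secret.7z", ["notes.txt", "photos/"])
print(shlex.join(cmd))  # 7z a -t7z -p -mhe=on secret.7z notes.txt photos/
```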

Patches

Thanks for the useful answer! I'm not sure how I missed the check box for encrypting file names before; that's good to know. I appreciate the straightforward, helpful, and polite nature of your answer. :) – nhinkle – 2011-03-23T03:22:07.090

I would prefer answers with definitive sources/references, not just speculation.

Oh, you can do much better than that. You can try it yourself and base your conclusion on logic and facts. There's really no need to speculate here.

All these programs compress first and then encrypt, and that is a fact you can easily verify yourself.

Take compressible data, such as a large number of .txt files (say, plain ASCII text files).

  1. Only compress these .txt files and look at the resulting file size.

  2. Now compress and encrypt the .txt files using the aforementioned programs and look at the file size.

  3. Now first encrypt the .txt files, then try to 'compress' the encrypted file and look at the file size.

What will this experiment show? Files 1 and 2 will be basically the same size, while 3 will be about the same size as your uncompressed data.
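The three steps can be scripted. A minimal sketch, assuming zlib (Python's standard library) as the compressor and, since the standard library has no AES, a hypothetical toy SHA-256 keystream cipher (`toy_encrypt`) as the encryptor; repetitive ASCII lines stand in for the .txt files. A real archiver and a real cipher would be expected to show the same pattern.

```python
import hashlib
import os
import zlib


def toy_encrypt(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. A stand-in for a real cipher
    such as AES (not in Python's stdlib); do not use it to protect data.
    Applying it twice with the same key and nonce restores the input."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


# Compressible input: repetitive ASCII text, standing in for the .txt files.
text = b"".join(b"line %d: the quick brown fox jumps over the lazy dog\n" % i
                for i in range(20000))
key, nonce = b"demo key", os.urandom(16)

only_compressed = zlib.compress(text, 9)                                     # step 1
compressed_then_encrypted = toy_encrypt(only_compressed, key, nonce)         # step 2
encrypted_then_compressed = zlib.compress(toy_encrypt(text, key, nonce), 9)  # step 3

for label, blob in [("original", text),
                    ("1. compress only", only_compressed),
                    ("2. compress, then encrypt", compressed_then_encrypted),
                    ("3. encrypt, then compress", encrypted_then_compressed)]:
    print(f"{label:28} {len(blob):>9,} bytes")
```

Step 2 comes out exactly as large as step 1 because this keystream cipher preserves length; real ciphers add only a small fixed overhead (IV, padding, authentication tag).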

That is because one of the guarantees made by encryption algorithms is that encrypted data will look random (if it doesn't, your encryption algorithm is broken, and that is a fact too).

And you can't compress randomness.
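A quick way to see this, assuming `os.urandom` output as a stand-in for well-encrypted data: estimate the first-order byte entropy and try zlib on it. `byte_entropy` is a hypothetical helper; first-order entropy is only a rough measure (a high value alone doesn't prove data is incompressible), but random data sits right at the 8 bits/byte maximum and zlib cannot shrink it.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8.0)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


random_bytes = os.urandom(1 << 20)   # stands in for well-encrypted output
ascii_text = b"You can't compress randomness. " * (1 << 15)

for label, data in [("random", random_bytes), ("text", ascii_text)]:
    packed = zlib.compress(data, 9)
    print(f"{label:6} entropy={byte_entropy(data):.3f} bits/byte  "
          f"{len(data):,} -> {len(packed):,} bytes")
```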

That's even better than references: it's "try it and see for yourself".

Fact 1: good encryption algorithms produce seemingly random data

Fact 2: random data cannot be compressed

So if you get a file size smaller than the total of all the files' sizes, it's obvious that compression took place before encryption.

Also, it is totally obvious that if you "compress and encrypt" a set of compressible files and do not end up with a smaller size, then your "compress and encrypt" software is broken beyond repair and can safely be thrown away as garbage written by clueless people ;)

That's the fun thing with facts: you cannot argue with facts and you cannot be wrong when stating facts.

P.S.: Don't try this with already-compressed files, such as a set of .png files; that won't work, because there is little redundancy left to remove.

Weezy

BTW, I answered this before reading the blog you pointed to. The blog uses, well, logic and facts to prove, just as I did above, that you have to compress first and then encrypt. What more do you want than the blog you pointed to? Did you read it? Do you understand what it means when the blogger quotes someone saying that good encryption produces what looks like random data? Do you realize that you cannot compress randomness? You should come up with more specific questions about what you don't get, because the blog does answer your question. – Weezy – 2011-03-22T08:51:03.080

I like your test-it-out method; that seems like a logical way to figure it out. The info in the blog post does imply that it would make sense to do it in the way you described; indeed, I assumed it would. However, software often doesn't work the way that makes the most sense - there are plenty of examples of ill-thought-out software. I asked because I was curious if there was any proof one way or the other. I understand the blog post completely, and read it in its entirety; that doesn't preclude me from having further questions. – nhinkle – 2011-03-23T03:20:59.463

For these file types, are the files compressed and then encrypted, as recommended in the aforementioned question, or the reverse? Or, is there some way that these can compress and encrypt the data at the same time?

My first question is why, but this is something you'd want to consult the technical docs for (source code, patents, or the like). The idea behind zip software is that it solves the problem so you don't have to think about it.

When I create an encrypted 7z file, I can view the filenames inside of the encrypted archive, but cannot view the contents of those files without entering the passphrase. How is this possible?

The contents of the files are encrypted, but the directory (the listing of file names, the relative locations of the encrypted file data and the file attributes) is not.
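To make the distinction concrete, here is a hypothetical, heavily simplified archive layout (not the real 7z or ZIP format): each file body is compressed and then encrypted, while the directory of names, sizes and offsets is stored as plain JSON unless header encryption is requested. The cipher is again a toy SHA-256 keystream standing in for AES, and all names here are illustrative.

```python
import hashlib
import json
import zlib


def toy_xor(data: bytes, key: bytes, label: bytes) -> bytes:
    """Toy SHA-256 keystream cipher (stand-in for AES). XOR-ing twice with
    the same key and label restores the input."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + label + i.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)


def build_archive(files: dict, key: bytes, encrypt_header: bool) -> dict:
    """Compress-then-encrypt each file body; keep a directory of entries."""
    directory, payload = [], b""
    for index, (name, data) in enumerate(files.items()):
        blob = toy_xor(zlib.compress(data), key, b"file%d" % index)
        directory.append({"name": name, "size": len(data),
                          "offset": len(payload), "length": len(blob)})
        payload += blob
    header = json.dumps(directory).encode()
    if encrypt_header:  # plays the role of 7-Zip's "Encrypt file names"
        header = toy_xor(header, key, b"header")
    return {"header": header, "payload": payload}


def list_names(archive: dict) -> list:
    """Listing needs no key, but only works when the header is plaintext."""
    return [entry["name"] for entry in json.loads(archive["header"])]


def read_file(archive: dict, key: bytes, name: str, header_encrypted: bool) -> bytes:
    """Decrypt the header if needed, locate the entry, decrypt and inflate it."""
    header = archive["header"]
    if header_encrypted:
        header = toy_xor(header, key, b"header")
    for index, entry in enumerate(json.loads(header)):
        if entry["name"] == name:
            start = entry["offset"]
            blob = archive["payload"][start:start + entry["length"]]
            return zlib.decompress(toy_xor(blob, key, b"file%d" % index))
    raise KeyError(name)


files = {"secret-plans.txt": b"attack at dawn\n" * 100, "notes/todo.txt": b"buy milk\n"}
key = b"correct horse battery staple"
plain_header = build_archive(files, key, encrypt_header=False)
hidden_header = build_archive(files, key, encrypt_header=True)

print(list_names(plain_header))  # ['secret-plans.txt', 'notes/todo.txt']
print(read_file(hidden_header, key, "notes/todo.txt", header_encrypted=True))  # b'buy milk\n'
```

With `plain_header`, anyone can list the names but not read the bodies; with `hidden_header`, even the listing requires the key.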

As an aside, is there any way to encrypt a 7z or similar archive such that the file names and directory structure within are not visible without using the passphrase?

Sure. Use any other file encryption software: TrueCrypt, OpenSSL's various tools, etc.

Slartibartfast

Why? Because I'm curious. While I'm aware that I can use other software, I'm more interested in how to use the existing software to do this, but yes, those other solutions would work. – nhinkle – 2011-03-22T05:13:25.773