14

Suppose I have a file with records, and I have two options to encrypt it:

  • encrypt the file as a whole
  • encrypt each record separately and store them together.

Which way is generally preferable and why?

For example, I think the second approach is worse because, implemented naïvely, it would disclose the number and size of the records. But what if I add some random data to the end of each record?

Thomas Pornin
Andrew T

4 Answers

11

Before encrypting, you must first define which security properties you want to achieve and which avenues a putative attacker may use.

Encryption is a tool for gaining confidentiality. Most security models also need some kind of controlled integrity check ("only passive" attackers are very rare). There are encryption modes which combine encryption (for confidentiality) and verified integrity; see EAX and GCM. Per-record encryption implies the following:

  • The number and size of records leaks.
  • The structural integrity of the complete file is no longer guaranteed: an active attacker could remove some records, or change their order, without being detected.
  • There is a size overhead. Each individual encryption has a fixed size overhead, which depends on the encryption mode (this is needed for the integrity check and for an essential parameter called the Initialization Vector). Having that overhead for each record requires more space than a single overhead for the whole file.
  • It is possible to verify the integrity of a single record without processing the whole file.
  • It is complex.

This last consequence should be viewed as a compelling reason not to use per-record encryption: complexity is the worst enemy of the security engineer. So unless you have an actual need for processing the file in partial chunks (i.e. it is too huge to be decrypted and checked in RAM as a whole), you should refrain from trying to implement per-record encryption.
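
To put a number on the size overhead mentioned above, here is a rough sketch. It assumes GCM with its usual 12-byte IV and 16-byte tag stored with every encrypted message; the record count is made up for illustration.

```python
# Assumed per-message overhead for AES-GCM as commonly deployed.
NONCE_BYTES = 12  # Initialization Vector stored alongside the ciphertext
TAG_BYTES = 16    # authentication tag used for the integrity check


def overhead_bytes(num_messages: int) -> int:
    """Total bytes added when num_messages are encrypted independently."""
    return num_messages * (NONCE_BYTES + TAG_BYTES)


records = 1000  # hypothetical record count
whole_file = overhead_bytes(1)        # one encryption for the entire file
per_record = overhead_bytes(records)  # one encryption per record

print(whole_file, per_record)  # 28 28000
```

Whether 28 extra bytes per record matters depends on how small the records are; for short records it can easily exceed the payload.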

Thomas Pornin
  • +1, I love it when you explain the trade-offs of crypto. – AviD Jul 18 '11 at 09:58
  • Or even replace with old records. Suppose I could get cash from the bank and then replace my account record with the previous one. – Ángel Aug 18 '14 at 14:53
8

A lot will depend on exactly what you want to do with the data, and what your requirements are around bandwidth etc.

As you say, data leakage will occur around number and size of files with your second option, however this may not be a problem, and file access may well be faster.

In saying that, however, most volume encryption solutions encrypt the entire volume and in situations where disk access is not the bottleneck, this could be a suitable solution for you.

My preference in environments where the security level requires files to be encrypted is to encrypt the entire volume as this helps to remove the possibility of accidentally saving files in the clear, as well as reducing data leakage.

Rory Alsop
  • Actually, it is all about pieces of data (records), not about files or volumes. I think IO speed is not a concern. I'm worried about whether it'll be easier for an attacker to decrypt parts of the data vs. the whole data. – Andrew T Jul 15 '11 at 15:13
  • @Andrew - apologies. For records within a file, I would always recommend encrypting the entire file. The action of trying to populate an unencrypted file with lots of encrypted records seems far too complicated. – Rory Alsop Jul 15 '11 at 15:22
2

The second method isn't naïve for the reason you mentioned: a block cipher produces ciphertext in whole blocks, so record sizes are revealed only to the nearest block. For example, whether the input is 2 bytes or 15 bytes, the output is a single 128-bit block with AES-128, assuming you pad your input up to the block size (with PKCS#7-style padding, an input of exactly 16 bytes takes a second block). This method does, however, readily disclose the number of records.
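
As a rough illustration of how padding hides exact sizes, the sketch below computes ciphertext lengths under one specific assumption: PKCS#7 padding, which always adds at least one byte, so an exact multiple of the block size gains a full extra block. A scheme that leaves full blocks unpadded would behave differently at those boundaries.

```python
BLOCK = 16  # AES block size in bytes (the same for AES-128/192/256)


def padded_len(n: int) -> int:
    """Ciphertext length for n plaintext bytes under PKCS#7 padding (IV excluded)."""
    return (n // BLOCK + 1) * BLOCK


for n in (2, 15, 16, 17, 40):
    print(n, "->", padded_len(n))
# 2 -> 16
# 15 -> 16
# 16 -> 32
# 17 -> 32
# 40 -> 48
```

So an observer learns each record's size only up to a 16-byte bucket, but still sees one ciphertext per record.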

The reason the second method is naïve is that it involves more I/O operations. But if you have the resources, that is not a problem. Moreover, when the encryption is as strong as AES, the approach itself is hardly a concern. The only concern is the time it takes to individually encrypt 1000 fields when they could be accommodated in 50 lines of a single file and encrypted/decrypted much faster.

The choice is yours. Both are strong, with the second being a little weaker in that it reveals the number of records. Otherwise, only external parameters, such as the time available to encrypt/decrypt, count in this case.

Andrew Anderson
1

Encryption functions (block ciphers) cannot take input of arbitrary length. So even if you encrypt the entire file, the encryption function itself is going to break it into blocks and encrypt the blocks individually. This is similar to encrypting individual records, except that the mode of operation will dictate if/how these blocks are mixed with each other.

By encrypting each record individually, you are doing something akin to the ECB mode of operation. The difference is that you have partitioned the data yourself. This preserves information about the structure and any attack against ECB will also apply to the approach of encrypting records individually (without random padding). For example, if two records are the same, you will be able to see that in the ciphertext.
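
The repeated-record leak can be sketched with the standard library alone. Note the assumptions: `toy_encrypt` and `keystream` are hypothetical helpers built on SHA-256 as a stand-in for a real cipher (not something to deploy), and the fix shown is a fresh random per-record nonce, which plays the same role as the random padding mentioned above.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, nonce: bytes, record: bytes) -> bytes:
    """XOR the record with the keystream and store the nonce in front."""
    ks = keystream(key, nonce, len(record))
    return nonce + bytes(a ^ b for a, b in zip(record, ks))


key = secrets.token_bytes(32)
records = [b"balance=100", b"balance=250", b"balance=100"]

# Deterministic: every record is encrypted with the same (empty) nonce.
fixed = [toy_encrypt(key, b"", r) for r in records]
# Randomized: a fresh 16-byte nonce for each record.
fresh = [toy_encrypt(key, secrets.token_bytes(16), r) for r in records]

print(fixed[0] == fixed[2])  # True: the repeated record is visible
print(fresh[0] == fresh[2])  # False, except with negligible probability
```

The randomized variant hides equal records, but the number and approximate size of the records remain visible.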

Also, by encrypting each record individually, you are not filling the blocks up completely, which is less efficient in terms of both speed and storage.

So the short answer is: it is much better to encrypt the whole file and to use a good mode of operation to do it (CBC or XTS for example).

The only reason you would consider not encrypting the whole file is if you need to selectively decrypt or replace records efficiently. Encrypting the whole file means that, to fetch a record, you need to decrypt the whole file up to the record you are interested in. And changing any part of an individual record means re-encrypting the rest of the data. If this is a concern, there are standard ways to do sector-by-sector encryption.

PulpSpy
  • Actually, CTR mode allows random access to any part of the file without encrypting/decrypting the rest. Your description is valid for [PCBC mode](http://en.wikipedia.org/wiki/Mode_of_operation#Propagating_cipher-block_chaining_.28PCBC.29). (Most authenticated encryption modes would be of that kind, too, I suppose.) – Paŭlo Ebermann Oct 06 '11 at 11:20
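
The random-access property of counter mode raised in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the keystream is a toy built from SHA-256 rather than AES, `ctr_xor` and `keystream_block` are hypothetical names, and the fixed key and nonce are placeholders. Like real CTR, it provides no integrity check.

```python
import hashlib

BLOCK = 32  # one SHA-256 digest per keystream block (toy choice)


def keystream_block(key: bytes, nonce: bytes, index: int) -> bytes:
    """Keystream block number `index`; depends only on key, nonce and index."""
    return hashlib.sha256(key + nonce + index.to_bytes(8, "big")).digest()


def ctr_xor(key: bytes, nonce: bytes, data: bytes, offset: int = 0) -> bytes:
    """XOR data with the keystream as if data began at byte `offset` of the file."""
    out = bytearray()
    for i, byte in enumerate(data):
        pos = offset + i
        # Recomputed per byte for clarity; a real implementation would cache blocks.
        block = keystream_block(key, nonce, pos // BLOCK)
        out.append(byte ^ block[pos % BLOCK])
    return bytes(out)


key = b"k" * 32    # placeholder key
nonce = b"n" * 16  # placeholder nonce, must be unique per file in practice
plaintext = b"".join(b"record-%03d;" % i for i in range(100))
ciphertext = ctr_xor(key, nonce, plaintext)

# Decrypt only record 57 without touching the rest of the file.
RECORD_LEN = len(b"record-000;")
start = RECORD_LEN * 57
piece = ciphertext[start:start + RECORD_LEN]
slice_plain = ctr_xor(key, nonce, piece, offset=start)
print(slice_plain)  # b'record-057;'
```

Because each keystream block depends only on its index, any byte range can be decrypted on its own, which is what makes whole-file encryption compatible with per-record reads.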