
I've got an existing (personal) backup service which I'm rewriting from the ground up to be secure.

At present, I just store files (and diffs) in AWS S3 with no encryption. It works fine but I'd like to make sure my data can't be leaked.

For the sake of this discussion, I'm assuming the S3 data can be read by someone else. I want to be able to do backups without having to provide a key/password on startup.

My current plan is as follows:

Generate a public/private key pair. Store the private key somewhere safe (without writing to local disk). Store the public key locally.

On a per-file basis (or per part of a file), as sketched in the code example after this list:

  • Securely generate a symmetric key
  • Use the symmetric key to encrypt the file
  • Encrypt the symmetric key using the public key
  • Store the IV, encrypted symmetric key and encrypted file data in S3
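
In code, the per-file flow I have in mind looks roughly like this (just an illustrative sketch using Python's `cryptography` package with AES-256-GCM and RSA-OAEP as stand-ins; I haven't settled on the actual primitives yet):

```python
# Illustrative sketch only, assuming Python's "cryptography" package.
# AES-256-GCM and RSA-OAEP are placeholders for whatever I end up choosing.
import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(chunk: bytes, public_key_pem: bytes) -> dict:
    public_key = serialization.load_pem_public_key(public_key_pem)

    # Securely generate a per-file/per-chunk symmetric key
    file_key = AESGCM.generate_key(bit_length=256)

    # Use the symmetric key to encrypt the data (GCM needs a unique nonce/IV)
    iv = os.urandom(12)
    ciphertext = AESGCM(file_key).encrypt(iv, chunk, None)

    # Encrypt the symmetric key using the public key
    wrapped_key = public_key.encrypt(
        file_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )

    # Everything returned here goes to S3; only the private key can unwrap file_key
    return {"iv": iv, "encrypted_key": wrapped_key, "ciphertext": ciphertext}
```

(With GCM the authentication tag is appended to the ciphertext, so I'd also get integrity checking of each chunk for free.)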

Am I correct in assuming that even though all the data from the last point above can potentially be read by someone else, the fact that the private key is required to recover the symmetric key means this is secure?

Am I missing anything obvious?

Basic
  • So you need the private key from elsewhere to recover from backup, but you don't need it to make new backups? – cpast Apr 12 '15 at 23:40
  • Where/how is the IV generated? You can't just throw Crypto at a problem. What mode of operation? How large of keys, etc? – mikeazo Apr 12 '15 at 23:54
  • @cpast That's the plan. That way, the machine can back up happily on its own, with no input from me. When I want to restore the backup, I grab the private key (from my usb dongle) and I can access files – Basic Apr 12 '15 at 23:54
  • 2
    @mikeazo IV Generated randomly alongside the symmetric key. I haven't addressed mode of operation yet as it seemed pointless before having a rough idea what the architecture would be. I need to do more research but something AES256/CBC based would be my starting point. Re "Can't just throw crypto at a problem": Yes, I'm aware of that. That's why I'm attempting to plan in detail before I start. Re: Keysize, I don't know yet. Just small enough to not be prohibitively slow – Basic Apr 12 '15 at 23:56
  • 1
    Is there no existing software that meets your needs? Why reinvent the wheel? – mikeazo Apr 13 '15 at 00:04
  • @mikeazo Sure, Backblaze amongst others, but it's closed-source and relies on me trusting someone else actually does what they say they do. I could try and find something open source, then read the code end-to-end but then I'd never learn to do it myself. I'm well aware that trying to implement the encryption/decryption myself would be a fail in the making but if you're arguing that nobody should ever use cryptography in software they write themselves then I'm afraid we disagree. – Basic Apr 13 '15 at 00:08
  • Can I ask a question... I'm new to security: why not just encrypt the file using the public key? Why are you generating an additional one? – ojblass Apr 13 '15 at 00:16
  • @ojblass I'm also no pro, but from my understanding asymmetric encryption (pub/priv) is very expensive compared to symmetric and is best suited to short messages. Thus you use asymmetric to share a key that's then used for a chunk of data encrypted symmetrically. – Basic Apr 13 '15 at 00:18

3 Answers

2

The basic design you propose is secure. Of course, the security of any working system also depends on the implementation.

However, using public key crypto for backups has little benefit compared to symmetric crypto. The usual arrangement for backups is to have a symmetric key. You store this in two places: your working machine, and in a secure, safe, offline backup. There's no real risk around having the key on your working machine; it is only used to encrypt data that is already present on that machine.
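
For comparison, a minimal sketch of that arrangement (again assuming Python's `cryptography` package and AES-GCM; illustrative only, not lifted from any particular backup tool):

```python
# One long-lived AES-256 key: kept on the working machine and in an
# offline/safe copy. Every chunk is encrypted directly with it.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(backup_key: bytes, chunk: bytes) -> bytes:
    iv = os.urandom(12)  # unique nonce per chunk
    return iv + AESGCM(backup_key).encrypt(iv, chunk, None)

def decrypt_chunk(backup_key: bytes, blob: bytes) -> bytes:
    iv, ciphertext = blob[:12], blob[12:]
    return AESGCM(backup_key).decrypt(iv, ciphertext, None)
```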

Symmetrically encrypted backups are supported by a lot of software, e.g. Duplicati. This can even do incremental encrypted backups, which I think would be impossible with the public key approach you propose.

paj28
  • You make a good point re: available locally. I'd intended to keep data in the backup even after (securely) deleting locally, but I didn't make that clear in the question. Re: Diffs, that's an even better point I hadn't considered. I store files as chunks to simplify uploads so I could swap a chunk out but adding a byte at the start would scupper that. In theory it would work if I stored byte ranges but then I'd need to be able to compare to the old version to determine boundaries. You've given me a lot to think about, thanks. – Basic Apr 13 '15 at 11:06
1

@mikeazo has a point: what you're describing makes sense; it's the standard way to encrypt a file with a public key. If I were you, I'd start by looking at encrypted backup applications for your OS and seeing whether any of them are appropriate, before reinventing the wheel.

Separately, you now have a key storage problem: your backups are encrypted, but the private keys to recover them need to be stored somewhere... and backed up.

  • Every time I touch crypto, I _always_ seem to have a key storage problem... My saving grace here is that the private key should only be required very infrequently (if I'm lucky, never) and I'm not overly concerned with physical security so a couple of USB dongles should do the trick. Thanks for confirming my understanding. This hobby project has been kicking around in my head for a couple of years and is as much about learning how to do it as having the resulting service. That said, I do appreciate the advice and wouldn't do this for production code unless I had to. – Basic Apr 13 '15 at 10:34
  • 1
    Yup. Crypto doesn't actually absolve you of the secure storage problem, it just reduces its size from 'all of my data' to 'all of my keys'. – Justin King-Lacroix Apr 13 '15 at 10:40
  • Doing it yourself for the learning is a great idea; for a production environment, if you're using Linux or Mac OS X, look into "duplicity". – Justin King-Lacroix Apr 13 '15 at 10:40
  • Thanks, I'll bear it in mind. My prod. environment consists of physical appliances with 28 Centos VMs (on a 4U SuperMicro FatTwin). We crawl/index/analyse lots of data stored elsewhere on the network which means that backups are not that important for the vast majority of the data we hold (how rarely does that happen?). That said, the results of the analysis do need to be preserved which we currently handle by syncing multiple appliances. An offline backup would be nice when we start targeting smaller clients who don't want multiple appliances. I'll accept in a few days, pending other answers – Basic Apr 13 '15 at 10:50
  • Since you don't need the private keys to backup, only to restore, you could store them safely at your company's home site, and supply them to the client on-demand. – Justin King-Lacroix Apr 13 '15 at 10:54
  • Or (don't laugh, this is actually a reasonable suggestion) generate the private key, encode it in base64, print it out, and have the client store it in a bank vault. Hell, print out a few copies, and have the client store them along with their other 'hyper-secure-really-do-not-lose-this' documents. – Justin King-Lacroix Apr 13 '15 at 10:55
  • Backup in this question is purely for personal use, but re: key storage, I'm not laughing. Private keys to SSH onto client appliances are stored on usb _and_ printed out and left with the client to secure. We keep the passwords so neither can access the system without the other. If we use Duplicity or something similar, the same approach may well be the answer. Anyway, thanks for all your input on this, it's appreciated. – Basic Apr 13 '15 at 11:10
1

Am I correct in assuming that even though all the data from the last point above can potentially be read by someone else, the fact that the private key is required to recover the symmetric key means this is secure?

Depends on your definition of secure. If you want to ensure confidentiality and integrity, and you go with CBC as you suggest in the comments, then the answer is no. If you only care about confidentiality, then your approach is pretty good. That said, if you only care about confidentiality, I suggest you rethink what you care about.
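
To make the integrity point concrete: with CBC you would normally add an encrypt-then-MAC step yourself. A minimal sketch, assuming Python's `cryptography` package and HMAC-SHA256 (an AEAD mode such as AES-GCM does both jobs in one call):

```python
# Sketch of adding integrity to a CBC design via encrypt-then-MAC.
import os
from cryptography.hazmat.primitives import hashes, hmac, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_with_integrity(enc_key: bytes, mac_key: bytes, data: bytes) -> bytes:
    # Pad to the AES block size, then encrypt with a fresh random IV
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(data) + padder.finalize()
    encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ct = iv + encryptor.update(padded) + encryptor.finalize()

    # MAC over the ciphertext (encrypt-then-MAC); verify before decrypting
    tag = hmac.HMAC(mac_key, hashes.SHA256())
    tag.update(ct)
    return ct + tag.finalize()
```

On restore you verify the HMAC over the stored ciphertext before decrypting anything; without that, someone who can write to the S3 bucket can tamper with your backups undetected.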

Am I missing anything obvious?

Well, the description you give is very high level, and I understand why you started with that. The devil is in the details, though, which is why using an open source project that meets many of your needs will be important. At the very least, you should look at GnuPG. It can encrypt files in exactly the manner you describe. Its design has been around for a long time and has been studied by cryptographers.

mikeazo
  • Interesting point re: integrity, thanks. With backups, even if I know the data is invalid, it doesn't help me get the original files back. Of course, knowing that I'm not restoring a malicious binary is very valuable! I'll have a look at GnuPG to see what approach they take. Can you clarify why CBC in particular is relevant to your "no"? Because a garbled block only impacts the blocks on either side? I'm using MD5 to detect changes to a file. I know it isn't considered secure, but for change detection it seemed like an OK compromise between speed and collisions. Presumably I should sign the hash too? – Basic Apr 13 '15 at 14:08
  • @Basic It's not about which blocks are "garbled" but rather what "garbled" means. Sure, if you know the ciphertext encrypts some well-structured data, then an attacker's modification of the ciphertext is very likely to produce something that no longer conforms to that structure, and you could detect that. But a good cryptographic primitive should work no matter what the plaintext is (and sometimes even if the attacker chooses the plaintext). So, if you are encrypting random data, how do you detect that it is garbled? – mikeazo Apr 13 '15 at 16:44
  • @Basic RE MD5, MD5 alone does not provide cryptographic integrity protections as there is no secret. Often for integrity, we use [HMAC](http://en.wikipedia.org/wiki/Hash-based_message_authentication_code) or [authenticated encryption](http://en.wikipedia.org/wiki/Authenticated_encryption), or digital signatures. – mikeazo Apr 13 '15 at 16:46
  • Sorry, I was unclear. I simply meant that I could hash the result of the decryption and compare to the hash I have already, but that just shifts the problem to verifying the hash is not modified, hence the question re "signing" the hash. "HMAC" would've been far more succinct if I'd remembered the term. Anyway, you've given me a ton of useful information and reading. Thanks for your time. – Basic Apr 13 '15 at 17:06