
My application encrypts a file, and the result is written to a different one. How do I safely remove the original one (C++)?

Update. Wow! The problem is more complicated than I thought...

1) My application's purpose is to encrypt files from the user's hard drive. I cannot require all the data to be piped, because I do not know which source it comes from: it can be ANY file, not just a browser download or the output of some processing step. So piping does not fit my case.

2) When I was writing my question, my main concern was whether shredding (or overwriting, as I called it before) actually destroys the data. Now I see my concern was justified: the data is far from certain to be destroyed.

3) Does shredding really damage hard drives?

So, does shredding still remain the best choice in my case or not?

John Deters
Ilya
  • In your case, damaging disks probably isn't a huge deal, since it sounds like you're only shredding a few files here and there. Occasional shredding is fine, but if your program runs multiple times a second, it will definitely add to disk wear in the long run. It's more of a concern on solid-state and flash drives than on magnetic ones. I can't find exact numbers on how fast SSD / flash wears out; about 100,000 write cycles looks like the high end, which sounds like a lot, but not when you're doing heavy writes for years. – Mike Ounsworth May 12 '15 at 13:22

3 Answers

2

The most secure way to do this is to not have the unencrypted file on persistent storage in the first place. Pipe the unencrypted data to your encryption program directly from whatever program generated or downloaded it.

You'll also need to use the OS API to mark your memory area as non-swappable (e.g. with mlock on Linux). This ensures that the memory holding the plaintext is not paged out to disk by the virtual memory manager. Depending on your requirements, you may also need to discard any sensitive data before the computer goes into suspend or hibernation.
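
A minimal sketch of the `mlock` idea, assuming a POSIX system (Linux, macOS, the BSDs). The class name `LockedBuffer` is invented for this example. Note that `mlock` can fail, for instance when the process exceeds its `RLIMIT_MEMLOCK` limit, so the result is recorded rather than assumed. This does not address suspend or hibernation.

```cpp
#include <sys/mman.h>   // mlock, munlock (POSIX)
#include <cstddef>

// Hypothetical helper: a fixed-size buffer that asks the OS to keep it
// in RAM and zeroes itself before being freed.
class LockedBuffer {
public:
    explicit LockedBuffer(std::size_t size)
        : data_(new unsigned char[size]()), size_(size), locked_(false) {
        // mlock returns 0 on success; it may fail if the lock limit is hit.
        locked_ = (mlock(data_, size_) == 0);
    }

    ~LockedBuffer() {
        wipe();
        if (locked_) munlock(data_, size_);
        delete[] data_;
    }

    LockedBuffer(const LockedBuffer&) = delete;
    LockedBuffer& operator=(const LockedBuffer&) = delete;

    // Zero the contents through a volatile pointer so the compiler is
    // less likely to optimize the writes away.
    void wipe() {
        volatile unsigned char* p = data_;
        for (std::size_t i = 0; i < size_; ++i) p[i] = 0;
    }

    unsigned char* data() { return data_; }
    std::size_t size() const { return size_; }
    bool locked() const { return locked_; }   // false: pages may still swap

private:
    unsigned char* data_;
    std::size_t size_;
    bool locked_;
};
```

Check `locked()` after construction and decide whether to continue if the lock could not be obtained.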

Note that shredding a file may not actually overwrite the unencrypted data on some storage devices, such as SSDs, because of wear leveling.

Lie Ryan
  • +1 Even on non-SSDs the hard drive firmware can remap "defective blocks", leaving the original data intact and recoverable. There's even suspicion that some manufacturers, under order from their governments, do this automatically for data that matches certain patterns (like private keys). I'm getting this from [`man wipe`](http://linux.die.net/man/1/wipe), which I'm aware is out of date, but the skepticism still remains. – Mike Ounsworth May 12 '15 at 01:33
  • It's a simple question, guys; don't over-engineer it. The part about SSDs is true. – Sacx May 12 '15 at 06:14
2

I applaud @LieRyan's answer of never putting sensitive information on a hard drive in plaintext in the first place. Writing straight to an encrypted file or, better, using full-disk encryption is far more reliable than shredding.


In addition, I want to add a few notes about modern hard drives being clever:

1. The idea of overwriting the data 16 times is wildly outdated. For magnetic drives manufactured after 2006, just zeroing it out once is sufficient (wikipedia). The problem was that in the 70's there was unused space between write bands where magnetic fields could "leak out" and be recovered. Nowadays the bands are packed so tightly that nothing can be recovered from in between.

2. Wear Leveling: As pointed out by @LieRyan, SSDs, flash drives, etc. do all sorts of complicated wear leveling to spread the writes around the disk (wikipedia). If you perform two sequential writes to the same file, there is no guarantee that they land on the same physical location, and they usually will not.

3. Hard drive buffers (caches): All modern hard drives contain some cache memory used to buffer frequently accessed data (wikipedia). This means (A) there is the possibility that copies of your sensitive data will persist in the cache after the write has been completed. An attacker who has enough physical access to recover deleted data can likely also recover the cache if they act quickly. (B) Let's say you do write 16 random patterns, flushing the output stream each time. These operations may all be absorbed by the HDD's cache, with only the final result actually written to the magnetic disk, but because of (1) above, that's actually ok.

4. Hybrid drives: These fancy modern drives place small amounts of SSD memory inside traditional magnetic drives to act as larger caches. Here you get the worst of both worlds as far as shredding is concerned: there can be more than two copies of your data due to wear leveling, AND it doesn't go away when you turn off the power.

The bottom line is that shredding wears out your drives faster (esp. SSD or flash), but doesn't always delete your data. Shredding can certainly be done (many corporations and governments do it), but it takes more than software; you have to be very careful about which hardware you buy and how you configure your operating system. If this is something that you're really worried about, then the better approach is to treat your hard drive as adversarial and never write sensitive things to it in plaintext: use full-disk encryption.

Mike Ounsworth
1

"Shred" the file. There are several ways to do that on different operating systems, but the simplest is to open the file, write random values over the real data several times, and then delete it.

Take a quick look here; a uniform pattern is not necessary, it is just an example: http://www.cplusplus.com/forum/beginner/23237/

And don't forget to flush after every rewrite cycle.
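
Roughly, the steps above could be sketched like this in standard C++. The function name `shred_file` and the default of 3 passes are choices made for this example, not an established API. Two caveats: `flush()` only hands the data to the operating system, so forcing it onto the device needs a platform call such as POSIX `fsync`, which `std::fstream` does not expose; and on SSDs or other wear-leveled media the old blocks may survive regardless.

```cpp
#include <algorithm>   // std::min
#include <cstdio>      // std::remove
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Overwrites `path` in place with random bytes `passes` times, flushing
// after each pass, then deletes the file. Returns false on any failure.
bool shred_file(const std::string& path, int passes = 3) {
    // Open at the end to learn the file size.
    std::ifstream probe(path, std::ios::binary | std::ios::ate);
    if (!probe) return false;
    const std::streamoff size = probe.tellg();
    probe.close();
    if (size < 0) return false;

    std::random_device rd;
    std::mt19937 gen(rd());   // not a cryptographic RNG; it only fills space
    std::uniform_int_distribution<int> byte(0, 255);
    std::vector<char> block(4096);

    for (int pass = 0; pass < passes; ++pass) {
        // in|out opens the existing file without truncating it.
        std::fstream f(path, std::ios::binary | std::ios::in | std::ios::out);
        if (!f) return false;
        std::streamoff left = size;
        while (left > 0) {
            const std::streamoff n =
                std::min(left, static_cast<std::streamoff>(block.size()));
            for (std::streamoff i = 0; i < n; ++i)
                block[static_cast<std::size_t>(i)] = static_cast<char>(byte(gen));
            f.write(block.data(), static_cast<std::streamsize>(n));
            left -= n;
        }
        f.flush();            // flush after every rewrite cycle
        if (!f) return false;
    }
    return std::remove(path.c_str()) == 0;
}
```

Usage would be something like `shred_file("plain.txt")` once the encrypted copy has been written and verified.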

Update:

Here are more details if you want to take a look: Writing file shredder, the selected answer is pretty good.

Sacx
  • Wow, that's approximately what I have done myself! Thank you! (^_^) – Ilya May 11 '15 at 15:53
  • Don't shred, just encrypt your whole hard drive. Shredding doesn't remove data _nearly_ as reliably as you might like to believe. @LieRyan's answer below is a much better answer. – Mike Ounsworth May 12 '15 at 01:36