What is the risk of data being recovered from SAN storage after logical deletion?

Question

The specific scenario involves Linux servers on VMware hosts with SAN storage. If files containing sensitive data are deleted by Linux, can they be recovered with a low-level tool, as can be done with local storage? I think there are a number of scenarios that probably have different answers:

"Shortly" after deletion. The Linux instance has not been rebooted since the deletion and no explicit relocation of the VMware workload has occurred. The rest of the environment has been stable during the intervening time.
"Long" after deletion. The Linux instance has not been rebooted since the deletion and no explicit relocation of the VMware workload has occurred. Normal operations have occurred around the instance, which might have included relocation of other workloads and reallocation of SAN storage for other workloads.
An OS level restart has been performed, such as to force re-initialization of an updated service.
A full OS shutdown was performed and some time later the image was restarted. [What if the image is moved to a different host before restart?]
Workload was shifted to new VMware host without OS shutdown.

There are a few other cases I can imagine, but you get the idea. So, can data be recovered from within the OS in various scenarios? How about from the virtual host or SAN management console? From the media? How difficult are these vectors to exploit? What protection is there against recovery of deleted data in a SAN based virtual environment?

Remember that the snapshots you create will have remnants of the recoverable files as well. — IceMage, Oct 26 '15 at 14:55

score 2 · Accepted Answer · answered Oct 29 '15 at 05:58

Each layer of indirection makes this more difficult to answer.

NetApp SANs, for example, have an explicit policy of always doing copy-on-write (strictly speaking, WAFL's "write anywhere" design is redirect-on-write). This means that data is NEVER updated in place. Nothing is ever overwritten directly. Instead, a new RAID stripe is written, the logical address is remapped to point at the new stripe, and the old stripe's data is marked as available for overwriting. This is completely invisible to the VM running on top of it. If I recall correctly, the write unit was something like 4 KB. So if your private key was inside such a block and you thought the Linux "shred" command actually overwrote it on disk: no, that is not what happened down at the hardware level.
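
To make the remapping concrete, here is a toy model (not NetApp's actual WAFL code; the class and method names are made up for illustration) of a volume that never overwrites a physical block in place. It assumes one logical block maps to one physical block and that a freed block keeps its old contents until it is reallocated:

```python
class RedirectOnWriteVolume:
    """Hypothetical toy volume that never overwrites a physical block in place."""

    def __init__(self, physical_blocks: int) -> None:
        self.media = [b""] * physical_blocks        # raw contents of each physical block
        self.free = list(range(physical_blocks))    # physical blocks available for new writes
        self.mapping: dict[int, int] = {}           # logical block -> physical block

    def write(self, logical: int, data: bytes) -> None:
        new_phys = self.free.pop(0)                 # every write goes to a fresh block
        self.media[new_phys] = data
        old_phys = self.mapping.get(logical)
        self.mapping[logical] = new_phys            # remap the logical address
        if old_phys is not None:
            self.free.append(old_phys)              # old block is only marked free, not wiped

    def read(self, logical: int) -> bytes:
        return self.media[self.mapping[logical]]


vol = RedirectOnWriteVolume(physical_blocks=8)
vol.write(0, b"PRIVATE KEY")
vol.write(0, b"\x00" * 11)                          # roughly what an in-guest "shred" pass looks like to the array
print(vol.read(0))                                  # the guest sees zeros
print(any(b"PRIVATE KEY" in blk for blk in vol.media))  # the raw media still holds the key
```

The guest reads back zeros, while anyone with access to the raw media (or an array-level snapshot holding the old block) can still find the key.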

And there is no real way of telling when that old stripe, once marked as reusable, will actually be reused. Depending on the percentage of free space on the SAN and its write throughput, you can estimate a probability of overwrite. But only a probability.
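
As a rough illustration of "only a probability", here is a back-of-the-envelope model under a loudly hypothetical assumption: every new block write picks one of the free blocks uniformly at random, and the free pool stays about the same size because other blocks are freed as fast as they are consumed. Real allocators are not uniform random, so treat this as an intuition pump, not a prediction for any particular SAN:

```python
def overwrite_probability(free_blocks: int, blocks_written: int) -> float:
    """Chance that one specific freed block has been reused.

    Hypothetical steady-state model: each write picks one of `free_blocks`
    free blocks uniformly at random, independently of earlier writes.
    """
    if free_blocks <= 0:
        raise ValueError("free_blocks must be positive")
    survive_one_write = 1.0 - 1.0 / free_blocks   # chance a single write misses our block
    return 1.0 - survive_one_write ** blocks_written


# 1 TiB free in 4 KiB blocks, then 100 GiB of new writes
p = overwrite_probability(free_blocks=268_435_456, blocks_written=26_214_400)
print(f"{p:.3f}")
```

Under these assumptions, writing 100 GiB into 1 TiB of free space gives only about a 9% chance that a particular freed block has been overwritten, which is why "long after deletion" is no guarantee.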

And every layer makes this harder to predict:

What about tiered storage?
What about the file system on the Linux VM itself? Will it actually release that space for overwrite? Will it forward the "trim" request (ATA TRIM, or SCSI UNMAP for a SAN LUN) to the storage layer at all?
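
From inside the Linux guest you can at least check the first hop. The sketch below reads the kernel's sysfs attribute discard_max_bytes (0 means the block device does not accept discards) and looks for the online "discard" mount option in /proc/mounts. The function names are hypothetical, and a filesystem mounted without "discard" may still release space through periodic fstrim. Even when the guest sends discards, whether the VMDK layer passes them on to the array, and whether the array then actually wipes anything, is outside what the guest can see:

```python
from pathlib import Path


def discard_supported(device: str, sysfs_root: str = "/sys") -> bool:
    """True if a whole-disk device such as "sda" advertises discard support."""
    path = Path(sysfs_root) / "block" / device / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip()) > 0   # 0 means no discard support
    except (OSError, ValueError):
        return False                               # missing/unreadable attribute: assume no


def has_discard_option(mounts_text: str, mount_point: str) -> bool:
    """True if /proc/mounts-style text shows mount_point mounted with "discard".

    Later entries win, since an overmount hides earlier ones. Mount points
    containing spaces (escaped as \\040) are not decoded in this sketch.
    """
    found = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = "discard" in fields[3].split(",")
    return found


# On a real guest:
# print(discard_supported("sda"))
# print(has_discard_option(Path("/proc/mounts").read_text(), "/"))
```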

So, in closing: there is no good way to tell. If you worry about remnants of deleted data, consider full-disk encryption.
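
The idea behind that advice, sketched with a deliberately toy cipher: if the guest encrypts every sector before it reaches the SAN, any stale block the array leaves behind is ciphertext, useless without a key that never leaves the VM. The keystream below (SHA-256 over key, sector number and counter) is a hypothetical stand-in for illustration only, not a real disk cipher; in practice you would use dm-crypt/LUKS. Note that this protects against the SAN, the hypervisor and the media, not against someone recovering deleted files from inside the running, unlocked VM:

```python
import hashlib
import secrets


def keystream(key: bytes, sector: int, length: int) -> bytes:
    """Toy per-sector keystream: SHA-256(key || sector || counter). NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = key + sector.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:length])


def toy_crypt(key: bytes, sector: int, data: bytes) -> bytes:
    """Encrypts or decrypts one sector; XOR with the keystream is its own inverse."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, sector, len(data))))


volume_key = secrets.token_bytes(32)                  # held only inside the guest
on_san = toy_crypt(volume_key, 42, b"PRIVATE KEY")    # all the array ever stores or leaves behind
assert toy_crypt(volume_key, 42, on_san) == b"PRIVATE KEY"  # the guest can still read it
volume_key = None  # drops the only reference here; real crypto-erase destroys every copy, e.g. the LUKS keyslots
```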

Very good write-up. I was wondering how that would translate to the physical data storage portion. Basically, it depends. So I understand it like this: for the life of the VM, recovery may be as simple as reading the deleted portion of the VMDK, but once the data no longer lives there, that doesn't mean it's unrecoverable. Perhaps another option would be to use a secure-delete function within the VM for files such as this. — IceMage, Nov 02 '15 at 15:51
