My concern is the disposal of a replaced disk from a private RAID5 disk array.

I have had to replace a disk in my personal RAID5 disk array. It had started developing errors, so out it went.

But now I have this disk lying on my desk, and that got me wondering... The data on the array was never encrypted. I'm concerned that turning it in at the recycling station could be a security risk.
Is it possible that some mischievous individual would be able to retrieve personal data (photos, files etc.) from the disk? Or is the fact that it was part of a RAID5 array sufficient for the data to have been scrambled beyond recognition?

Mausy5043
  • Anecdotally, I have recovered lots of embedded JPEG thumbnails from single RAID5 drives in the past – PlasmaHH Jan 08 '18 at 16:52
  • If it's a dead drive and magnetic, you could put it in a degausser to wipe the data; it's pretty quick, but finding one might be hard. I know eliminating the data wasn't part of the question, but I figured I'd mention it – DeadChex Jan 08 '18 at 17:42
  • If it's developing errors, it is worthless. I'd spend a couple of minutes with a club hammer and a cold chisel through the side to shatter or deform the platters, then turn it in at the recycling station. – RedGrittyBrick Jan 08 '18 at 19:37
  • Best way to make sure data cannot be retrieved: https://i.stack.imgur.com/McuAt.jpg – SeanC Jan 08 '18 at 20:33
  • A specialized computer recycling facility (we have at least one here in Seattle, and conveniently it is down the block from the city "transfer station") will have a device that punches a hole right through the thing. Fun to watch and well worth the $5/drive they charge, IMO. – davidbak Jan 09 '18 at 04:49
  • Of course there are [more interesting ways](https://www.youtube.com/watch?v=-bpX8YvNg6Y) to destroy a hard disk ... – Dubu Jan 09 '18 at 08:53
  • If you want to dispose of the drive anyway, physical destruction seems the best solution. We already have a question about that :-) : [How do you destroy an old hard drive?](https://security.stackexchange.com/questions/11313/how-do-you-destroy-an-old-hard-drive). – sleske Jan 09 '18 at 10:39
  • "...and now, PFY, I'm gonna use my Hard Drive Sensitive Information Disposal Tool to make sure no one will ever recover it" "That's a hammer. And a very big one, at that." "Yes. Yes it is" – xDaizu Jan 09 '18 at 12:04
  • You might consider setting up encryption either above or below the RAID array, for future safety. – user Jan 09 '18 at 13:31
  • @SeanC [this is the only guaranteed method](https://upload.wikimedia.org/wikipedia/commons/7/79/Operation_Upshot-Knothole_-_Badger_001.jpg) –  Jan 09 '18 at 20:40
  • It's not RAID's job to scramble data. RAID is not a security method. – Harper - Reinstate Monica Jan 10 '18 at 23:05
  • I had to deal with something like this once. I ended up taking the hard disk apart and showing it to my parents, since they had never seen something like that before and were curious. So I killed two birds with one stone: I satisfied my parents' curiosity and made the drive completely unusable. – Radu Murzea Jan 11 '18 at 14:04
  • If at all possible, encrypt your array. Imagine the data on it being stolen by a burglar. Now compare that with how low the overhead of encryption is and how it solves problems like disposing of dead drives. – xorsyst Jan 11 '18 at 16:00
  • @xorsyst post-encrypting a RAID array is no easy feat. But I'll certainly think about it. – Mausy5043 Jan 11 '18 at 20:17

6 Answers

RAID 5 stripes the data across the disks, but the blocks used for striping are typically pretty big. At the very least they will be whole sectors, but normally they will be much larger than that. For example, mdadm defaults to half-megabyte chunks. Even one sector is big enough that you are likely to find recognisable chunks of text, and with typical chunk sizes it is quite likely that entire recognisable files will be present on the individual drives from the array.
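To make the striping point concrete, here is a toy model of RAID 5 in Python. It is a sketch under loudly stated assumptions: three members, a 16-byte chunk instead of a realistic one, and a simplified parity rotation that is not claimed to match mdadm's actual on-disk layout. What it shows is that plain chunks are written to a member unchanged, so reading one disk on its own yields readable data.

```python
# Toy RAID 5 model: illustrates striping, NOT mdadm's real on-disk format.
# Assumptions (hypothetical, chosen for readability): 3 member disks,
# a tiny 16-byte chunk and parity that rotates back one disk per stripe.

CHUNK = 16    # bytes per chunk (mdadm's real default is 512 KiB)
N_DISKS = 3   # members in the toy array


def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, as RAID 5 does for parity."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def build_raid5(data: bytes) -> list[bytes]:
    """Lay data out over N_DISKS members: plain chunks plus one parity chunk per stripe."""
    per_stripe = N_DISKS - 1                      # data chunks in each stripe
    stripe_bytes = CHUNK * per_stripe
    data += b"\0" * (-len(data) % stripe_bytes)   # zero-pad to whole stripes
    disks = [bytearray() for _ in range(N_DISKS)]
    for s in range(len(data) // stripe_bytes):
        stripe = data[s * stripe_bytes:(s + 1) * stripe_bytes]
        chunks = [stripe[i * CHUNK:(i + 1) * CHUNK] for i in range(per_stripe)]
        parity_disk = (N_DISKS - 1 - s) % N_DISKS
        data_disks = [d for d in range(N_DISKS) if d != parity_disk]
        for d, chunk in zip(data_disks, chunks):
            disks[d] += chunk                     # stored as-is: no scrambling
        disks[parity_disk] += xor_bytes(*chunks)
    return [bytes(d) for d in disks]


secret = b"password=hunter2" + b"Dear diary, ...." + b"photo-0001.jpg.."
members = build_raid5(secret)

# The "replaced disk" is one member, read on its own with no help from the others.
print(members[0])   # b'password=hunter2photo-0001.jpg..'
```

Scaled up to real chunk sizes, the same thing happens with whole files or large pieces of files instead of 16-byte strings.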

Peter Green
  • Since whole sectors (512 bytes or 4 kilobytes) to megabytes is a pretty big range, it might be worth a mention that mdadm defaults to 512 kilobyte chunks. – AndrolGenhald Jan 08 '18 at 16:19
  • I'm guessing RAID6 would be worse as it has 2-disk redundancy on a 4-disk system, where RAID5 only has 1. – Mausy5043 Jan 08 '18 at 17:04
  • @Mausy5043, quite unrelated. The redundancy is parity blocks, which are not very useful by themselves. And block/stripe size is unrelated to the RAID level. – jcaron Jan 08 '18 at 17:21
  • @J..., the techniques aren't "rather advanced". Any but the most primitive "undelete" software has a mode that will ignore the filesystem structure and scan the disk directly for things that look like files. I've used it to pull JPEG thumbnails out of a virtual machine's memory dump; a single disk from a RAID array will be no problem. – Mark Jan 08 '18 at 21:02
  • @J You may need to be careful if you have a bean recipe on your drive. https://blondiesbearista.files.wordpress.com/2013/06/img_2840.jpg – さりげない告白 Jan 09 '18 at 09:17
  • @J..., in the interests of science, I pointed a copy of [Foremost](http://foremost.sourceforge.net/) at a hard drive removed from a RAID-6 array. The amount of recognizable data it found is, frankly, scary, and some of it surprisingly old (an ad for GTE Internet Services? A custom startup screen for Windows 95?) – Mark Jan 09 '18 at 10:50
  • @Mark can't argue with science...that could probably be an answer. – J... Jan 09 '18 at 11:42
  • @AndrolGenhald `Since whole sectors (512 bytes or 4 kilobytes) to megabytes is a pretty big range it might be worth a mention that mdadm defaults to 512 kilobyte chunks` Not saying that you are wrong, but 512KB is not in the 512B-4KB range. Did you mean it defaults to 512B? (Honest question, I have no idea) – xDaizu Jan 09 '18 at 12:09
  • @xDaizu The range starts at "sectors", which are typically 512B to 4KB, and ends at "megabytes". – Taemyr Jan 09 '18 at 12:38
  • @Mark : I had the same experience with https://www.cgsecurity.org/wiki/PhotoRec on old, half-broken HDDs or SD cards. It's impressive how many old pictures can still be found. – Eric Duminil Jan 10 '18 at 11:07

In the interests of actually testing this, I pointed a copy of Foremost at a disk that was formerly part of a RAID-6 array (made available thanks to Seagate). The array had a chunk size of 512KB, so any file of 512KB or less can in principle be present intact, provided it doesn't straddle a chunk boundary. The data on the array is from nearly 25 years of computer use, including disk images of every computer I've owned.

The amount of data that I recovered was, frankly, scary. Word documents containing high-school homework assignments. Data files from games I'd uninstalled decades ago. DLL files from a hundred different versions of WINE. Images attached to unread Usenet posts. Ten thousand cached web pages. Adding a custom extraction rule found three SSL private keys and an SSH key.
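Foremost and similar tools work by "carving": they ignore the filesystem and search the raw bytes for known file signatures. The sketch below is a naive illustration of that idea for JPEG (start bytes `FF D8 FF`, end marker `FF D9`), not Foremost's actual implementation; the 512 KiB size cap is an assumption chosen to match the chunk size mentioned above, and the disk image is a made-up byte string.

```python
# Naive signature carving, in the spirit of Foremost/PhotoRec (not their code).
# A JPEG starts with the bytes FF D8 FF and ends with the marker FF D9.

JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"


def carve_jpegs(raw: bytes, max_size: int = 512 * 1024) -> list[bytes]:
    """Return every START...END run in raw that is at most max_size bytes long."""
    found = []
    pos = raw.find(JPEG_START)
    while pos != -1:
        # The end marker must lie entirely within raw[pos + 3 : pos + max_size].
        end = raw.find(JPEG_END, pos + len(JPEG_START), pos + max_size)
        if end != -1:
            found.append(raw[pos:end + len(JPEG_END)])
            pos = raw.find(JPEG_START, end + len(JPEG_END))
        else:
            pos = raw.find(JPEG_START, pos + 1)
    return found


# Hypothetical raw image: zeros with one fake JPEG-shaped run in the middle.
fake_jpeg = JPEG_START + b"\xe0" + b"not really image data" + JPEG_END
disk_image = bytes(100) + fake_jpeg + bytes(50)
carved = carve_jpegs(disk_image)
print(len(carved), carved[0] == fake_jpeg)   # 1 True
```

Being naive, it stops at the first end marker, so a JPEG carrying an embedded EXIF thumbnail would be cut short at the end of the thumbnail; real carvers parse the file structure more carefully.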

Another thing to note is that you don't always need to extract the entire file to get compromising information. For example, the first 512KB of a PDF can give you the table of contents, the first 512KB of a BMP can give you a caption (BMP stores its image data upside-down, so the first chunk holds the bottom of the picture), and the first 512KB of a JPEG can give you a thumbnail. MPEG and MP3 files are designed to be streamable, so even a chunk from the middle of one can give someone useful data.

How scrambled is data on a RAID 5 disk? Not scrambled enough.

Mark
  • Thanks for trying this, and also, I'm amazed that you still have a 25-year-old disk in use! – JPhi1618 Jan 11 '18 at 16:37
  • @JPhi1618, I do have a couple 25-year-old disks in use, but this isn't one of them. It's just got (partial) images of those disks. – Mark Jan 11 '18 at 19:12

It sounds like people may be confusing drive sector size (typically 512B to 4KB) with RAID 5 stripe size (typically 16KB to 128KB, sometimes larger). The RAID stripe size is the logical writeable size for the array, so each part of the stripe on each drive will contain that much data. If an entire file fits into the stripe size, it will likely all be visible as a contiguous block on the removed drive.
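Small secrets are the clearest case: a private key is a few KB at most, far below a typical stripe size, so it sits contiguously on one member. As a rough Python analogue of running `strings` and `grep` over the raw disk, the sketch below scans an image file for PEM private-key headers. It assumes the disk has already been copied to an image file (the filename in the usage note is hypothetical) and that no header is longer than the 64-byte overlap kept between reads.

```python
import re

# Matches PEM headers such as "-----BEGIN RSA PRIVATE KEY-----",
# "-----BEGIN OPENSSH PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
PEM_KEY = re.compile(rb"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")
OVERLAP = 64  # assumption: longer than any header we care about


def find_private_keys(buf: bytes) -> list[int]:
    """Offsets in buf where a PEM private-key header starts."""
    return [m.start() for m in PEM_KEY.finditer(buf)]


def scan_image(path: str, block: int = 1 << 20) -> list[int]:
    """Scan a raw disk image in blocks, carrying a small tail between reads
    so that a header split across a block boundary is still found."""
    hits, offset, tail = set(), 0, b""
    with open(path, "rb") as f:
        while chunk := f.read(block):
            buf = tail + chunk
            base = offset - len(tail)          # absolute offset of buf[0]
            hits.update(base + p for p in find_private_keys(buf))
            tail = buf[-OVERLAP:]
            offset += len(chunk)
    return sorted(hits)                        # set removes double-counted hits
```

Usage would look like `scan_image("removed-disk.img")`, returning the byte offsets of any headers found; in PEM format the key material follows each header as base64 text.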

  • For example, an RSA private key should be <=4kB and has easily identifiable text - you could trivially find one with just `strings` and `grep`. That's the kind of data you could be leaking. – Bob Jan 09 '18 at 01:27
  • Stripe sizes of 1MB aren't uncommon. – I say Reinstate Monica Jan 09 '18 at 01:45
  • And also, even if the entire file doesn't fit into the stripe size, partial files of these sizes are just as much a "security-risk" in the sense the questioner is asking about :-) It would be a mistake to get hung up on file size, for example to think, "all my video files are much larger than the stripe size and therefore they'll be scrambled". – Steve Jessop Jan 10 '18 at 09:59

A single member of a RAID 5 array will consist of plain blocks and parity blocks, for example 75% plain and 25% parity for a 4-member array. The plain blocks can be read in plain view; there is no scrambling of these blocks, and you do not need to refer to other members to make sense of them. These blocks are typically 16KB to 512KB in size, though with RAID 5 this is usually 128KB or below to minimise write amplification. There is plenty of scope to read sensitive data that appears in such plain blocks.

Each parity block contains data that is generated from three plain blocks on other drives, in such a way that if any one of those three other drives (in a four-member array) is lost, its information can be recovered by applying an algorithm to the parity block and the blocks from the two remaining drives. The data in a parity block makes no sense and could not be recovered on its own, unless you could guess the content of two of the three other blocks it combines with. That may in some cases be easy, if two of the three blocks were empty (zeroes) or contain predictable data. Thus, while it isn't cryptographically secure, the information in a parity block is generally useless without two of the three other blocks it is generated from.
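RAID 5 parity is a byte-wise XOR of the data blocks in the stripe. The toy example below (16-byte blocks and a made-up message, both assumptions for readability) shows both halves of the paragraph above: with two empty neighbours the parity block equals the plaintext, while with unpredictable neighbours it is useless unless you also hold those two blocks.

```python
import os


def xor_blocks(*blocks: bytes) -> bytes:
    """RAID 5 parity: byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


BLOCK = 16                      # toy block size; real chunks are far larger
a = b"Meet me at noon."         # 16 bytes of "sensitive" data

# Case 1: the other two blocks in the stripe are empty (all zeros).
parity = xor_blocks(a, bytes(BLOCK), bytes(BLOCK))
print(parity)                   # b'Meet me at noon.'  (parity == plaintext)

# Case 2: the other two blocks hold unpredictable data.
b, c = os.urandom(BLOCK), os.urandom(BLOCK)
parity2 = xor_blocks(a, b, c)   # looks like noise on its own...
print(xor_blocks(parity2, b, c) == a)   # True: ...until you also have b and c
```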

RAID 4 has a similar design to RAID 5, except that all the parity blocks are stored on one drive, so if you only had that drive, you would have no easily recoverable data. RAID 5 modified this to distribute the parity blocks evenly between members, meaning that any drive on its own will contain a lot of plain blocks from which you might recover data.
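The difference between the two layouts can be counted. This sketch assumes a four-member array, a dedicated last-disk parity drive for RAID 4 and a simple backwards-rotating parity for RAID 5; real implementations offer several rotation layouts, and the one here is only illustrative.

```python
from fractions import Fraction


def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Which member holds parity for a stripe (simplified layouts)."""
    if level == 4:
        return n_disks - 1                      # dedicated parity disk
    return (n_disks - 1 - stripe) % n_disks     # rotating parity (RAID 5)


def plain_fraction(disk: int, n_disks: int, level: int, stripes: int) -> Fraction:
    """Share of a member's blocks that hold plain data rather than parity."""
    parity = sum(1 for s in range(stripes) if parity_disk(s, n_disks, level) == disk)
    return Fraction(stripes - parity, stripes)


for level in (4, 5):
    shares = [plain_fraction(d, 4, level, stripes=400) for d in range(4)]
    print(f"RAID {level}:", [str(x) for x in shares])
# RAID 4: ['1', '1', '1', '0']
# RAID 5: ['3/4', '3/4', '3/4', '3/4']
```

Under RAID 5 every member is three-quarters plain data, matching the 75% figure for a 4-member array, while under RAID 4 only the dedicated parity disk is free of plain blocks.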

thomasrutter

> Or is the fact that it was part of a RAID5 array sufficient for the data to have been scrambled beyond recognition?

As a rule of thumb, always securely erase magnetic media before giving it away under such circumstances (recycling, donation, etc.). Don't make any assumptions about the underlying data: whether it was encrypted, whether it was RAID 0 or RAID 5, etc.

Even if contemporary wisdom holds that something is secure, experience has shown that such beliefs can be invalidated later (e.g. look at the number of years the Spectre/Meltdown security issues went undetected, until researchers discovered how processors' behaviour could be misused).

Phylyp

In the data centers that I work in, we take data security very seriously. If the drive was part of an array that was not encrypted, then yes, there is still enough data on it that can be retrieved. I would honestly recommend scrubbing the drive if it is still functional; if it is not functional, I recommend degaussing it with some sort of controlled magnetic device.
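Scrubbing amounts to overwriting every byte and checking the result. The sketch below does that for a regular file, such as a disk image, and is an illustration rather than a wiping tool: `os.path.getsize` does not generally report a block device's capacity, and no overwrite issued from the host can reach sectors a failing drive has already remapped. For a real drive, dedicated tools such as GNU `shred` or the drive's built-in ATA Secure Erase are the usual route.

```python
import os


def zero_fill(path: str, block: int = 1 << 20) -> int:
    """Overwrite an existing regular file (e.g. a disk image) with zeros, in place.

    Returns the number of bytes overwritten.
    """
    size = os.path.getsize(path)
    written = 0
    with open(path, "r+b") as f:
        while written < size:
            n = min(block, size - written)
            f.write(bytes(n))           # bytes(n) is n zero bytes
            written += n
        f.flush()
        os.fsync(f.fileno())            # ask the OS to flush the zeros to storage
    return written


def is_all_zero(path: str, block: int = 1 << 20) -> bool:
    """Read the target back and confirm every byte is zero."""
    with open(path, "rb") as f:
        while chunk := f.read(block):
            if any(chunk):              # any non-zero byte fails the check
                return False
    return True
```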

Terrance