Which RAID will be better for my use

1

I work with large genome sequencing datasets (1TB-2TB). Recently we lost some of our important data on a Dell workstation. We are planning to store and back up our data regularly, on a daily basis. I have heard about RAID, but I am not sure which RAID level (0, 1, 5, 10, ...) best suits our purposes.

aravind ramesh

Posted 2012-10-29T11:51:14.047

Reputation: 121

Answers

2

If you plan to use more than 2 hard drives, then RAID 5 would be most suitable for your purposes. With n hard drives, RAID 5 provides the capacity of n-1 drives while allowing one disk to fail.

For instance, if you use 5 hard disks of 2 TB each, you can effectively use 2*(5-1) = 8 TB in total while tolerating a single failed disk.

In contrast to this, you could also use RAID 1 or RAID 10/0+1 which basically means that you are mirroring your data. Using n = 2 disks you could effectively use the storage of 1 disk, using the other for mirroring (this is actually RAID 1). With n ≥ 4 (and n even) you can combine mirroring with striping to effectively use n/2 of the disks for storage.

Whether RAID 5 or a composite RAID 10/0+1 is more suitable depends on the scenario.
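The capacity arithmetic above can be sketched in a few lines of Python. This is a simplified model that assumes all disks are the same size and ignores filesystem and controller overhead; the function name `usable_capacity_tb` is made up for illustration, and RAID 6 (n-2 drives of capacity) is included only for comparison.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of equal-sized disks (simplified model)."""
    if level == "0":                      # striping only, no redundancy
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_tb
    if level == "1":                      # every disk holds the same data
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb
    if level == "5":                      # one disk's worth of parity
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    if level == "6":                      # two disks' worth of parity
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb
    if level == "10":                     # striped mirror pairs
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        return disks / 2 * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")


# The example from the answer: 5 disks of 2 TB in RAID 5 gives 2*(5-1) = 8 TB.
print(usable_capacity_tb("5", 5, 2))
```

Calling the function for several levels with the same disk count makes the trade-off between space and redundancy easy to compare before buying hardware.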


Note: Whatever RAID type you use, please be sure to back up your data! RAID never replaces a backup!

Just imagine a file that has been accidentally deleted or overwritten on your RAID system: this file will be lost forever, since it will also be deleted or overwritten on the mirroring/replicating disks.

speakr

Posted 2012-10-29T11:51:14.047

Reputation: 3 379

RAID1 does not provide n/2 but 1/n storage space. The data is the same on every disk, there is no "half of the space for mirroring". If you have 5 disks of 2TB, the space you'll have will still be 2TB (but you'll have a high fault tolerance). – m4573r – 2012-10-29T12:49:50.930

@m4573r Yes, using RAID 1 with more than 2 disks and no striping you would effectively mirror the same data over all disks. I modified my answer accordingly. – speakr – 2012-10-29T13:20:05.683

1

It looks like you need RAID 1: data is written identically to two drives.

If the datasets you store are very large (spread over more than one disk), you can consider RAID 5: data is striped across several disks together with parity information that lets you recover all your data if one of the disks fails.

Source: http://en.wikipedia.org/wiki/RAID#Standard_levels

NB: RAID 0 improves performance but not data security. RAID 10 is good when you use many disks (4 at minimum).
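To illustrate how RAID 5 can rebuild the contents of one failed disk, here is a minimal sketch of XOR parity over a single stripe. It is only a model of the idea: a real RAID 5 implementation rotates the parity block across the disks and works on much larger blocks, and the name `xor_blocks` and the sample data are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (sample data, purely illustrative).
data_blocks = [b"ACGT", b"TTGA", b"CCAG"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: XOR of the survivors and the parity
# block yields the missing block again.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

Because XOR can only solve for one unknown block per stripe, the same sketch also shows why a second failure during a rebuild cannot be recovered in RAID 5.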

IggY

Posted 2012-10-29T11:51:14.047

Reputation: 205

Data sizes are around 1TB-2TB – aravind ramesh – 2012-10-29T12:01:43.690

See the answer below for more details. If you want to use 3 disks or more, use RAID 5; if you plan to use just 1 disk (+1 mirror disk) of 2TB, use RAID 1 – IggY – 2012-10-29T12:14:18.893

The 0 in RAID 0 refers to how much of your data you're going to get back if a disk fails ;-) – Joey – 2012-10-29T13:34:31.827

1

I would say RAID 5, going by size, cost, speed, data availability (redundancy), type of use, etc.

To repeat: RAID is not a backup. Always keep at least one VERIFIED backup.
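One way to make "verified" concrete is to compare checksums of the source files with their backup copies. The sketch below assumes the backup is a plain directory tree that mirrors the source; the names `sha256_of` and `verify_backup` are hypothetical helpers, not part of any backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that very large files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list:
    """Return relative paths that are missing from the backup or differ in it."""
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(str(relative))          # missing in the backup
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(str(relative))          # content differs
    return problems
```

An empty list from `verify_backup` means every source file has an identical copy in the backup. Hashing terabytes of data takes a while, so a check like this is better run after the daily backup than during working hours.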

OS array: Non-parity RAID levels (0, 1, 10) are favored for the OS, so that the overhead of parity calculations on writes doesn't bog down the system. The Windows registry and the virtual memory/paging file are written constantly, and RAID 5, 6, 50 or 60 would pay that overhead on every write and on reads from a degraded array.

RAID 0 array: RAID stands for Redundant Array of Independent Disks, so RAID 0 is something of an oxymoron: the non-redundant redundant array. It is the only RAID level that doesn't give higher data availability; it only increases speed and usable space.

Non-OS array: Arrays that hold applications, data or databases can see many more reads than writes, so parity is rarely calculated unless the array is degraded (missing a drive). That changes things a bit. Also, some databases are specifically set up to read across a stripe, which RAID 0, 5, 6, 10, 50 and 60 provide (but not RAID 1 or 01). If the array feeds a database that has its own software caching turned on, it is best to turn off the hardware caching on the RAID controller.

Cost of array: RAID 5 can be the cheapest redundant array to deploy in terms of space (RAID 0 is cheaper, but not redundant).

Maintenance of array: Keep backups verified and updates applied. Reset the array to pristine redundancy (mirror/parity) monthly to comb out any bad blocks before a drive falls offline. In a single-fault-tolerant array you want the rest of the array pristine when a drive fails or falls offline. A RAID controller set to narrower tolerances could be the better controller because it demands more, but such controllers seem to report more failed drives, or drives that have simply fallen offline.

Number of drives in the stripe: Reading across a stripe gets faster as the stripe gets wider (more drives), but in RAID 5 it slows down at about HD8 (the 9th drive), as the overhead from the parity calculation becomes enormous; I assume this threshold is hit earlier in double-parity RAID 6 types. The more drives in the stripe, the greater the chance that one will fall offline and have to be rebuilt into the array. But also: the more drives in the array, the greater the chance that when a drive fails there is a bad block of data on one of the remaining drives, which PUNCTURES a single-fault-tolerant array.

Double redundancy in array: RAID 6 can be more in vogue for HUGE or more critical arrays; its strength is not speed but double fault tolerance. During a rebuild, a larger array runs a greater chance of a second fault (increasing risk/exposure). Larger multi-level arrays (10, 50, 60, etc.) can have even more fault tolerance to face day-to-day risks as well as faults during a rebuild.

Maintenance x number of drives in array: More drives give a greater chance of one falling offline x a greater chance of puncturing the array = more risk, and more care needed in handling a larger array.
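The point about wider arrays can be made concrete with a rough sketch. It assumes that drives fail independently and with the same probability, which drives from a single batch often do not, and the 3% annual failure rate is an illustrative placeholder rather than a measured value. `p_any_disk_fails` is a made-up helper name.

```python
def p_any_disk_fails(disks: int, p_single: float) -> float:
    """Chance that at least one of `disks` drives fails (independence assumed)."""
    if disks < 1:
        raise ValueError("need at least one disk")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    # Complement of "every drive survives".
    return 1.0 - (1.0 - p_single) ** disks


ASSUMED_ANNUAL_FAILURE_RATE = 0.03  # illustrative assumption, not real data

for disks in (3, 5, 8, 12):
    chance = p_any_disk_fails(disks, ASSUMED_ANNUAL_FAILURE_RATE)
    print(f"{disks:2d} drives: {chance:.1%} chance of at least one failure per year")
```

Under these assumptions the chance of having to rebuild grows steadily with every added drive, which is the reason wider single-parity arrays call for more care.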

user200179

Posted 2012-10-29T11:51:14.047

Reputation: 11

0

You should go for RAID 1 or RAID 5. The choice depends on your budget on one hand, and on the space you need for your data on the other:

  • With RAID 1 you get great fault tolerance, but the available space is the same as if you had only one disk. RAID 5 has lower fault tolerance (it survives the loss of only one drive) but better space efficiency: the available space grows with the number of drives, and already with the minimum of 3 disks it exceeds what RAID 1 offers.
  • Performance-wise, RAID 1 can be slightly better at reading, whereas RAID 5 can be faster for large sequential writes because data is striped across the drives, although small random writes are slowed down by the parity update.
  • You can build a RAID 1 with a minimum of 2 disks, whereas you'll need at least 3 for RAID 5 (but you can use smaller disks to get the same space as in RAID 1).

And as speakr said,

RAID never replaces a backup!

m4573r

Posted 2012-10-29T11:51:14.047

Reputation: 5 051