8

I am in charge of a new website in a niche industry that stores lots of data (10+ TB per client, growing to 2 or 3 clients soon). We are considering ordering about $5000 worth of 3TB drives (10 in a RAID 6 configuration and 10 for backup), which will give us approximately 24 TB of production storage. The data will be written once and remain unmodified for the lifetime of the website, so we only need to do a backup one time.

I understand basic RAID theory; however, I am not experienced with it. My question is: does this sound like a good configuration? What potential problems could this setup cause?

Also, what is the best way to do a one-time backup? Have two RAID 6 arrays, one for offsite backup and one for production? Or should I back up the RAID 6 production array to a JBOD?

EDIT: The data server is running Windows 2008 Server x64.

EDIT 2: To reduce rebuild time, what would you think about using two RAID 5's instead of one RAID 6?

Phil
  • 1,003
  • 2
  • 11
  • 16
  • Is the data going to be read extensively? – Bittrance Jun 06 '11 at 17:04
  • 8
    Holy crap the rebuild on that would take a long time. – MDMarra Jun 06 '11 at 17:11
  • 2
    I would not trust all of my eggs to two RAID 6 SATA arrays. I would suggest tape as a better choice for longer-term archive. – johnh Jun 06 '11 at 17:13
  • @Bittrance Yes, it will be read extensively – Phil Jun 06 '11 at 17:16
  • 1
    What are your [MTTR](http://en.wikipedia.org/wiki/Mean_time_to_recovery) requirements? 24TB of LTO4 tape would work out a lot less than $2500 if you can afford to recover over a day or two. –  Jun 06 '11 at 17:27
  • @MarkM How long is a long time? Days? – Phil Jun 06 '11 at 19:16
  • @Phil - that really depends on how much data is on there, what the load is on it during the rebuild, and also the controller, but it's not unreasonable to think that a 24TB R6 could rebuild for days if your app is read-heavy. – MDMarra Jun 06 '11 at 19:44
  • If the "backup" is just a copy of the data, you might consider RAID 60 rather than two separate RAID 6 arrays (though RAID 60 doesn't protect against accidental deletion). As for rebuild time, it depends on the individual disk size and usage patterns; it doesn't matter how you arrange the disks in the array (within certain limits). – Chris S Jun 06 '11 at 19:52
  • @Jack Douglas From what I read, tape backups can be pretty unreliable. Do you have experience that says otherwise? http://www.mainframezone.com/storage/backup-recovery-business-continuity/tape-a-collapsing-star – Phil Jun 06 '11 at 20:46
  • 1
    @Phil, anything you read about tape backups being unreliable is either completely out of date (by a decade or so) or is utter nonsense. This isn't the place to go into it, but put simply, modern tape technology has error detection and recovery at a level never seen in hard drives. Then you also have to consider that hard drives are by far the single most unreliable computer component ever created; drives that have been used and then placed in storage are doubly so. – John Gardeniers Jun 06 '11 at 21:33
  • @John Thanks, good to know. I'll be looking into that as well. – Phil Jun 06 '11 at 21:44

8 Answers

16

I currently support 220 servers of up to 96 TB each (totalling 2 PB or so), some in clusters of up to 240 TB, that my team built. Here is my advice:

  • Use a good, reliable hardware RAID controller: possible choices are 3Ware 96xx or 97xx, LSI 92xx, Areca 16xx, Adaptec 5xx5... and of course with a battery backup unit, because power failures do happen.
  • Use only professional-grade drives rated for 24/7 operation; don't use cheap desktop drives. You don't want to lose $100,000 worth of data because you chose to save 20 bucks per drive.
  • The bigger the drives, the longer the rebuild. A 3 TB drive will need at least 12 hours in the best case. Use RAID-6 for reliable protection.
  • Drives do fail, up to 5% per year; don't even dream of using JBOD, even for backup. That is plain bad advice. Use RAID-6.
  • RAID-5 is obsolete; we simply don't use it anymore with drives bigger than 300 GB. See this expert post, for instance, and the rough numbers sketched after this list. Did I mention you should use RAID-6?
  • For only 24 TB, I'd stick to 2 TB drives; there is a 10-15% premium on 3 TB drives. More spindles will provide better performance, a shorter rebuild, and better safety, because 2 TB drives have been available for quite a long time and are really very reliable.
  • You could buy an excellent 3U Supermicro, AIC or equivalent chassis with 16 drive slots, filled with 2 TB drives (RAID-6 + hot spare), that would provide roughly 24 TiB of available space and redundant power supplies.
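
Here is a rough back-of-the-envelope sketch of the numbers behind the rebuild-time and RAID-5 points above. The unrecoverable read error (URE) rates (1 per 10^14 bits for desktop SATA, 1 per 10^15 for enterprise drives) and the 70 MB/s sustained rebuild rate are assumed typical datasheet values, not measurements from my fleet, so treat the output as an order-of-magnitude estimate only:

```python
# Back-of-the-envelope math for RAID-5 vs RAID-6 with big SATA drives.
# Assumptions (typical datasheet values, not measured): URE rate of 1e-14
# errors/bit for desktop drives, 1e-15 for enterprise drives; ~70 MB/s
# sustained rebuild throughput.

DRIVE_TB = 3.0        # capacity of one drive, in TB
SURVIVORS = 9         # drives read end to end to rebuild a 10-drive set
BITS_PER_TB = 8e12    # 10^12 bytes * 8 bits

def p_ure_during_rebuild(ure_per_bit, drives_read=SURVIVORS, drive_tb=DRIVE_TB):
    """Probability of hitting at least one unrecoverable read error while
    reading every surviving drive during a rebuild."""
    bits_read = drives_read * drive_tb * BITS_PER_TB
    return 1 - (1 - ure_per_bit) ** bits_read

def best_case_rebuild_hours(drive_tb=DRIVE_TB, mb_per_s=70):
    """Best-case rebuild time: rewriting one whole replacement drive
    sequentially, with no competing load."""
    return drive_tb * 1e6 / mb_per_s / 3600

if __name__ == "__main__":
    for label, rate in (("desktop (1e-14)", 1e-14), ("enterprise (1e-15)", 1e-15)):
        print(f"{label}: P(URE during rebuild) ~ {p_ure_during_rebuild(rate):.0%}")
    print(f"best-case rebuild time: ~{best_case_rebuild_hours():.0f} hours")
```

With desktop-class drives, hitting a URE somewhere during the rebuild comes out more likely than not, on top of roughly 12 hours of rebuild before you add any production load; on RAID-5 that error typically means lost data or an aborted rebuild, while RAID-6's second parity can still recover it.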
wazoox
  • 6,782
  • 4
  • 30
  • 62
  • 1
    Extremely useful answer. Where do I find these professional grade drives? Everything I see on the web refers to just plain hard drives. Are you just suggesting getting drives with a lower non-recoverable error rate? – Phil Jun 06 '11 at 21:47
  • +1 for build your own box – Javier Jun 06 '11 at 21:57
  • A 5% failure rate a year is nonsense - JBOD is just fine for backups of static data. It seems silly to spend as much on the backup as you would on the data. Drives do fail, but not nearly as often as one might get the impression from this lot; see http://www.storagereview.com/guide/specMTBF.html for a discussion of MTBF and hard drives. You would have drives piled high with such failure rates. Most importantly, you should not be buying your own drives. Buy a SAN - you simply cannot get the drives SAN manufacturers use (or rather, you cannot do the testing on drives like they do). – Jim B Jun 06 '11 at 22:55
  • @wazoox - why did you claim a failure rate of 5% then say "and better safety because the drives have been available for quite a long time and are really very reliable." – Jim B Jun 06 '11 at 22:56
  • @JimB: 5% failure rate is about what I currently get from a "sample" of several thousand drives in production. Newer drives ("advanced format", 4KB blocks) apparently fare much better but we can't be sure yet; so better safe than sorry. – wazoox Jun 07 '11 at 18:49
  • JBOD: given current bit error rates, at 24 TB you can statistically expect about 2 errors, that is, corruption. Is that a chance you'll take? I wouldn't. This isn't a question of MTBF, but a question of unrecoverable bit error rate. Learn the difference. – wazoox Jun 07 '11 at 18:50
  • 1
    @phil: professional drives are labelled as such: Hitachi calls them Ultrastar (vs Deskstar), Seagate calls them Constellation ES, etc. – wazoox Jun 07 '11 at 18:52
5

Honestly, I think $5k for the drives is a bit steep... but that's a whole other subject. The setup sounds reasonable enough, but in the event of a drive failure, a single 24 TB volume will take FOREVER to rebuild (ever tried to rebuild 3 TB of data by reading from 9 other disks?). It would be better to have smaller RAID sets and join them together to form a bigger volume. If a drive fails, it doesn't kill the performance of the entire volume while the whole thing rebuilds, but only the performance of the one RAID set.

Also, what your website runs on (Linux/Windows/OSX/Solaris/???) will dictate which tools and configuration you use.

What do you mean by a "one-time backup"? If you meant a "one-way archive" (i.e. new files are written to the backup server, but nothing is ever read from it), I highly recommend using rsync in *nix-flavored environments (Linux/Unix/etc.) or, if it's IIS (Windows) based, something like SyncToy or xxcopy. If you need a LIVE copy (zero delay between when a file is written and when it appears on the other server), you'll need to provide more information about your environment. Linux and Windows work completely differently, and the tools are 100% different. For stuff like that, you'll probably want to look into clustered filesystems, and should probably look more towards a SAN rather than host-based storage.
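
For the write-once case, whichever tool you end up with, the one thing I'd insist on is a verify step: data that is never rewritten gives you no second chance to notice a bad copy. Here is a minimal sketch of a one-way copy-and-verify pass; the source and archive paths are hypothetical placeholders, and in practice rsync, SyncToy or xxcopy do the same job with far more robustness:

```python
# Minimal sketch of a one-way, write-once archive pass with verification.
# SOURCE and ARCHIVE are hypothetical placeholder paths.
import hashlib
import shutil
from pathlib import Path

SOURCE = Path(r"D:\production")          # hypothetical production volume
ARCHIVE = Path(r"\\backup01\archive")    # hypothetical backup share

def sha256(path, bufsize=1 << 20):
    """Checksum a file in 1 MB chunks so huge files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def archive_pass():
    """Copy files that are not yet archived; never touch existing copies."""
    for src in SOURCE.rglob("*"):
        if not src.is_file():
            continue
        dst = ARCHIVE / src.relative_to(SOURCE)
        if dst.exists():
            continue                     # write-once: skip anything already archived
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)           # copy data and timestamps
        if sha256(src) != sha256(dst):
            raise IOError(f"checksum mismatch while archiving {src}")

if __name__ == "__main__":
    archive_pass()
```

Storing the checksums as well gives you a cheap way to re-verify the archive years later, which matters for data that will never be rewritten.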

TheCompWiz
  • 7,349
  • 16
  • 23
  • By one-time backup, I mean once data is written to the array, it will not be changed. The write should only happen once - during the initial copy from the client's disks to our disks. – Phil Jun 06 '11 at 17:20
  • Linux (rsync) or Windows (xxcopy/synctoy)? – TheCompWiz Jun 06 '11 at 17:22
4

We generally use RAID 5 or 6 for backup disks, as it gives the best bang for the buck once you ignore RAID 0 :-) so I'd go for that rather than JBOD.

One thing you might consider is buying your disks in separate batches rather than all 20 at once: if there is a manufacturing defect in a batch, those drives may fail at similar times.

You also may wish to consider using mirroring rather than conventional backups if the data is only being written once - there are quite a few software and hardware storage systems that allow that to be set up and you may also get the benefit of failover in the event of your primary storage failing.

Phil
  • 3,138
  • 1
  • 21
  • 27
2

One option that would fit well with your use-case, especially if your requirements keep growing, is an HSM (Hierarchical Storage Manager). I've installed several HSMs ranging up to 150TB of disk and 4PB of tape.

The idea is that an HSM manages the lifecycle of data to reduce the overall cost of storage. Data is initially stored on disk but almost immediately archived to tape (which is much cheaper per byte). Archive policies can be configured to store multiple copies on tape for extra safety, and most people take a second copy offsite. The migration to and from tape is transparent to the end user - the files still appear in the filesystem.

When the end user requests the file in future, the data is automatically staged back from tape and served to the user. With a tape library, the staging process only adds about a minute to the retrieval time.

One huge benefit of an HSM is the recovery time if your disks fail or your filesystem gets corrupted. After a catastrophic disk or filesystem failure, you just find some more disk and restore a recent backup of the filesystem metadata (a tiny fraction of the total data volume). At that point, all of the data is available on demand as usual.

Tom Shaw
  • 3,702
  • 15
  • 23
1

When determining the RAID configuration for a SAN, you have to weigh performance against the amount of reliability and the recovery time you require. Because you double the number of parity writes (depending on your particular flavor of RAID 6), it's usually best to let a SAN with custom ASICs do the calculations. Since your data is static, your real concern is how long you can afford to be in a degraded state should one drive fail. Also of note is that drives tend to fail in multiples, so it's best to install drives with some time between sets.

As far as backups go, I see no need for redundancy in the backup set, so JBOD is fine.

Jim B
  • 23,938
  • 4
  • 35
  • 58
  • 2
    JBOD would be a very bad idea. Given typical drive bit error rates, that's a recipe for silent corruption. Spend a little more and buy the 2 additional parity drives for RAID-6 and excellent protection. – wazoox Jun 06 '11 at 21:05
  • Parity for what? Hopefully the backup solution already has error checking and recovery built in. If not, use the RAR format and add a recovery record. – Jim B Jun 06 '11 at 21:24
  • So for backups you'd spend just as much money as you would on production? That's fairly ludicrous – Jim B Jun 06 '11 at 22:45
  • To connect 16 or 24 drives you'll need a RAID controller anyway. Not using parity will save you $200; using parity will buy you reliability, data integrity, and no hassle in case of a disk failure; and you could even use your backup system for failover if necessary. – wazoox Jun 07 '11 at 18:58
0

I currently have filesystems in that scale range, totaling 58 TB onsite, plus a separate copy offsite.

I've had a few drive failures and yes, the bigger the drives, the longer the rebuild. To alleviate this somewhat, I split the storage into several RAID sets of 5-7 drives each. They're currently RAID 5, but when I get 3 TB drives I plan to start using RAID 6.

It's all joined and re-split with LVM, so I don't have to think about what goes where; I simply add extra boxes when needed and remove old drives when they're too small to justify the slots they occupy.

The hardware is mostly Coraid AoE boxes (though some iSCSI targets will join soon), managed with LVM. The filesystems are ext3/4 if under 4-6 TB, or XFS above that (up to 34 TB currently). All backups are handled with rsync, plus DVD for offline archive.

Besides some monitoring software (mostly Zabbix), it's a nearly maintenance-free setup.

Javier
  • 9,078
  • 2
  • 23
  • 24
0

Another point to add to what everyone else has said: with Windows and huge filesystems, if you do decide to break a filesystem up but want to retain the same file structure you would have had, look at mounting the volumes to folder paths.

http://technet.microsoft.com/en-us/library/cc753321.aspx

Chris N
  • 687
  • 3
  • 8
0

I'm surprised nobody has suggested using MogileFS (github).

MogileFS will automatically mirror data across different servers, and each disk is just a dumb "JBOD" disk. There are many production installations with many TBs (100+) of data.

For the server hardware there are many options for "lots of disks in an enclosure": for example, a Backblaze Pod (relatively do-it-yourself/unsupported) or a Supermicro server (we use Silicon Mechanics). I believe wordpress.com uses regular 2U Dell servers with MD1000 enclosures for the disks.

Ask Bjørn Hansen
  • 490
  • 1
  • 3
  • 11