I'm running a workstation with dual Xeon 5690s (12 physical/24 logical cores), 192 GB of RAM (i.e., maxed out), Windows 7 64-bit, 5 slots for adapter cards, and 1 TB of internal storage, with 5 more internal bays available.

I have an app that creates data files totaling about 88 TB. These are written once every 14 months, and the rest of the time the app only needs to read them; more than 95% of the reads are sequential reads of huge chunks of data. I have some control over how big the individual files are, but ideally they would be between 5 and 8 TB each.

The app will be reading from only one drive at a time, and the nature of the data is such that if (when) a drive dies I can restore the data to a new disk from tape.

While it would be nice to be able to use the fastest drive/controllers available, at this point size matters more than speed.

After doing lots of reading, I am leaning toward buying a bunch of cheap 2 TB drives and putting them into a bunch of cheap enclosures. All this stuff is going into my home office, so I need to avoid the raised-floor/refrigerated approach.

My questions:

  1. Is the cheap drive/enclosure solution the best one for this situation?

  2. Given the nature of the app and the way the data is used, does RAID make sense? If so, which one?

  3. For huge sequential reads, would USB 3.0 and eSATA be a wash performance-wise?

  4. For each slot available on the workstation, can I hook up an enclosure that can hold multiple drives? Or is it one controller per drive?

  5. If I can have multiple drives on one controller, am I essentially splitting the bandwidth (throughput)? For example, if I have a 12-bay enclosure, is the throughput of the controller reduced by a factor of 12?

  6. Are there any Windows 7 volume/drive/capacity limits I should be aware of?

Bart De Vos
PaeneInsula

3 Answers


I'll try to answer as best I can, but some of the things you ask are more about personal preference and the features of your software than hard technical facts.

  1. Cheap drives are... um... cheap. There are only a few situations where cheap drives are "better" than enterprise-class drives. With cheap consumer drives you won't get the performance, reliability or support you would get with an enterprise drive. Then again, if your application can stand a couple of drives failing a year, and you keep a few spares in a cupboard to swap in as and when they fail, cheap drives might be the best fit for this situation. Most consumer drives aren't built to run 24/7, though most will quite happily do so. If performance is a factor, you might want a compromise between performance, reliability and cost: have a look at some lower-end server drives.

  2. This is a tough one. Since you say you don't need reliability, something like RAID5/6/10 might not make sense. If you want to address the entire array as a single disk within Windows you'll want RAID0 (striped); however, this comes with its own issues, primarily that if a single drive in your array fails the entire array will be useless*. Given your requirements, I might suggest RAID5. This will reduce write performance, which might be an issue with such large data files, but read performance will be reasonably good. Also, depending on your application, it might be possible to mount each drive individually and independently (so they would show up as separate drives in Windows Explorer). Your application would need to be smart enough to write the right data to the right drive, but if one drive failed you would only lose that one drive's data. This would also mean you would only need ceil(total space needed / drive capacity) disks (provided you decided not to keep spare disks for redundancy).

  3. This comes down to the maximum protocol speeds. I can't remember them off the top of my head, but they should be easy enough to find. However, you're more likely to be limited by the disk speeds than by the cable speed.

  4. Questions 4 and 5 are very similar. It might be worth investigating the Backblaze Storage Pod (and its related issues); their setup of chaining SATA cards might be what you're looking for, but I don't know how the internals of Windows would respond to it.

  5. See above

  6. As far as I'm aware, the limit for an NTFS volume on Windows 7 is something like 256 TB (with 64 KB clusters; the default 4 KB cluster size caps a volume at 16 TB), so you shouldn't run into any issues, but you should double-check this.

As a related question, why are you using Windows 7? This sort of workload would be much better suited to Windows Server.

*It's possible to restore the correct data to the one disk if you have good enough backups, but it's not a case of 'pop a new disk in and RAID will fix it'.
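
The drive-count arithmetic in point 2 can be sketched in a few lines of Python. This is only a rough illustration: the function names and the 8-drive group size are assumptions of mine, and it ignores formatting overhead and assumes files are sized to fit whatever volume they land on.

```python
import math


def independent_drives(total_tb: float, drive_tb: float) -> int:
    """Drives needed when every disk is mounted on its own (no redundancy)."""
    return math.ceil(total_tb / drive_tb)


def raid_group_drives(total_tb: float, drive_tb: float,
                      group_size: int, parity_per_group: int) -> int:
    """Drives needed when the data lives on full RAID groups.

    parity_per_group is 1 for RAID5 and 2 for RAID6; the group size is an
    assumption chosen purely for illustration.
    """
    usable_per_group = (group_size - parity_per_group) * drive_tb
    groups = math.ceil(total_tb / usable_per_group)
    return groups * group_size


if __name__ == "__main__":
    total_tb, drive_tb = 88, 2  # figures from the question
    print("independent disks:", independent_drives(total_tb, drive_tb))
    print("RAID5, groups of 8:", raid_group_drives(total_tb, drive_tb, 8, 1))
    print("RAID6, groups of 8:", raid_group_drives(total_tb, drive_tb, 8, 2))
```

With the question's figures this works out to 44 independent disks, against 56 for RAID5 and 64 for RAID6 in groups of eight, before counting any cold spares.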

Smudge
  • I'm not familiar with Windows Server at all. I will look into it. Also, what do you mean by enterprise drives? Are they made by the same companies that make the cheap ones (WD, Seagate, etc)? – PaeneInsula Dec 01 '11 at 09:29
  • You want to get 3.5 inch 7.2k RPM hard drives. They are fairly reliable, and since you're doing sequential reads instead of random reads, they'll perform almost as well as 15k RPM drives. Desktop quality drives have 5.4k RPM spindle speeds, and are typically more likely to fail. – Basil Dec 01 '11 at 14:53
  • Even cheap drives aren't cheap anymore.. Damn flooding. – Tom O'Connor Dec 01 '11 at 17:42
  • @user994179: Seagate Constellation ES.2, Western Digital RE (Raid Edition) 4, and Hitachi Ultrastar A7K2000 or 7K3000 are all considered "Enterprise SATA" drives. They offer higher MTBF and are rated for 24/7 operation, as opposed to desktop sata drives which have an MTBF that assumes (typically) 8x5 operation or similar. – Daniel Lawson Dec 01 '11 at 23:18
  • @user994179 FYI, the Backblaze pods are on sale [here](http://www.openstoragesystems.co.uk/products/backblaze-storage-pod) excluding drives. Pulling your storage external to your system would probably have some positive benefits; **however**, be aware the storage pods are effectively designed to fail, so to speak, so don't expect reliability. – Smudge Dec 02 '11 at 09:14

1) Is the cheap drive/enclosure solution the best one for this situation?

If it's all you can afford, then it's all you've got to go with. Personally, no, I wouldn't trust it, as cheap drives can and do fail more often than you'd want.

2) Given the nature of the app and the way the data is used, does RAID make sense? If so, which one?

Yes. I think you're realistically looking at RAID 5+1 or RAID 6. Given the rates at which 2 TB drives fail, you'll want more than RAID 5 anyhow. Bear in mind that rebuild times will be measured in days when a drive goes, and performance will be horrific during that time.
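
For a rough feel of those rebuild times, here is a back-of-the-envelope sketch. The rebuild rates are assumptions picked for illustration (an idle array versus one that is still serving reads), not measurements, and the function name is hypothetical.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on rebuild time: every byte of the replaced drive is rewritten."""
    drive_bytes = drive_tb * 1e12          # drive vendors quote decimal terabytes
    seconds = drive_bytes / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (100, 30, 10):             # assumed MB/s, from idle to heavily loaded
        hours = rebuild_hours(2, rate)
        print(f"2 TB at {rate} MB/s: {hours:.1f} h ({hours / 24:.1f} days)")
```

At an assumed 10 MB/s a single 2 TB drive takes roughly 56 hours, which is where "days" comes from; at full sequential speed it would be closer to six hours.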

3) For huge sequential reads, would USB 3.0 and eSATA be a wash performance-wise?

Perhaps; it's hard to say. USB 3.0 is pretty fast, but I've not tried it in a RAID setup. USB 2.0 with 7 USB enclosures was pretty poor, though I don't know whether that was down to USB 2.0 itself or to hanging everything off USB.

5) If I can have multiple drives on one controller, am I essentially splitting the bandwidth (throughput)? For example, if I have a 12-bay enclosure, is the throughput of the controller reduced by a factor of 12?

If I understand you right (and you're still referring to USB), then yes: a USB hub splits its bandwidth across its ports, and if you have multiple drives off each port you'll get less speed per drive, although only if the USB 3.0 bandwidth divided by the number of drives is less than the maximum speed of one drive.
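
That condition can be written down directly. The sketch below assumes the nominal link rates (USB 3.0 at 5 Gbit/s and eSATA at 3 Gbit/s, both with 8b/10b encoding) and a hypothetical drive that sustains 120 MB/s on sequential reads; real-world throughput will be lower once protocol overhead is counted.

```python
def link_payload_mb_s(line_rate_mbit: float) -> float:
    """Best-case payload rate of an 8b/10b-encoded link, in MB/s."""
    return line_rate_mbit * 8 / 10 / 8     # 10 line bits carry 8 data bits


def per_drive_mb_s(link_mb_s: float, active_drives: int, drive_mb_s: float) -> float:
    """Throughput each drive sees when active_drives share one link evenly."""
    return min(drive_mb_s, link_mb_s / active_drives)


if __name__ == "__main__":
    usb3 = link_payload_mb_s(5000)         # nominal USB 3.0 line rate
    esata = link_payload_mb_s(3000)        # nominal 3 Gbit/s eSATA line rate
    drive = 120                            # assumed sustained MB/s per drive
    for n in (1, 4, 12):
        print(n, "drives:",
              round(per_drive_mb_s(usb3, n, drive), 1), "MB/s over USB 3.0,",
              round(per_drive_mb_s(esata, n, drive), 1), "MB/s over eSATA")
```

Since the app reads from only one drive at a time, `active_drives` stays at 1 and the drive itself, not the link, is the limit.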

Sirex

Is the cheap drive/enclosure solution the best one for this situation?

Define cheap. If you mean sub-$300 per box with 5+ bays, then yes, that would work fine.

Given the nature of the app and the way the data is used, does RAID make sense? If so, which one?

(A) Yes. (B) That depends. Specific RAID setups will be faster for reads, but suck horribly at writes; each one has distinctive tradeoffs.

For huge sequential reads, would USB 3.0 and eSATA be a wash performance-wise?

Don't bother with USB 3; stick with eSATA. USB looks cheap, sounds cheap, and once you've bought it and put it together at this scale, it will perform cheaply. USB contention will kill any gains you made, especially if you choose to RAID-stripe your data, because there is effectively 1 bus and N spindles to shuttle down that single bus. eSATA will at least restrict how many spindles are shuttled over a single interconnect; SATA port multipliers (expanders) that give 4:1 are not uncommon, which means a whole lot less contention.

For each slot available on the workstation, can I hook up an enclosure that can hold multiple drives? Or is it one controller per drive?

(A) Yes. (B) No, you don't want that, but it is available.

Get a card that has 4+ eSATA connectors and supports SATA port multipliers (expanders) on each port. Put a card into each slot, for a total of 4 ports per card * 5 slots * 4 expander ports per card port = 80 drive connections. That should get you going if you use 1.5 TB (80 * 1.5 = 120 TB) or 2 TB (80 * 2 = 160 TB) drives.

Of course, this arrangement will need 4 * 5 = 20 four-bay enclosures, one per eSATA cable. If you can find a denser solution, it would be in your favor...
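
The port arithmetic above, as a small sketch. The figures (5 slots, 4 eSATA ports per card, 4 drives behind each port) are the ones assumed in this answer; the function name is hypothetical.

```python
def layout(cards: int, ports_per_card: int, drives_per_port: int, drive_tb: float) -> dict:
    """Count cables, drives, and raw capacity for a card/port-multiplier layout."""
    host_ports = cards * ports_per_card            # one eSATA cable each
    drives = host_ports * drives_per_port
    return {
        "host_ports": host_ports,
        "drives": drives,
        "raw_tb": drives * drive_tb,
    }


if __name__ == "__main__":
    for size_tb in (1.5, 2):
        print(size_tb, "TB drives:", layout(5, 4, 4, size_tb))
```

Either drive size clears the 88 TB target with room left over for redundancy or spares.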

If I can have multiple drives on one controller, am I essentially splitting the bandwidth (throughput)? For example, if I have a 12 bay enclosure, is the throughput of the controller reduced by a factor of 12?

For eSATA, one cable = one bus, so bandwidth is only split when all of the drives on that cable transfer at the same time. If only one drive is using it at a time, there isn't any contention, so there's nothing to worry about. In a RAID setup, with all of the drives flooding the bus, you're sunk. Using the multiple card/enclosure/expander route, you minimize your contention because each physical cable has at most 4 drives contending for it.

Are there any Windows 7 volume/drive/capacity limits I should be aware of?

Dunno, although it looks like someone already answered that. (The answer appears to be about 250 TB.)
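
For what it's worth, a figure in that range matches how NTFS volume limits are usually derived: the implementation addresses at most 2^32 - 1 clusters, so the ceiling depends on the cluster size chosen at format time. A minimal sketch, assuming that cluster limit and the classic MBR limit of 32-bit sector counts with 512-byte sectors (which is why disks over 2 TB need GPT); the function names are hypothetical.

```python
TIB = 2 ** 40


def ntfs_max_volume_tib(cluster_bytes: int) -> float:
    """Largest NTFS volume for a cluster size, assuming a 2**32 - 1 cluster limit."""
    return (2 ** 32 - 1) * cluster_bytes / TIB


def mbr_max_disk_tib(sector_bytes: int = 512) -> float:
    """Largest disk an MBR partition table can describe (32-bit sector counts)."""
    return 2 ** 32 * sector_bytes / TIB


if __name__ == "__main__":
    for kib in (4, 16, 64):
        print(f"{kib} KB clusters: ~{ntfs_max_volume_tib(kib * 1024):.0f} TB per volume")
    print(f"MBR disk limit: {mbr_max_disk_tib():.0f} TB")
```

With the default 4 KB clusters a volume tops out around 16 TB, so any volume meant to span a large array is worth formatting with larger clusters.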

Avery Payne