I've got 40 years in computing, but I've never had to build a server quite like this one, so this might be a n00b question.
I have a client that is going to offer ultra-high-def music files for download. In this case that means FLAC-compressed 24-bit/192kHz, roughly 10GB per album. (No, I don't want to discuss the desirability of the product, just the server configuration.) The catalog will be about 3,000 albums, with both ultra-high and low-def versions (for their iPods, I guess), giving roughly 35-40TB of primary data.
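For what it's worth, here is the back-of-the-envelope sizing I'm working from; the low-def overhead is my own guess, not a measured number:

```python
# Rough catalog sizing. The low-def overhead figure is a guess.
ALBUMS = 3000
HIGH_DEF_GB = 10        # FLAC 24-bit/192kHz, per album
LOW_DEF_OVERHEAD = 0.2  # assume low-def copies add ~20% on top (guess)

high_def_tb = ALBUMS * HIGH_DEF_GB / 1000
total_tb = high_def_tb * (1 + LOW_DEF_OVERHEAD)
print(f"high-def only: {high_def_tb:.0f} TB, with low-def copies: ~{total_tb:.0f} TB")
# -> high-def only: 30 TB, with low-def copies: ~36 TB (hence the 35-40TB figure)
```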
Since this is a very specialized product, the market size is relatively small (think: people who spend $20,000+ on their audio systems), which means the server is going to be 100% idle (or close to it) most of the time. I have what looks like a good colocation offer from ColocationAmerica with a 1Gbps connection and bandwidth at about $20/TB, so now I just have to build a box to deliver the goods.
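And the delivery-side envelope math I used to sanity-check that offer, using only the numbers above:

```python
# Worst-case throughput and cost on a 1 Gbps link billed at $20/TB.
LINK_BPS = 1e9          # 1 Gbps
COST_PER_TB = 20.0      # USD, from the colo quote
ALBUM_GB = 10

bytes_per_sec = LINK_BPS / 8                         # ~125 MB/s
max_tb_month = bytes_per_sec * 86400 * 30 / 1e12     # link fully saturated, 30 days
album_minutes = ALBUM_GB * 1e9 / bytes_per_sec / 60

print(f"saturated link: ~{max_tb_month:.0f} TB/month, worst-case ~${max_tb_month * COST_PER_TB:,.0f}")
print(f"one album at line rate: ~{album_minutes:.1f} min, ~${ALBUM_GB / 1000 * COST_PER_TB:.2f} in bandwidth")
```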
The data-access use case is write-once / read-many, so I'm thinking of just using software RAID 1 for pairs of drives. This would allow me (I think) to swap a spare drive in for a failed one on the fly, so the rebuild onto the second drive can start before some sysadmin notices the red light on the system (the colo does free swap-outs). It would be great if I could get most of the drives to sleep/spin down when they aren't needed, which will be most of the time for most of them.
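To make concrete what I mean by "N RAID 1 pairs" with floating spares, here's roughly the layout I'm picturing. This is only a sketch: the device paths are placeholders, the filesystem choice is arbitrary, and a real setup would use something like mdadm's spare-sharing or a monitoring script rather than printing commands.

```python
# Sketch of the "N independent RAID 1 pairs" layout I have in mind.
# Each pair is its own md array with its own filesystem, and a couple of
# drives stay unassigned as hot spares to substitute for a failed member.
N_DRIVES = 30
SPARES = 2

drives = [f"/dev/disk/by-id/drive-{i:02d}" for i in range(N_DRIVES)]  # placeholder paths
spares, paired = drives[:SPARES], drives[SPARES:]

for n, (a, b) in enumerate(zip(paired[0::2], paired[1::2])):
    print(f"mdadm --create /dev/md{n} --level=1 --raid-devices=2 {a} {b}")
    print(f"mkfs.ext4 /dev/md{n} && mkdir -p /vol/{n:02d} && mount /dev/md{n} /vol/{n:02d}")

print("hot spares held back:", ", ".join(spares))
```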
I don't need much in the way of compute power, since this thing is just shoving fat objects down the pipe, so the CPU/motherboard can be pretty modest as long as it can support this number of drives.
I'm currently considering the following configuration:
- Chassis: Supermicro CSE-847E26-RJBOD1
- Drives: 30 × 4TB SAS drives (Seagate ST4000NM0023?)
- MB: SUPERMICRO MBD-X10SAE-O w/ 8GB RAM
- CPU: Intel Xeon E3-1220 v3, 3.1GHz, LGA 1150, 80W, quad-core
So, am I going in the right direction, or is this a completely n00b / dinosaur way of approaching the problem?
Update to clarify a couple of points:
- I have no experience with ZFS, since the last Sun product I owned was back in the late '80s. I'll do a little RTFMing to see if it feels right.
- I don't really need the filesystem to do anything spectacular, since the file names are going to be simple UUIDs and the objects will be balanced across the drives, sort of like a large caching system (see the sketch after this list). So I really was thinking of these as 40 separate filesystems, and that made RAID 1 sound about right (but I admit ignorance here).
- Because we currently expect to be serving no more than a couple dozen downloads at any one time, and in most cases exactly one person will be downloading any given file, I don't know whether we need tons of memory for buffers. Maybe 8GB is a bit light, but I don't think 128GB will do anything more than consume energy.
- There are two separate machines not mentioned here: their current web store, and an almost completely decoupled Download Master that handles all authentication, new-product ingest management, policy enforcement (after all, this is the RIAA's playground), ephemeral URL creation (and possibly handing downloads off to more than one of these beasts if the traffic exceeds our expectations), usage tracking, and report generation. That means this machine could almost be built using gerbils on Quaaludes.
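To illustrate the "balanced across the drives" bullet above: the placement logic I have in mind really is no smarter than this (mount points and volume count are hypothetical, and a real version would also need to handle volumes filling up):

```python
# Hypothetical placement: hash the object's UUID to pick one of the N
# independent filesystems. The Download Master can reconstruct the path
# from the UUID alone, so no central index is needed on this box.
import uuid
from pathlib import Path

N_VOLUMES = 14  # e.g. one per RAID 1 pair (placeholder count)

def object_path(object_id: uuid.UUID) -> Path:
    volume = int(object_id) % N_VOLUMES   # cheap, stable balancing
    return Path(f"/vol/{volume:02d}") / str(object_id)

print(object_path(uuid.uuid4()))
# e.g. /vol/07/3f0c9a2e-...; serving it is then just a sendfile() off that path
```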
ZFS? Where's the benefit?
OK, I'm slogging my way through multiple ZFS guides, FAQs, etc. Forgive me for sounding stupid, but I'm really trying to understand the benefit of ZFS over my antediluvian notion of N RAID 1 pairs. On this Best Practices page (from 2006), they even suggest not building a 48-device ZFS pool but rather 24 2-device mirrors, which sounds a lot like what I was talking about doing. Other pages mention the number of devices that have to be accessed in order to deliver one ZFS block. Also, please remember that at 10GB per object, and at 80% disk utilization, I'm storing a grand total of 320 files per 4TB drive. My rebuild time with N RAID 1s, for any given drive failure, is a 4TB write from one device to another. How does ZFS make this better?
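Here's the arithmetic behind that last point; the sustained rebuild rate is an assumption (something in the 100-180 MB/s range for a 7200rpm nearline drive), not a measured figure:

```python
# Why a plain mirror rebuild doesn't scare me much at this scale.
DRIVE_TB = 4
UTILIZATION = 0.8
OBJECT_GB = 10
REBUILD_MB_S = 150      # assumed sustained sequential copy rate

files_per_drive = DRIVE_TB * 1000 * UTILIZATION / OBJECT_GB
rebuild_hours = DRIVE_TB * 1e12 / (REBUILD_MB_S * 1e6) / 3600

print(f"~{files_per_drive:.0f} objects per drive")              # ~320
print(f"full-drive mirror rebuild: ~{rebuild_hours:.1f} hours") # ~7.4 h at 150 MB/s
```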
I'll admit to being a dinosaur, but disk is cheap, RAID 1 I understand, my file management needs are trivial, and ZFS on Linux (my preferred OS) is still kind of young. Maybe I'm too conservative, but when I'm looking at a production system, that's how I roll.
I do thank all of you for your comments that made me think about this. I'm still not completely decided and I may have to come back and ask some more n00b questions.