
Let's say it's something like Stack Exchange that I'll be running (it isn't, but it's close enough to have similar resource usage), and I'll have approximately 20000 users from August to January (with many more after that time).

The network this will be hosted on is pretty decent (25/25 Mbps) and the infrastructure is good (there hasn't been a power outage in years).

Does $5000 seem like a reasonable amount to expect to drop on the hardware, or should I look at a larger or smaller amount?

Also, how should I treat the tradeoff between raw specs and server-grade hardware (given a limited budget)?

For hard drive space, I'm planning around 10 to 15 TB in software RAID 10.
~$1300
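To sanity-check that budget line, RAID 10 halves raw capacity (data is striped across mirrored pairs), so the drive count falls out of simple arithmetic. A minimal sketch, assuming illustrative 2011-era drive sizes and prices (not quotes):

```python
# Hedged sketch: RAID 10 usable capacity is half the raw capacity,
# since every drive has a mirror. Drive size and price below are
# assumptions for illustration only.

def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 10 array (mirrored pairs -> 50%)."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count * drive_tb / 2

# Example: eight 3 TB drives land at 12 TB usable, inside the 10-15 TB target.
drives, size_tb, price_each = 8, 3.0, 160  # assumed per-drive price
usable = raid10_usable_tb(drives, size_tb)
print(f"{usable:.0f} TB usable for about ${drives * price_each}")
```

Under those assumed prices the total comes out near the $1300 figure above, which suggests the estimate is in the right ballpark.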

As far as power, I'm thinking two PSUs (beyond wattage, is there anything specific I should look for?) and a $100 or $200 UPS for good measure.
~$300?
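One thing worth checking before picking a UPS: they're rated in VA, not watts, and the two are related by the load's power factor. A rough sketch, where the 500 W load and 0.9 power factor are assumptions for illustration, not measurements of this build:

```python
# Hedged sketch: rough minimum UPS sizing. VA = W / power factor.
# The wattage and power factor here are illustrative assumptions.

def min_ups_va(load_watts: float, power_factor: float = 0.9) -> float:
    """Minimum UPS VA rating for a given load."""
    return load_watts / power_factor

# Example: a ~500 W server needs roughly a 556 VA UPS at minimum.
# In practice you'd oversize for headroom and battery runtime.
print(round(min_ups_va(500)))
```

A $100-200 consumer UPS typically sits in the 550-1500 VA range, so this at least confirms the budget line isn't unreasonable for a single box.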

What I'm less sure about is the CPU/RAM/mobo I should be looking at to scale like this.

One possible approach would be to arbitrarily pick something like two 12-core Opterons and 96 GB of RAM and decide that sounds good, or to allocate a fixed amount of money to this and max out the specs within that bound, but I'd like to be reasonably scientific in my approach.

So, what kind of CPU/RAM/mobo setup would you recommend and why?

Also, clustering. If not for the initial system, is clustering something I should look into later (e.g. when I need to upgrade / scale up the system), or are there better ways of accomplishing that? Clustering is something I have no real experience with, but the server OS is going to be Ubuntu 10.04, if that helps tailor any advice.

Ryan Lester
  • Why RAID 10 and not 5? – Oskar Kjellin May 04 '11 at 20:00
  • @Oskar, RAID 10 is faster for almost all IO loads. – Chris S May 04 '11 at 20:06
  • You need a power factor of over 9000, I would say. – Holocryptic May 04 '11 at 20:07
  • Mostly for redundancy and performance reasons (RAID 5 has relatively slow write performance; software RAID 5 would also take up more CPU than RAID 10, and from my understanding really well-performing RAID 5 can generally only be accomplished with more expensive hardware implementations). RAID 10 just seems like a faster, more redundant, more reliable, and overall much better option for very little extra cost (given drive prices these days). – Ryan Lester May 04 '11 at 20:08
  • See also: http://serverfault.com/questions/263694/why-is-hosted-storage-so-expensive – Chris S May 04 '11 at 20:09
  • @Chris S Does not seem to be true. Have you read http://en.wikipedia.org/wiki/RAID#RAID_10_versus_RAID_5_in_Relational_Databases? – Oskar Kjellin May 05 '11 at 07:56
  • @Oskar, that part of the article is riddled with wrong information, (I'll make a point to edit when I have time). The first point about parity being a background task is almost exclusively wrong, DBMS's use synchronous writes and the SAN can not return the IO request until it is safely cached or written to disk (on sustained writes, the cache will will quickly and you're disk bound again). Interleaving parity doesn't cause it to be slowed; in cheap cards the parity takes time to calculate, on expensive ones you still have the RAID5 write-hole. In any case, the article is misleading at best. – Chris S May 05 '11 at 12:25
  • @Chris Okay, seems like you know this quite well; I withdraw my objections. – Oskar Kjellin May 05 '11 at 12:36

1 Answer


You might be best off renting capacity from the cloud (Amazon EC2 or similar) until you have a better idea of your future scaling requirements.

jlew
  • I'm considering that as well, but I don't trust Amazon after all the recent down time, plus moving off their infrastructure to my own will be a bit of a pain (tons of data). – Ryan Lester May 04 '11 at 20:02
  • @buu700 - you shouldn't trust *any* hosting company. Any downtime or data loss people experienced with that EBS issue was 100% their own fault for poor and/or lazy engineering decisions. – EEAA May 04 '11 at 20:12
  • @buu700, note that Netflix online VOD is hosted on EC2 and experienced few problems during the Amazon "outage" on account of excellent service design. Designing for high availability isn't solely the responsibility of the hardware and infrastructure. See the [Netflix blog](http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html) for more info. – Chris S May 04 '11 at 20:13