
I'm developing an application for my company that will require a lot of compute capacity (running some very large mathematical calculations), and I'm looking for some form of server setup to handle this. For various reasons, we want to run it on-site in our office rather than hosting it externally.

It's been a while since I last had to set up my own servers so I thought I would tap into the collective wisdom of serverfault!

My broad requirements are:

  • Budget $30-50k, with an aim to get as much compute capacity as possible for that budget
  • 64-bit servers suitable to run Ubuntu Linux + Java
  • Some relatively standalone rack that can be installed in secure office space
  • Fast/low latency network connections between the servers, but don't really care about connectivity to the outside world
  • Storage capacity shared between the servers - they don't necessarily need their own storage, provided they can be booted from a common image
  • Downtime can be tolerated (since the calculations are run in batch mode)
  • The software itself is fault-tolerant, so there is no need for extra resiliency in the server setup (cheap replaceable commodity parts will be fine in general)

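To illustrate the fault-tolerance point above: the application re-queues any job that fails and retries it until it completes, so individual node failures just cost throughput. A minimal Java sketch of that pattern (class and method names are illustrative, not our actual code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Minimal sketch of a fault-tolerant batch runner: each job is pulled
 * from a shared queue, and any job that throws is re-queued until it
 * completes.
 */
public class BatchRunner {
    public static <T> List<T> runAll(List<Callable<T>> jobs, int workers)
            throws InterruptedException {
        Queue<Callable<T>> pending = new ConcurrentLinkedQueue<>(jobs);
        List<T> results = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch remaining = new CountDownLatch(jobs.size());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                while (remaining.getCount() > 0) {
                    Callable<T> job = pending.poll();
                    if (job == null) {         // queue momentarily empty
                        Thread.onSpinWait();
                        continue;
                    }
                    try {
                        results.add(job.call());
                        remaining.countDown(); // completion check passed
                    } catch (Exception e) {
                        pending.add(job);      // job failed: re-queue for retry
                    }
                }
            });
        }
        remaining.await();                     // block until every job completes
        pool.shutdown();
        return results;
    }
}
```

This is why cheap commodity parts are fine - a dead node only slows the batch down, it never loses work.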
Given these requirements what kind of setup would you recommend and why?

mikera
  • What kind of interconnects do you need? Ethernet, MPI, latency/bandwidth tolerances? How much storage, throughput, concurrency, IOps? Does the cluster require resiliency, or is the software failure tolerant? – Chris S Feb 26 '11 at 14:48
  • Good question! There is lots of tolerance, and the software is fault-tolerant itself (via job scheduling and completion checks), so there's no issue if any part goes down. And the computations are meant to run in batch mode, so downtime isn't really an issue either. It's all about the compute capacity... will update the question accordingly – mikera Feb 26 '11 at 14:58
  • If you need a consultant to come and help you set this up, my contact details are in my SF profile – Tom O'Connor Feb 26 '11 at 15:04

1 Answer


Without knowing the answers to any of the above questions, and assuming the software can easily deal with hardware faults, I would consider something like:

  • A pair of HP DL180 G6s loaded with the necessary storage. These would serve as mirrored storage providers, as multimaster (or redundant) cluster masters, and as PXE-boot/NFS servers for the nodes.
  • Roughly 8 HP DL160 G6 compute nodes (12GB RAM, two 6-core X5650s each).
  • A pair of HP ProCurve E2510-24G switches, unless you already have suitable switches.
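The PXE-boot/NFS piece on the head nodes is straightforward to sketch. A minimal example assuming dnsmasq as the DHCP/TFTP server and an NFS-exported root image (all addresses and paths here are illustrative):

```
# /etc/dnsmasq.conf on a DL180 head node
dhcp-range=10.0.0.100,10.0.0.200,12h
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftp

# /etc/exports -- shared root image plus writable scratch space
/srv/nfsroot  10.0.0.0/24(ro,no_root_squash,no_subtree_check)
/srv/scratch  10.0.0.0/24(rw,no_root_squash,no_subtree_check)
```

The compute nodes would then boot a kernel with something like `root=/dev/nfs nfsroot=10.0.0.1:/srv/nfsroot ip=dhcp` on the command line, so they need no local disks at all.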

I figured the servers would cost roughly $5k each; of course, the exact price depends on the RAM/disk/CPU/etc. configuration.

This doesn't include UPS/Power, Rack Enclosure, Backup, AC/Cooling, or other environmental considerations. I'm biased toward HP equipment because that's what we use; I have no other ties to HP.

One other note: when buying $50k+ worth of gear - no matter whether it's HP, Dell, IBM, etc. - get quotes from more than one vendor, and make sure they know they're bidding against each other. Prices will drop ~10% right off (or they'll sometimes offer in-kind deals, like adding $5k worth of extended warranties).

Chris S
  • I'm behind this 100%. Perhaps an HP Blade system might give a slightly better price/density/size tradeoff. I'm more a fan of the DL3xx series servers, but that's down to preference, I suppose. Cooling and power are going to be a bitch in an office environment. Noise too. – Tom O'Connor Feb 26 '11 at 15:07
  • @Tom, the Blade servers do give better price per compute density, but the DL160 series is about the best performance per price (sans other considerations). The DL3xx servers are magnificent workhorses, but you pay for the reliability and versatility too (which it sounds like he doesn't need in this case). Good point about noise! – Chris S Feb 26 '11 at 15:11