I'm developing an application for my company that will require a lot of compute capacity (running some very large mathematical calculations), and I'm looking for a server setup to run it on. For various reasons, we want to run this on-site in our office rather than hosting it externally.
It's been a while since I last had to set up my own servers, so I thought I would tap into the collective wisdom of Server Fault!
My broad requirements are:
- Budget of $30-50k, with the aim of getting as much compute capacity as possible for that money
- 64-bit servers suitable to run Ubuntu Linux + Java
- A relatively self-contained rack that can be installed in secure office space
- Fast, low-latency network connections between the servers; connectivity to the outside world is not a priority
- Storage capacity shared between the servers - they don't necessarily need their own storage, provided they can be booted from a common image
- Downtime can be tolerated (since the calculations are run in batch mode)
- The software itself is fault-tolerant, so there is no need for extra resiliency in the server setup (cheap replaceable commodity parts will be fine in general)
Given these requirements, what kind of setup would you recommend, and why?