I administered a somewhat similar server cluster (a two-member cluster) a little while back, and my basic approach was "CPU is the bottleneck, so max out processor speed and core count." It was also on high-end servers with a lot of RAM, high-capacity SSDs (for quickly accessing the images that were temporarily stored on them while processing occurred) and multiple 1 Gbit NICs, again to maximize performance, and therefore throughput of processed images.
It worked really well, and with a high workload for what was a business-critical function, the MBAs and other suits were happy to invest an extra ~$25,000 in hardware to make things run smoothly. Once I spoon-fed the benefits of my recommendation to them, at least. And yes, I think you're over-thinking this, at least a bit. Start by "just" throwing powerful hardware at the problem. If that doesn't deliver the performance required, then you need to worry about optimizing code and examining process flow for bottlenecks.
Before I get into my basic process to create and "sell" my recommendation, let me just point out an SA adage:
It's better to over-engineer than under-engineer.
- (In this particular instance, it's better to err on the side of spec'ing out "too much" of a server than on the side of "too little.")
I'd say that your situation sounds pretty similar to mine, and I would advise a similar approach. And I guess none of the below is strictly "Systems Administration," so you can probably stop reading here, but it's all very closely related, and if you can't sell your recommendations effectively to the business units, systems administration and architecture is hell... if you can, you get to call yourself a systems engineer/architect and demand more money for essentially the same job, because you have the ability to put your ideas in simple numbers for the "slow children." (Executives and MBAs.)
My Process:
- Do basic benchmarks and use simple projections to get a basic ballpark for what performance will be needed.
  - This creates an estimate and helps me select what hardware (in a general sense) to get. From that, I come up with 3 specs:
    - A minimum I think will work (which is really the minimum + ~15%, to give me a buffer/safety margin if my numbers are a bit off).
    - A reasonable maximum/"best" possible configuration.
    - A medium value, which is something in between the two.
- Based on the business priorities (Saving money? Maximum output/performance on this system? Some compromise?), I'll make one of the three my recommendation, and include the other two as options that should still work if my recommendation isn't accepted for whatever reason.
- I'll break out the cost of each solution, and do a rudimentary, quantitative cost/benefit analysis wherever possible. In this case from my past, it was a revenue-generating business process, so it was a simple matter of "we made $[x] from this last year, on [y] images processed, for a profit of $[z]/image processed, or $[i]/day." I then converted the cost of the hardware into similar units: $[x] total CapEx, or $[z]/image processed, or $[i]/day. (And when a different future "projection" exists, do the same for the "projected" future numbers.)
- And once you've done the costs, it's on to computing or breaking out the benefits. Using those dollar figures and cost comparisons to inform your decision (and advocate for your ultimate recommendation) is really helpful, and in my experience, usually justifies spending extra on potentially under-utilized hardware.
  - In the case I'm alluding to, the "extra" $25,000 my recommendation cost over the initial suggestion of "buy a single mid-range server and slap our image processing stuff on that" broke down to about $70 a day, and <10 cents an image processed... and we made a lot more than 10 cents on every image we processed.
- And to make doubly sure the suits took my over-engineered, redundant cluster recommendation (that would be much easier for me to support and administer), I broke out the costs of not having enough computing resources on the server: potential lost revenue from a lack of capacity and the hourly cost of doing it "manually" on workstations (several hundred dollars an hour). I then added the qualitative benefits (supportability, ease of management, etc.) and costs (reputation damage, angry clients, etc.) as the cherry on top. And then I pointed out, once again, that my recommended solution only costs $70 a day.
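If it helps, here's a rough sketch of the unit conversion from the cost/benefit steps above, in Python. The $25,000 is the figure from my case; the one-year amortization period and the 300,000 images/year volume are assumptions I'm making purely for illustration (a one-year period is simply what roughly reproduces the "about $70 a day" number), and `amortize` and `CostBreakdown` are made-up names, so plug in your own numbers.

```python
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    per_day: float    # hardware cost per calendar day
    per_image: float  # hardware cost per image processed


def amortize(capex: float, images_per_year: int, years: float = 1.0) -> CostBreakdown:
    """Spread a one-off hardware cost over days and over images processed.

    `years` is the amortization period; 1.0 is an assumption, not a rule.
    """
    if capex < 0 or images_per_year <= 0 or years <= 0:
        raise ValueError("capex must be >= 0; images_per_year and years must be > 0")
    return CostBreakdown(
        per_day=capex / (365 * years),
        per_image=capex / (images_per_year * years),
    )


# Hypothetical volume: 300,000 images/year (not a figure from the real project).
extra = amortize(25_000, images_per_year=300_000)
print(f"${extra.per_day:.2f}/day, ${extra.per_image:.3f}/image")
# prints: $68.49/day, $0.083/image
```

Put the per-image hardware cost next to your per-image profit and the argument more or less makes itself. Stretching `years` out to the hardware's real service life only makes the per-day number look better, so one year is the conservative way to present it.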
Anyway, that's a great deal longer and less technical than I was anticipating... but I hope you find it helpful.