I'm trying to figure out what to do for a small business that has been plagued by ridiculous hardware problems. Right now, this business runs on five or six desktop machines; no server infrastructure is in place. On top of that, and I'm not embellishing this, they have seen four hardware failures this year to date, and it's got them bordering on madness.
I've already discussed with them the notion of putting a Small Business Server in place (they're a Microsoft shop), and they're receptive to the idea. I also plan on getting my feet wet with System Center Essentials to keep an eye on things. The focus then becomes ensuring that this server remains available.
Also, I've just read through this other high availability thread. Much like the guy in that thread, I'm very new to IT, coming from a programming background instead.
Some ideas come to mind:
- Simple RAID 5 with hot-swap drives (edit: and a hot spare)
- Get two cheaper server machines, configured to run one virtualized server with hot-migration (I've done some reading, but sadly I can't tell whether SBS Standard and SCE will support this)
- Failover clustering? I got this term from the other thread but haven't been exposed to it in the past.
Is there a best practice when it comes to this? The business owner is willing to dig into his pockets a little for this because he's becoming terrified of downtime, but I've got no experience with these to lead me in one direction over the other.
I'd appreciate your wisdom!
edit: To provide some additional detail on the problems they've experienced, it's been a weird mix of inexplicable failures.
- Power switch on the chassis fails to power on the system: the motherboard had an onboard switch, which provided a stop-gap solution, but swapping out the case didn't fix the problem. Later, swapping out the motherboard didn't fix it either.
- Drive failures: two identical machines have both suffered drive failures in their RAID 1 arrays, and both machines were assembled no more than 5 months ago.
- Boot failure issues: one system running RAID 1 fails to boot at all. Unfortunately I didn't write down the original error message, but my notes say that "Failed to save startup options" in Windows Repair & Recovery led me to this thread, which supported my suspicion that it was a hardware-related issue.
edit: Also, the machines are running in a collection of home offices, so residential-grade electrical power is at play. I guess this may be more of a contributing factor than I'd given it credit for. However, the machines all sit on desks (literally desktops!) and not on the floor, so I don't believe dust is involved.