Sobrique explains how the manual intervention causes your proposed solution to be sub-optimal, and ewwhite talks about the probability of failure of various components. Both, IMO, make very good points that should be strongly considered.
There is however one issue that nobody seems to have commented on at all so far, which surprises me a little. You propose to:
make [the current hot spare host] a cold spare, take the hard drives and put them in the primary host and change the RAID from 1 to 1+1.
This doesn't protect you against anything the OS does on disk.
It only really protects you against disk failure, and moving from mirrors (RAID 1) to mirrors of mirrors (RAID 1+1) already greatly reduces the impact of that to begin with. You could get the same result by increasing the number of disks in each mirror set (go from 2-disk RAID 1 to 4-disk RAID 1, for example), quite likely improving read performance during ordinary operations along the way.
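As a rough back-of-the-envelope check of that equivalence, here is a minimal Python sketch. It assumes every disk fails independently with the same probability during some window, which is a simplification (disks from the same batch can fail close together), and the function names and the 5% figure are made up purely for illustration.

```python
import math


def mirror_loss_probability(p_disk: float, n_disks: int) -> float:
    """Chance that ALL disks in an n-way mirror fail (data loss).

    Assumes independent failures with identical probability p_disk.
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return p_disk ** n_disks


def mirror_of_mirrors_loss_probability(
    p_disk: float, disks_per_mirror: int, mirrors: int
) -> float:
    """Chance that every inner mirror in a mirror of mirrors is lost."""
    inner = mirror_loss_probability(p_disk, disks_per_mirror)
    return inner ** mirrors


p = 0.05  # hypothetical per-disk failure probability for the window
two_disk_raid1 = mirror_loss_probability(p, 2)
four_disk_raid1 = mirror_loss_probability(p, 4)
raid_1_plus_1 = mirror_of_mirrors_loss_probability(p, 2, 2)

# Under this model, four disks in one mirror and two mirrored pairs
# mirrored again both need all four disks to die before data is lost.
print(math.isclose(four_disk_raid1, raid_1_plus_1))
print(four_disk_raid1 < two_disk_raid1)
```

Under these assumptions both four-disk layouts work out to p to the fourth power, so the extra nesting buys nothing that a wider mirror would not.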
Well then, let's look at some ways this could fail.
- Let's say you are installing system updates, and something causes the process to fail half-way; maybe there's a power and UPS failure, or maybe you have a freak accident and hit a crippling kernel bug (Linux is pretty reliable these days, but there's still the risk).
- Maybe an update introduces a problem that you didn't catch during testing (you do test system updates, right?), requiring a failover to the secondary system while you fix the primary.
- Maybe a bug in the file system code causes spurious, invalid writes to disk.
- Maybe a fat-fingered (or even malicious) administrator does `rm -rf ../*` or `rm -rf /*` instead of `rm -rf ./*`.
- Maybe a bug in your own software causes it to massively corrupt the database contents.
- Maybe a virus manages to sneak in.
Maybe, maybe, maybe... (and I'm sure there are plenty more ways your proposed approach could fail). However, in the end this all boils down to your claimed advantage that "the two sets are always in sync". Sometimes you don't want them to be perfectly in sync.
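To make the "always in sync" problem concrete, here is a toy Python sketch. It is not a model of any real RAID implementation; the `MirroredVolume` class, the path and the data are all hypothetical. It only shows that a mirror faithfully repeats every write and delete on all members, while a separate point-in-time copy does not.

```python
import copy


class MirroredVolume:
    """Toy mirror: every write or delete hits all surviving members."""

    def __init__(self, members: int):
        # Each member is a dict of path -> data; None marks a dead disk.
        self.members = [dict() for _ in range(members)]

    def write(self, path: str, data: str) -> None:
        for member in self.members:
            if member is not None:
                member[path] = data

    def delete_everything(self) -> None:
        # Stand-in for an accidental "rm -rf /*" or a corrupting bug.
        for member in self.members:
            if member is not None:
                member.clear()

    def fail_disk(self, index: int) -> None:
        # Stand-in for a hardware failure of one member.
        self.members[index] = None

    def read(self, path: str):
        for member in self.members:
            if member is not None:
                return member.get(path)
        raise IOError("all mirror members have failed")


volume = MirroredVolume(members=4)
volume.write("/db/customers", "important rows")

# A separate point-in-time copy (backup, or a standby that lags behind).
backup = copy.deepcopy(volume.members[0])

# Hardware failure: the mirror does its job.
volume.fail_disk(0)
after_disk_failure = volume.read("/db/customers")

# Logical failure: the mirror dutifully destroys every copy at once.
volume.delete_everything()
after_delete = volume.read("/db/customers")
from_backup = backup.get("/db/customers")

print(after_disk_failure)  # data survived the dead disk
print(after_delete)        # data is gone on every remaining member
print(from_backup)         # only the out-of-sync copy still has it
```

The copy that was deliberately allowed to fall out of sync is the only one that survives the logical failure, no matter how many members the mirror has.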
Depending on what exactly has happened, that's when you want either a hot or cold standby ready to be switched on and over to, or proper backups. Either way, RAID mirrors of mirrors (or RAID mirrors) don't help you if the failure mode involves much of anything aside from hardware storage device failure (disk crash). Something like ZFS' raidzN can likely do a little better in some regards but not at all better in others.
To me, this would make your proposed approach a no-go from the beginning if the intent is any sort of disaster failover.