Background: We need an HA (high-availability) server for a small office and are looking at DRBD to provide it. Only about 100GB needs to live on the HA server, and the server load will be extremely low. The data will probably grow about 10%-25% per year if we archive older office data, and 50%-75% per year if we don't.
The point is that we use a mix of consumer-grade and used enterprise-grade hardware, which WILL be a problem if we don't plan for it up front. Even quality pre-built servers DO fail, so redundant servers seem like the way to go.
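To put those growth rates in perspective, here is a small sketch that compounds the 100GB starting size at each quoted annual rate. The five-year horizon and yearly compounding are my own assumptions. By year five it works out to roughly 160GB-305GB with archiving versus roughly 760GB-1.6TB without, which matters when picking drives for both servers.

```python
# Rough capacity projection for the replicated volume, using the growth
# ranges quoted above. Assumes growth compounds once per year:
#   size_n = size_0 * (1 + rate) ** n

START_GB = 100  # current data that must live on the HA pair

SCENARIOS = {
    "archiving, low (10%)": 0.10,
    "archiving, high (25%)": 0.25,
    "no archiving, low (50%)": 0.50,
    "no archiving, high (75%)": 0.75,
}


def project(start_gb: float, rate: float, years: int) -> list[float]:
    """Return projected size in GB for years 0..years, compounding annually."""
    return [start_gb * (1 + rate) ** year for year in range(years + 1)]


if __name__ == "__main__":
    years = 5  # assumed planning horizon
    header = "  ".join(f"{'yr ' + str(y):>7s}" for y in range(years + 1))
    print(f"{'scenario':26s} {header}")
    for label, rate in SCENARIOS.items():
        sizes = project(START_GB, rate, years)
        row = "  ".join(f"{s:7.0f}" for s in sizes)
        print(f"{label:26s} {row}")
```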
The Plan: We are thinking of buying two of the best bang-for-our-buck used servers and synchronizing them. We simply need SATA/SAS-capable servers with room for as many drives as the price allows. These servers seem to go for $100-$200 (plus some parts and additional drives) if you catch a deal.
In theory, this means a server could fail and, even if it took days to get to it, things would keep humming along until our IT department (me) could fix it, as long as no second failure happened in the meantime. We would use Debian as the OS.
Some Questions
(A) How does DRBD handle drive or controller failure? DRBD's architecture diagram shows it sitting above (before) the storage driver, so what happens when the controller fails and writes dirty data, or when a drive fails but doesn't crash immediately? Is that data mirrored to the other server or not, and is there a risk of data corruption spreading across servers in cases like these?
(B) What are DRBD's failure points? In theory, as long as one server is up and running there are never any issues. But we know issues do happen, so what are the failure modes when using DRBD, given that most of them should, in theory, be software?
If we are going to have two servers for this, would it be reasonable to run VMs on each with MySQL and Apache for database and web server replication? (I am assuming so.)
Is DRBD reliable enough? If not, is the unreliability isolated to certain tasks, or is it more random? Searching turned up people with various issues, but this IS the internet, with seemingly more bad info than good.
If data is being synchronized over the LAN, does DRBD use double the bandwidth? That is, should we double up on NICs and use link aggregation and trunking? Then maybe put them on separate routers on separate circuits, with UPSs in separate rooms, and now you really have some redundancy!
Is this too crazy for an office in terms of server management? Is there a simpler REALTIME alternative (granted, DRBD seems simple in theory)?
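On the bandwidth question, a back-of-the-envelope sketch may help size the replication link: how long a full resync of the volume might take, for example after replacing a failed server. The function name `resync_hours`, the 70% efficiency factor, and the link speeds are my own assumptions, not DRBD settings or measurements. The sketch simply divides data size by usable link throughput. A 2x1GbE bond is counted as 2 Gbit/s only if it actually doubles throughput for the replication traffic, which depends on the bonding mode.

```python
# Rough estimate of how long a full DRBD resync would take over the
# replication link. Assumptions, not measurements: decimal units
# (1 GB = 8 Gbit), a fixed efficiency factor standing in for protocol/TCP
# overhead, and disks on both ends that can keep up with the link.

def resync_hours(data_gb: float, link_gbit_s: float, efficiency: float = 0.7) -> float:
    """Estimate hours to copy data_gb gigabytes over a link_gbit_s link."""
    if data_gb < 0 or link_gbit_s <= 0:
        raise ValueError("data_gb must be >= 0 and link_gbit_s > 0")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    data_gbit = data_gb * 8  # gigabytes -> gigabits
    seconds = data_gbit / (link_gbit_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Today's ~100GB, and ~1.6TB (100GB after five years at 75%/yr).
    for size_gb in (100, 1641):
        for link in (1.0, 2.0):  # one 1GbE link vs. a hypothetical 2x1GbE bond
            hours = resync_hours(size_gb, link)
            print(f"{size_gb:5d} GB over {link:.0f} Gbit/s: ~{hours:.1f} h")
```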
We already have one server, so it seems to me a second USED server with a dedicated drive for DRBD could easily be had for around $150-$250 with some smart shopping. Add a second router, more drives, more (used) NICs, and two UPSs, and we're talking about $1,000, give or take. That is relatively cheap! Mainly, I am hoping this would buy us time during a server fault. Drive failures seem like the easier thing to handle these days, with RAID. The concern is other hardware failures, like controllers, memory, or power supplies, that might require downtime to diagnose and fix.
For us, redundant servers make used hardware more viable: more uptime, and more flexibility for me to fix things when my schedule allows instead of having to stop everything to repair the server.
Hopefully I haven't missed easily searchable answers to these questions; I did a quick search and didn't find what I was looking for.