1

We currently have a number of servers at sites where the power often drops out for various reasons. The servers all run ESX with a couple of Linux guests. Putting in UPSs isn't practical just yet, so I'm looking for ways in which we can reduce our risk of corrupting the file systems.

I originally looked at RAID controllers (not specifically for disk redundancy), but I've read that using features like caching can actually increase the risk of data loss. It also seems that controllers with cache batteries may help, but I'm not entirely convinced.

Does anyone know if RAID controller cards do in fact provide this kind of protection, or is there anything else we can do generally to reduce our risk?

Chris Edgington
  • I would choose a UPS over a BBWC any day - http://symcbean.blogspot.co.uk/2014/03/warning-bbwc-may-be-bad-for-your-health.html – symcbean Mar 30 '17 at 12:07
  • I am aware that a UPS is the way to go; I was just looking for alternatives because, in our particular situation, the installation of 200 or so UPSs is some way off for political reasons. – Chris Edgington Mar 30 '17 at 12:22
  • More context, please!! Why wasn't a UPS or power-protection part of the deployment of 200 VMware servers? – ewwhite Mar 30 '17 at 13:58
  • https://www.tesla.com/en_GB/powerwall – tombull89 Mar 30 '17 at 14:00
  • I really wish I could answer that. On the server team we specified that the environment should have UPSs at every site from the get-go; unfortunately, someone decided this wasn't necessary (basically because they deemed that loss of service at a particular site during a power cut wasn't an issue). System integrity was not considered, and consequently we are in this situation. The good news is that the UPS is now back in scope! However, the rollout to sites is going to take some time, which is why I was investigating interim measures. – Chris Edgington Mar 31 '17 at 10:29

3 Answers

5

Storage controllers can come with a battery-backed write cache (BBWC) and/or, in the case of SSDs, a supercapacitor to protect cached writes during power outages.
There are indeed scenarios in which those still won't protect your data integrity. A BBWC provides better protection than a write cache without a battery, but completely disabling any and all write caches (at the cost of some performance) can be more reliable still.
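
As a rough illustration of auditing write caches from inside a Linux guest, here is a minimal sketch. It assumes a reasonably recent kernel that exposes `queue/write_cache` under `/sys/block/<device>/` (containing `write back` or `write through`); the function names are made up for this example. It only reports what the guest kernel sees, and says nothing about caching in the hypervisor or on a RAID controller underneath.

```python
from pathlib import Path


def write_cache_modes(sys_block="/sys/block"):
    """Return {device: mode} for block devices exposing queue/write_cache.

    Assumption: the kernel provides /sys/block/<dev>/queue/write_cache.
    Devices without that file are skipped; a missing directory yields {}.
    """
    root = Path(sys_block)
    if not root.is_dir():
        return {}
    modes = {}
    for dev in sorted(root.iterdir()):
        cache_file = dev / "queue" / "write_cache"
        if cache_file.is_file():
            modes[dev.name] = cache_file.read_text().strip()
    return modes


def devices_with_volatile_cache(modes):
    """List devices the kernel treats as having a write-back cache."""
    return [dev for dev, mode in modes.items() if mode == "write back"]


if __name__ == "__main__":
    modes = write_cache_modes()
    for dev, mode in modes.items():
        print(f"{dev}: {mode}")
    risky = devices_with_volatile_cache(modes)
    if risky:
        print("Write-back cache reported on:", ", ".join(risky))
```

A device listed as `write back` is one where acknowledged writes may still be sitting in a volatile cache when the power drops, so those are the ones to look at first.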

But the recommended tool is still a UPS: even a small one will allow the systems to shut down gracefully when, during a power outage, its batteries are close to exhausted (and it will also protect the hardware against repeated power spikes).
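
The "shut down before the battery runs out" rule can be sketched roughly as follows. This is only an illustration of the decision, not any particular UPS vendor's or monitoring daemon's logic: `UpsStatus`, `should_shut_down`, the 120-second shutdown time, and the safety factor are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of what a UPS monitor might report."""
    on_battery: bool        # True while mains power is out
    runtime_left_s: float   # estimated seconds of battery runtime left


def should_shut_down(status, shutdown_time_s=120.0, safety_factor=2.0):
    """Decide whether to start a graceful shutdown now.

    Shut down only while on battery, and only once the remaining runtime
    is no more than the time the host needs to stop its guests cleanly
    (shutdown_time_s, an assumed figure) multiplied by a safety margin.
    """
    if not status.on_battery:
        return False
    return status.runtime_left_s <= shutdown_time_s * safety_factor


if __name__ == "__main__":
    for status in (UpsStatus(False, 900.0),
                   UpsStatus(True, 600.0),
                   UpsStatus(True, 200.0)):
        print(status, "->", should_shut_down(status))
```

The point of the margin is that a small UPS does not need to ride out the outage; it only needs to hold long enough for the guests and the host to stop cleanly.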

HBruijn
1

RAID controllers with a BBU can sometimes reduce the risk of filesystem errors after an ungraceful shutdown. Only sometimes, because you can't guarantee that the OS isn't killed in the middle of writing a block to disk, in which case the OS can become unusable (not necessarily the filesystem, though). In the meantime I would invest in some (very small) UPSs, like this, so that the servers can do a graceful shutdown whenever a power outage occurs.

Edit: Well, @HBruijn beat me to it :)

Lenniey
1

With a single host and a single RAID array with write-back cache enabled, you will run into a data corruption issue someday for sure.

Take a look at a software-defined storage solution that can tolerate at least 2 failures (2+ copies of your data stored). A cluster of nodes, each connected to a UPS, with shared storage provided on top of RAID 10/6, would probably reduce data corruption to none. Obviously, 2 independent power lines should be connected to the UPS. With this kind of setup you can safely enable and use the cache.

Joshua Turnwell