26

I'm familiar with what a BBWC (battery-backed write cache) is intended to do - and previously used them in my servers even with a good UPS. There are obviously failures it does not provide protection against. I'm curious to understand whether it actually offers any real benefit in practice.

(NB I'm specifically looking for responses from people who have BBWC and had crashes/failures and whether the BBWC helped recovery or not)

Update

After the feedback here, I'm increasingly skeptical as to whether a BBWC adds any value.

To have any confidence about data integrity, the filesystem MUST know when data has been committed to non-volatile storage (not necessarily the disk - a point I'll come back to). It's worth noting that a lot of disks lie about when data has been committed to the disk (http://brad.livejournal.com/2116715.html). While it seems reasonable to assume that disabling the on-disk cache might make the disks more honest, there's still no guarantee that this is the case either.
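
As a concrete illustration of this requirement, the only portable way for an application (or a filesystem journal) to learn that data has reached stable storage is an explicit flush such as fsync(2). A minimal sketch in Python (illustrative only - this shows the primitive a write barrier is built on, not anything filesystem-specific):

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data and block until the kernel reports it on stable storage.

    write() alone only reaches the (volatile) page cache; fsync() is the
    call that must not return until the device acknowledges the data -
    assuming the disk is not lying, which is exactly the concern above.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # the durability point: everything before it may be lost
    finally:
        os.close(fd)
```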

Due to the typically large buffers in a BBWC, a barrier can require significantly more data to be committed to disk, causing delays on writes: the general advice is to disable barriers when using a non-volatile write-back cache (and to disable on-disk caching). However this would appear to undermine the integrity of the write operation - just because more data is maintained in non-volatile storage does not mean that it will be more consistent. Indeed, arguably, without demarcation between logical transactions there seems to be less opportunity to ensure consistency than otherwise.
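
To put rough numbers on that delay (a back-of-envelope sketch with illustrative figures, not measurements from any particular controller): the worst case for a barrier that forces a full cache flush is roughly the dirty cache size divided by the array's sequential write bandwidth.

```python
def barrier_stall_seconds(dirty_bytes: int, disk_bw_bytes_per_s: float) -> float:
    """Worst-case barrier latency if the entire cache must drain to disk."""
    return dirty_bytes / disk_bw_bytes_per_s

# Example: a 512 MiB cache draining at ~150 MB/s stalls writes for
# several seconds on every barrier - hence the advice to disable them.
stall = barrier_stall_seconds(512 * 2**20, 150e6)
```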

If the BBWC were to acknowledge barriers at the point the data enters its non-volatile storage (rather than when it is committed to disk) then it would appear to satisfy the data integrity requirement without a performance penalty - implying that barriers should still be enabled. However, since these devices generally exhibit behaviour consistent with flushing the data to the physical device (significantly slower with barriers), and given the widespread advice to disable barriers, they cannot be behaving in this way. WHY NOT?

If the I/O in the OS is modelled as a series of streams then there is some scope to minimise the blocking effect of a write barrier when write caching is managed by the OS - since at this level only the logical transaction (a single stream) needs to be committed. On the other hand, a BBWC with no knowledge of which bits of data make up a transaction would have to commit its entire cache to disk. Verifying whether the kernel/filesystems actually implement this in practice would require a lot more effort than I'm willing to invest at the moment.
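
The distinction can be sketched with a toy model (names are hypothetical - this is not how any real kernel or controller is structured): a cache that knows which dirty blocks belong to which stream can honour a barrier by draining that one stream, while a cache with no such knowledge must drain everything.

```python
from collections import defaultdict

class WriteCache:
    """Toy cache distinguishing per-stream barriers from full flushes."""

    def __init__(self):
        self.dirty = defaultdict(list)  # stream id -> pending blocks

    def write(self, stream, block):
        self.dirty[stream].append(block)

    def barrier(self, stream=None):
        """Return the blocks that must reach disk before the barrier completes."""
        if stream is not None:
            # OS-level view: only the stream forming the transaction drains.
            return self.dirty.pop(stream, [])
        # BBWC-level view: no transaction knowledge, so everything drains.
        flushed = [b for blocks in self.dirty.values() for b in blocks]
        self.dirty.clear()
        return flushed
```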

A combination of disks telling fibs about what has been committed and sudden loss of power undoubtedly leads to corruption - and with a journalling or log-structured filesystem, which doesn't do a full fsck after an outage, it's unlikely that the corruption will be detected, let alone an attempt made to repair it.

In terms of the modes of failure, in my experience most sudden power outages occur because of loss of mains power (easily mitigated with a UPS and managed shutdown). People pulling the wrong cable out of the rack implies poor datacentre hygiene (labelling and cable management). There are some types of sudden power-loss event which are not prevented by a UPS - such as a failure in the PSU or VRM. A BBWC with barriers would provide data integrity in the event of a failure there; however, how common are such events? Very rare, judging by the lack of responses here.

Certainly moving the fault tolerance higher in the stack is significantly more expensive than a BBWC - however implementing a server as a cluster has lots of other benefits for performance and availability.

An alternative way to mitigate the impact of sudden power loss would be to implement a SAN - AoE makes this a practical proposition (I don't really see the point in iSCSI) but again there's a higher cost.

symcbean
    NetApp file servers have for many years had NVRAM write-caches, and I've had a goodly number of those lose power and not trash their file systems. It's hard to prove that something saved one, because since one was saved, the disaster didn't happen; what evidence would you be looking for? – MadHatter Mar 12 '14 at 11:43
  • Arguably, you should also think about the failure modes of a battery-backed write cache verses a write cache without a battery. – Stefan Lasiewski Mar 12 '14 at 17:17
  • This is not a hostile comment at all, but I'm trying to understand... isn't this question a poll? – Mike Pennington Mar 12 '14 at 17:23
    Not a poll - I've spent a lot of time investigating this - and can find lots of information about what the BBWC is supposed to do - but very little information about what benefits have been realised in practice. Note that the only response I've had below where someone says a BBWC has saved their data is where there was no managed shutdown in the event of a power failure. So far nothing has refuted my suspicion that: while a BBWC can save your data in some circumstances, these circumstances may be avoidable by other means. – symcbean Mar 13 '14 at 10:16
  • @MadHatter surely that's easy: the evidence looked for is a story of an array that didn't have battery/flash and suffered data loss on power failure. – RomanSt Mar 13 '14 at 14:21
    No, that's evidence that **not having BBWC can lose your data**. Proving that - and I suspect most of the long-haul sysadmins on this system have stories where volatile data *was* lost in power outages; I most certainly do - wouldn't prove that **having BBWC can save your data**, which is what the OP asked for. – MadHatter Mar 13 '14 at 14:37
  • The point is most of the time when this protection comes into play you don't even notice. It seems like you've already decided the answer you want and just ignoring the evidence that doesn't suit your theory. Why did you bother asking? – JamesRyan Mar 22 '14 at 03:57
Because the available evidence implies that the memory on the controller is flushed during a barrier even though it's non-volatile (and hence the flush just wastes time), and several people, including Red Hat, recommend disabling barriers with a BBWC - potentially leading to much greater corruption. – symcbean Mar 22 '14 at 20:57
Some further analysis and modelling suggests that BBWC + no barriers can lead to undetected corruption with any IO scheduler other than NOOP (I could be wrong about this but have tried very hard to find evidence to suggest otherwise). See also http://symcbean.blogspot.co.uk/2014/03/warning-bbwc-may-be-bad-for-your-health.html – symcbean Mar 24 '14 at 23:22

5 Answers

34

Sure. I've had battery-backed cache (BBWC) and later flash-backed write cache (FBWC) protect in-flight data following crashes and sudden power loss.

On HP ProLiant servers, the typical message is:

POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator

Which means, "Hey, there's data in the write cache that survived the reboot/power-loss!! I'm going to write that back to disk now!!"

An interesting case was my post-mortem of a system that lost power during a tornado; the POST sequence was:

POST Error: 1793-Drive Array - Array Accelerator Battery Depleted - Data Loss
POST Error: 1779-Drive Array Controller Detects Replacement Drives
POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator

The 1793 POST error is unique. While the system was in use, power was interrupted while data was in the Array Accelerator memory. However, because this was a tornado, power was not restored within four days, so the array batteries were depleted and the data within was lost. The server had two RAID controllers. The other controller had an FBWC unit, which lasts far longer than a battery. That array recovered properly. Some data corruption resulted on the array backed by the depleted battery.


Despite plenty of battery runtime at the facility, four days without power and hazardous conditions made it impossible for anyone to shut the servers down safely.

ewwhite
    Very informative, good job on keeping those outputs for however long. – deed02392 Mar 12 '14 at 12:42
  • Interesting! I wonder if HP plans to include in the Smart Arrays controllers the same battery-free cache that they put in the P2000 – Gabriel Talavera Mar 12 '14 at 12:46
    @GabrielTalavera Yes, HP has been using flash-backed (capacitors) cache since 2010 or so. No more batteries. – ewwhite Mar 12 '14 at 12:48
  • Same here using Adaptec ;) No more worries and regular replacements. – TomTom Mar 12 '14 at 12:57
Thanks ewwhite - exactly the kind of thing I'm looking for. One question: what happened to the UPS power? Does your UPS not bring down the system when low? – symcbean Mar 12 '14 at 13:06
  • @symcbean Tornado. See above. Not everyone has a networked UPS, or software agents for every platform or serial breakout cables to trigger shutdowns. Some people bank on having enough battery runtime or a generator to provide time to shut systems down. E.g. I don't implement UPS shutdown on my vSphere clusters. – ewwhite Mar 12 '14 at 13:21
10

Yes, had that case.

Server "without UPS" in a data center (with the data center having a UPS). PDU failure - system crashed hard. No data loss.

And that basically is it. The good thing about a BBWC is that it is in the machine. Have a UPS - believe me, sometimes someone does something stupid (like pulling the wrong cable). A UPS is external. Oh, THAT cable ;)

TomTom
  • Thanks TomTom. So it allows you to roll forward your data to the next barrier instead of rolling it back to the previous one (unless you don't use journalling or log structured filesystems). This is one of the key points I'm trying to assess here. It would seem to give marginally better retention for a fileserver role, but doesn't help with filesystem or OLTP DB integrity. – symcbean Mar 12 '14 at 14:25
Actually it would - OLTP is structured to handle server power failures gracefully as long as the log writes are actually written ;) And as log IO speed is the limiting factor, "fake writes" (reported by the RAID controller) give speed - but at the risk of data loss, unless you have a non-volatile cache. – TomTom Mar 12 '14 at 14:33
  • I note that RedHat are of the opinion you should disable barriers with BBWC - while that will improve performance, it provides no protection in the case of a sudden outage such as power loss - erk! https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/writebarrierconsider.html – symcbean Mar 12 '14 at 14:33
  • @symcbean You shouldn't have sudden power loss in your environment. That's one of the easiest situations to prevent. Why make your server run like *crap* 100% of the time for a relatively infrequent occurrence? – ewwhite Mar 12 '14 at 14:43
@ewwhite: isn't that what you said happened to you? ;) I would always use a UPS with a critical server - but PSUs still fail, the OS can fail, the mobo/cpu can fail... it's about balancing the probabilities of events, assessing the likelihood of realizing the benefits, and weighing this against the costs. For now I'm still trying to collect the facts. – symcbean Mar 12 '14 at 14:50
Actually the whole reason a BBWC exists is to mitigate the issue of a sudden power loss. Hence it is ok to have no barriers. – TomTom Mar 12 '14 at 14:50
4

I've had 2 cases where battery backed cache in HW RAID controllers failed completely (in 2 separate companies).

A BBWC relies on the unsurprising idea that the battery works. The catch is that at some point the battery in the controller fails, and what's devastating is that in many HW RAID controllers it fails silently. We thought we had a cache protected against power loss, but we did not.

On power loss the RAID array data loss was so extensive that all disk contents were rendered unrecoverable. Everything was lost. One of the cases involved a machine dedicated entirely for testing, but still.

After that I said "never again", switched to software-based disk mirroring (mdadm) in Linux + journal-based fs that has decent resilience against power loss (ext4) and never looked back. Granted, I've used it on servers that did not have extremely high IO usage.

LetMeSOThat4U
Thanks JD: although not specifically what I was asking about, I can see that this has a lot of relevance to the assumptions people make about BBWC. It does resonate with a lot of the discussion about hardware vs software RAID. I think I should point out for posterity that software RAID does not *preclude* the use of a caching controller (volatile or otherwise). – symcbean Mar 13 '14 at 15:00
  • IME, Dell and HP raid cards will complain (assuming you have a proper monitoring system) about failed batteries in a BBWC. – mfinni Mar 14 '14 at 01:27
  • Proper procedures for BBWC **must** include battery testing - for example, 3ware controllers will warn you if the battery has not been tested for some amount of time, and it's easy to test that the battery is still healthy (the only downside is that the write cache is disabled during the test). – iustin Dec 04 '14 at 13:58
4

This seems to necessitate a second answer to the question...

I just had a standalone VMware ESXi host lose a drive in a RAID 5 array. The degraded array impacted performance at the VM and application level.

Smart Array P410i in Slot 0 (Embedded)    (sn: 5001438011138950)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (1.6 TB, RAID 5, Recovering, 42% complete)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, Rebuilding)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK, spare)

The IT person at this firm was not aware that a drive failed and hard reset the server (to make it all better?).

The interesting effect of doing this to a compromised array with busy virtual machines running atop was this:

Cache Status Details: The current array controller had valid data stored in its battery/capacitor backed write cache the last time it was reset or was powered up. This indicates that the system may not have been shut down gracefully. The array controller has automatically written, or has attempted to write, this data to the drives. This message will continue to be displayed until the next reset or power-cycle of the array controller.

So even though the system was halted abruptly, the in-flight data was protected by the BBWC. The virtual machines all recovered properly and the system is in good shape now.

ewwhite
3

In addition to "saving your data", they are good for other things: buffering writes in the cache improves the performance of the IO subsystem by keeping the disk write queue short. This is particularly important for servers where interactive performance is paramount - for example, Citrix XenApp or Windows Terminal Services.

This is less important for a webserver, or a file server. You might not notice, or even be used to, a little lag. However, when you click on an icon in an Office application, you expect responsiveness. And so does your CEO.

user9517
mfinni
  • "I'm familiar with what a BBWC (Battery-backed write cache) is intended to do" – symcbean Mar 12 '14 at 13:10
    You also said : " I'm curious to understand whether it actually offers any real benefit in practice." I gave you (and future readers) a concrete one. From your question, it was not at all clear that you knew about this benefit. And my answer is not wrong. – mfinni Mar 12 '14 at 13:11
  • So how do the points you made differ from a volatile write cache? – symcbean Mar 12 '14 at 13:16
  • Obviously that *was* the feature you were aware of. But again, you didn't make that clear. @mfinni is just being helpful. – deed02392 Mar 12 '14 at 13:56
  • Some systems won't allow you to enable a volatile write cache, so there's that. But no, if you don't care about the data and you can use a volatile write cache, then go for it. – mfinni Mar 12 '14 at 14:07