
My datacenter says that each rack has primary and backup power. I assume this means there is a UPS for each server. Given that, do I need a BBU for the following setup?

Intel Cherry 520 SSD x 4 RAID 10 LSI-9260 with WRITEBACK CACHE ENABLED

I have heard that without a BBU the data in the cache could be lost. Since my needs aren't mission-critical, I can afford to lose some data. But would the rest of the data on the HD be corrupted?

ewwhite
user3180

3 Answers


File systems write more than just data to HDDs; they also write metadata. The danger of data loss isn't so much that your most recent results file goes missing as that the metadata becomes corrupt, making the file system inconsistent and unmountable. Corrupt filesystems can lose much more data when they're fscked.

Normally one would choose a journalling filesystem to minimise the danger of this, but with write-cached RAID hardware this may not help, as the hardware has essentially lied to the OS about what has actually been written to disc (considering a write to the cache to be sufficient). Power loss means you may still end up with an inconsistent, and thus later a roached, file system.

No RAID array I know of treats an unbacked write cache as a good idea, and most of them disable write-back caching if the cache battery goes flat. They may have a point.
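
The fallback behaviour described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not any vendor's firmware logic: the names `WritePolicy`, `effective_policy`, and the `force_write_back` override are hypothetical and exist only for illustration.

```python
from enum import Enum


class WritePolicy(Enum):
    WRITE_THROUGH = "write-through"  # acknowledge only after the disks have the data
    WRITE_BACK = "write-back"        # acknowledge once the data is in controller cache


def effective_policy(configured, battery_ok, force_write_back=False):
    """Return the policy a controller in this simplified model would actually use.

    Assumption: write-back is honoured only while the cache battery is healthy,
    unless the administrator has explicitly forced it.
    """
    if configured is WritePolicy.WRITE_THROUGH:
        return WritePolicy.WRITE_THROUGH
    if battery_ok or force_write_back:
        return WritePolicy.WRITE_BACK
    # Battery missing or flat: fall back to the safe mode.
    return WritePolicy.WRITE_THROUGH


# Write-back requested, but the battery is flat or absent.
print(effective_policy(WritePolicy.WRITE_BACK, battery_ok=False).value)
# Same situation with the safety fallback overridden.
print(effective_policy(WritePolicy.WRITE_BACK, battery_ok=False, force_write_back=True).value)
```

In this model, running write-back with no BBU corresponds to the forced case, which is exactly the configuration where a power loss can drop acknowledged writes.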

MadHatter

Since my needs aren't mission-critical, I can afford to lose some data. But would the rest of the data on the HD be corrupted?

Enable filesystem barriers on all your mounts. If you can afford to lose some data, the maximum loss in this scenario would be the size of your cache, and on average it should be quite a bit less.

Note that barriers reduce I/O performance but greatly improve the integrity of your filesystem, especially when using disks that attempt to re-order writes.

From man 8 mount:

   barrier=0 / barrier=1 / barrier / nobarrier
          This enables/disables the use of write barriers in the jbd code.
          barrier=0 disables, barrier=1 enables.  This also requires an IO
          stack which can support barriers, and if jbd gets an error on  a
          barrier write, it will disable again with a warning.  Write bar‐
          riers enforce proper on-disk ordering of journal commits, making
          volatile  disk  write  caches  safe  to use, at some performance
          penalty.  If  your  disks  are  battery-backed  in  one  way  or
          another, disabling barriers may safely improve performance.  The
          mount options "barrier" and "nobarrier"  can  also  be  used  to
          enable  or  disable  barriers,  for  consistency with other ext4
          mount options.

          The ext4 filesystem enables write barriers by default.

Theoretically, the journal would save you from filesystem corruption due to a sudden loss of power because metadata will be guaranteed to be well-ordered.
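
One way to check whether any mounts have barriers switched off is to scan the mount table for the two disabling options named in the man page excerpt (`barrier=0` and `nobarrier`). The sketch below is illustrative only: the function names and the sample mount table are made up, and how barrier options are displayed can vary by kernel and filesystem, so treat a clean result as a hint rather than proof.

```python
def barriers_disabled(options):
    """True if a comma-separated mount option string turns barriers off."""
    opts = options.split(",")
    return "nobarrier" in opts or "barrier=0" in opts


def find_unprotected_mounts(mounts_text, fstypes=("ext3", "ext4")):
    """Return mount points of the given filesystem types with barriers disabled.

    mounts_text is expected in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and barriers_disabled(options):
            flagged.append(mountpoint)
    return flagged


# Made-up sample; on a Linux host, pass the contents of /proc/mounts instead.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /data ext4 rw,noatime,nobarrier 0 0
/dev/sdc1 /scratch ext3 rw,barrier=0 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

for mountpoint in find_unprotected_mounts(SAMPLE):
    print("barriers disabled on", mountpoint)
```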

Matthew Ife
  • It will not; see my comment on writeback caching. – MadHatter Oct 19 '13 at 09:32
  • @MadHatter Yes it will, it sends a FUA (Force Unit Access) to the device in order to flush the write cache. The old implementation is documented in (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/block/barrier.txt?id=09d60c701b64b509f328cac72970eb894f485b9e). This was scrapped and replaced with 'flush requests' in later kernels (2.6.33+). – Matthew Ife Oct 19 '13 at 09:47
  • I apologise and withdraw my objection; +1. The way some devices implement some of the more outré commands (eg, some SSDs and the security command set, see eg http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf, s3.2.1) would make me a little nervous about relying on FUA if I wasn't sure the underlying hardware correctly honoured it; but it seems to me that you're right that it should work. – MadHatter Oct 19 '13 at 09:58
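
From an application's point of view, the usual way to request that data reach stable storage is `fsync`. The sketch below shows that pattern in Python; `durable_write` is a hypothetical helper name. Whether the request actually turns into a cache flush or FUA write at the device depends on the filesystem, the barrier setting, and, as the comments above point out, on the hardware honouring it.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and ask the OS to push them to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write it through to the device


# Usage: write a small record into a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "result.dat")
    durable_write(target, b"important result\n")
    with open(target, "rb") as f:
        print(f.read())
```

A controller that acknowledges writes from an unprotected write-back cache can still report success here, which is the scenario the original question is about.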

Datacenter power and battery is provisioned at the facility/room level. This is assuming you're in a commercial co-location facility...

So you DO have UPS protection on the A and B power feeds to your rack. The battery protection is done upstream from your rack.


Now, for your storage situation, you have SSDs running on a RAID controller. You typically don't need to use the caching functionality of a BBU in conjunction with solid-state drives. It's best to disable the read caching. If you do use the cache, set the ratio to favor writes. See this document for some detail. Benchmark for your specific case, but you may not need the cache for this setup.


Since you're using an LSI controller, the best-performing option for SSDs is to use the LSI FastPath software. This modification disables the legacy logic needed for spinning disks and optimizes the data paths for SSD-only arrays.

ewwhite