
Files written within a minute of a power failure are zero bytes when the system comes back up. I'm testing for systems that we cannot guarantee will have UPS backup.

CentOS 6.4
kernel 2.6.32-358.14.1.el6.x86_64
ext4 mounted with defaults

I see this on two different systems (the only two I have tried it on):

1st: PERC H710 controller, RAID 6 with four 3 TB drives.
It happens with and without LVM.
It happens with both write-through and write-back cache.

2nd: no RAID controller, a single 2 TB disk.
Only tested without LVM.

Suggestions on how to prevent this?

  • And the question is? – Dennis Kaarsemaker Aug 05 '13 at 19:08
  • Is the cache battery-backed? If not, there's your reason. – Nathan C Aug 05 '13 at 19:26
  • Why RAID 6 with 4 drives? Is your goal to extend the array later? (Most people seem to go for [RAID 10](http://serverfault.com/questions/339128/what-are-the-different-widely-used-raid-levels-and-when-should-i-consider-them) when using four drives.) – Hennes Aug 05 '13 at 19:27
  • The cache has a battery, and it also occurs in write-through mode. The 2nd system has no controller, and its disk cache is disabled. – Sean Leighton Aug 05 '13 at 19:27
  • RAID 10 can still lose data when two drives fail. RAID 6 should be able to recover no matter which two drives fail. – Sean Leighton Aug 05 '13 at 19:45
  • Yeah, but RAID 6 rebuild time is huge on big disks, and that is its major weakness. While RAID 6 is rebuilding I would not suggest using that storage for intensive reads/writes. Plus, on a 4-drive array RAID 6 has almost no advantage over RAID 10; it pays off with 8 or more disks. – Danila Ladner Aug 05 '13 at 20:47
  • Did you also disable the write cache on the drives themselves? – toppledwagon Aug 05 '13 at 21:00
  • Yes, I disabled the cache on the drives. My initial statement may be incorrect: I said writes within the last minute were affected, but it looks closer to the last 5-10 seconds. commit=5 is the default. – Sean Leighton Aug 05 '13 at 21:36
  • Your *drives* are not losing data. Your *file system* is [losing the data](http://www.h-online.com/open/news/item/Ext4-data-loss-explanations-and-workarounds-740671.html) in an acceptable manner. – MikeyB Aug 06 '13 at 18:20
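The ext4 behaviour discussed in that link mostly affects programs that create or replace a file and leave it to the kernel to flush the contents later. A commonly recommended workaround is to write a temporary file, fsync(2) it, rename it over the target, and then fsync(2) the containing directory. The C sketch below assumes Linux/glibc; the helper name `replace_file_durably`, the `.tmp` suffix and the separate `dir` argument are hypothetical choices for illustration, not something from this thread.

```c
#define _GNU_SOURCE          /* Linux/glibc: expose fsync(), O_DIRECTORY */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>           /* rename(), snprintf() */
#include <unistd.h>

/* Hypothetical helper: replace `path` (which lives in directory `dir`) with
 * `len` bytes from `buf`, so that after a crash the file should hold either
 * the old or the new contents rather than zero bytes. */
int replace_file_durably(const char *dir, const char *path,
                         const void *buf, size_t len)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Write everything, retrying on short writes and EINTR. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* 1. Flush the new contents before they become visible under `path`. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 2. Atomically swap the new file into place. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. Flush the directory so the rename itself survives a power loss. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

A call would look like `replace_file_durably("/data", "/data/state.json", buf, len)` (paths hypothetical). This still depends on the drive and controller caches honouring the flush requests that fsync(2) issues.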

1 Answer


Going by your description, it sounds like the OS hadn't flushed your data to disk: the file metadata made it, but without any content (the length is zero).

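write(2) by no means guarantees that your data has hit the disk; you are at the mercy of the OS/filesystem and when it decides to flush data and metadata. (Opening the file with O_DIRECT bypasses the page cache, but on its own it does not guarantee that data and metadata are durable either.) To guarantee that your data is safely tucked away, you have two options: open(2) your files with O_SYNC or O_DSYNC, so that each write(2) returns only once the data has been written through, or call fsync(2)/fdatasync(2) on the file when you need the data to be on disk.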
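A minimal C sketch of the two usual ways to make written data durable: calling fsync(2) before reporting success, or opening the file with O_DSYNC so that each write(2) waits for stable storage. It assumes Linux/glibc, and the helper names `write_all`, `write_then_fsync` and `write_with_odsync` are hypothetical.

```c
#define _GNU_SOURCE          /* Linux/glibc: expose fsync(), O_DSYNC */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Option A: ordinary buffered writes, then fsync(2) before returning. */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Option B: O_DSYNC makes each write(2) complete with synchronized I/O
 * data integrity, i.e. it returns only after the data (and the metadata
 * needed to read it back) has been transferred to the device. */
int write_with_odsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

Note that fsync(2) on the file does not by itself make a newly created file's directory entry durable; that needs an fsync(2) on the parent directory as well. Option B pays the flush cost on every write(2), so batching data into fewer, larger writes matters more there.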

The big caveat is caching: both the drives and the RAID controller may have caches, which may or may not be battery-backed. Notably, some drives put your writes in their internal RAM cache and report them as written, leaving a window of time in which a power loss would make you lose data.

Kjetil Joergensen