5

I have heard that even journaled filesystems such as ext3/ext4 might become corrupted during a power failure. For example, from Wikipedia [1]:

In the event of a system crash or power failure, 
such file systems are quicker to bring back online and 
less likely to become corrupted.

Can anyone provide more detail, with examples of when:

  1. corruption can occur
  2. corruption is avoided by journaled filesystems

[1] http://en.wikipedia.org/wiki/Journaling_file_system

Ryan
  • When data does not reach the disk promptly because of the drive cache, RAID cache, or OS cache (depending on how many of them are enabled and how your server software writes), a power loss can leave you with corrupted or empty files, or even cost you the whole file system. – Andrew Smith Jul 01 '12 at 16:39
  • @AndrewSmith, the journal handles all of these properly, with the obvious exception of drives that lie to the OS and claim they have flushed data to disk when they have not. – psusi Jul 02 '12 at 02:41
  • @psusi The OS can lie, the drive can lie, and so can the RAID controller: the OS when you enable extra buffering (Windows) or disable sync (Linux), the RAID controller when it has no battery and you enable write caching. – Andrew Smith Jul 02 '12 at 10:00
  • @AndrewSmith, you can have the drive write cache enabled without it lying to the OS. It is allowed to cache writes where it doesn't matter, then the journal flushes the cache when needed. It is the flushing of the cache that the drive must not lie to the OS about. If by "disable sync" you mean using `eatmydata`, that only lies to the application; the journal still keeps the fs clean. – psusi Jul 02 '12 at 13:32

3 Answers

13

Corruption can also occur on most modern disks due to in-disk re-ordering.

Modern disks typically reorder requests to improve performance (reordering writes so that the whole list of requests needs less seeking). This is called Tagged Command Queueing.

The write to the journal may be delayed because, from the current head position, it is more efficient to write in a different order from the one the operating system requested. This means data blocks can be committed before the journal is.

The way to resolve this is to make the operating system explicitly wait for the journal to be committed before issuing any more writes. This is known as a barrier. Not every filesystem uses barriers by default (ext4 does, ext3 does not), so they may need to be enabled explicitly with a mount option.

mount -o barrier=1 /dev/sda /mntpnt
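To see what a mounted filesystem is currently using, one option is to read its option string from /proc/mounts. The following is only a rough sketch: the helper names `barrier_setting` and `read_mount_options` are made up for illustration, and it assumes the ext3/ext4 option spellings `barrier=1`, `barrier=0`, `barrier` and `nobarrier`. Whether the kernel lists the option at all when the default is in effect is also an assumption that may vary by kernel version, so "unspecified" only means the filesystem default applies.

```python
def barrier_setting(mount_options: str) -> str:
    """Classify the barrier setting in a comma-separated mount option string.

    Returns "on", "off", or "unspecified" (the filesystem default applies).
    Assumes the ext3/ext4 spellings of the option; other filesystems may differ.
    """
    setting = "unspecified"
    for option in mount_options.split(","):
        option = option.strip()
        if option in ("barrier", "barrier=1"):
            setting = "on"
        elif option in ("nobarrier", "barrier=0"):
            setting = "off"
    return setting


def read_mount_options(mount_point: str, mounts_file: str = "/proc/mounts") -> str:
    """Return the option string for mount_point, or "" if it is not mounted."""
    options = ""
    with open(mounts_file) as handle:
        for line in handle:
            fields = line.split()
            # Fields: device, mount point, filesystem type, options, dump, pass.
            if len(fields) >= 4 and fields[1] == mount_point:
                options = fields[3]  # keep the last match (the topmost mount)
    return options
```

For example, `barrier_setting(read_mount_options("/mntpnt"))` would classify the mount from the command above.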

The big downside of barriers is that they tend to slow I/O down, sometimes dramatically (around 30%), which is why they are not always enabled by default. In addition, things become doubleplusungood when you start to add logical layers such as LVM or RAID on top of standard disks. LVM added barrier support for most LV configurations relatively recently, and mdadm seems to have had it for a little while.
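The reordering problem and the effect of a barrier can be illustrated with a toy model. This is a simplified sketch, not how any real filesystem or drive is implemented: it assumes a transaction is just two writes, a hypothetical `journal_commit` and an `inplace` block update, and it enumerates which combinations could be on disk at the moment of a crash.

```python
from itertools import permutations


def crash_states(writes, barrier_after=None):
    """Return every set of writes that could be on disk when power is lost.

    writes: labels in the order the OS issued them.
    barrier_after: index of the last write before the barrier. Everything up to
    and including it must complete before any later write starts. With None,
    the drive is free to complete the writes in any order.
    """
    if barrier_after is None:
        groups = [writes]
    else:
        groups = [writes[:barrier_after + 1], writes[barrier_after + 1:]]

    states = {frozenset()}  # crash before anything was written
    done = []               # writes from earlier groups, all complete
    for group in groups:
        for order in permutations(group):        # any order within a group
            for cut in range(1, len(order) + 1):  # crash after `cut` writes
                states.add(frozenset(done) | frozenset(order[:cut]))
        done.extend(group)
    return states


def is_recoverable(on_disk):
    """An in-place update without its journal commit cannot be replayed."""
    return not ("inplace" in on_disk and "journal_commit" not in on_disk)


writes = ["journal_commit", "inplace"]
unsafe = [s for s in crash_states(writes) if not is_recoverable(s)]
safe_with_barrier = all(
    is_recoverable(s) for s in crash_states(writes, barrier_after=0)
)
```

Without the barrier, the state where only `inplace` reached the disk is possible, which is the corruption case described above. With the barrier after the journal commit, that state cannot occur in this model.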

Matthew Ife
  • "Tagged Command Queueing" sounds a lot like defragmentation. – Gaia Jul 02 '12 at 00:19
  • 2
    @Gaia Defragmentation occurs when contiguous data at the file layer is non-contiguous on the block layer. The order of your data on disk and the order that you put data to the disk are two different things. – Matthew Ife Jul 02 '12 at 00:28
  • 1
    On ext[34], barriers are only needed if you switch to data=writeback mode. In the default data=ordered mode, the journal writes are completed before the regular writes are issued. The drive never sees the latter until after it has finished the former, so it has no chance to reorder them. Also barriers are enabled by default on ext4. – psusi Jul 02 '12 at 02:46
  • But the checksum in ext4 makes the late barrier somewhat redundant, and barriers have no effect on older versions of LVM (around 2.6.33) – symcbean Jul 02 '12 at 08:39
  • @psusi I don't think that is correct. Most requests to disks are done in batches, and a stream of commands is sent to the disk controller at once, rather than command/wait, command/wait, command/wait (this is the point of fsync()). Otherwise disk latencies would be too high. And yes, barriers are enabled by default on ext4. – Matthew Ife Jul 02 '12 at 09:15
  • 1
    Most, yes. A batch of journal writes may be sent to the disk together, but they must all finish before the fs then turns and issues the batch of real writes. You can think of data=ordered mode as "fsync the journal before the real writes". – psusi Jul 02 '12 at 13:35
  • 1
    Is it normal to use the `barrier=1` option? – Ryan Jul 06 '12 at 02:56
  • 1
    Normal for me. EXT4 turns it on too. EXT3 does not. – Matthew Ife Jul 06 '12 at 18:32
2

Most journaled file systems (ext3/4, NTFS) only protect the metadata transactionally. If a power outage occurs, user data could be left inconsistent even though the metadata is fine.

ZFS and, I think, XFS protect both metadata and user data using transactions and logs.
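Since the journal mainly looks after metadata, an application that cares about its own data usually has to arrange its writes carefully. One common pattern is to write a temporary file, fsync it, and rename it over the original. The sketch below assumes a POSIX system; `atomic_write` is a hypothetical helper name, and the guarantee still depends on the filesystem and the drive honouring flush requests.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` so readers see the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays on
    # one filesystem. Note that mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to stable storage
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a stray temporary file
        raise
    # fsync the directory so the rename itself survives a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, the file should contain either the complete old data or the complete new data. An update that spans several files would still need its own application-level logic.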

longneck
  • 2
    Note: No file system can guarantee user data integrity. The application itself has to do this. ZFS and XFS do ensure atomic user data writes (whatever is requested to be written is, or the previous data will be returned, nothing in between); however if the app uses two writes, it's possible those are split before and after power failure, creating user data corruption even though the file is "consistent". – Chris S Jul 02 '12 at 00:43
  • 1
    No: XFS is a metadata journalling filesystem, and ext4 effectively has modes of operation giving better data guarantees than simply metadata journalling. Running ZFS on a Linux machine is not really the greatest idea ever; at best, the performance sucks. There are other full journalling solutions for Linux: NILFS2 and BTRFS. If you want really good data integrity then BTRFS + BBU controller + RAID is, IMHO, the route to go. – symcbean Jul 02 '12 at 08:42
1

Barriers are a way to avoid corruption on power outages, but this safety feature comes with a performance hit. The best of both worlds (the performance of barrier=off with practically no risk of corruption) costs a little more: use devices with non-volatile, battery-backed write caches.

Gaia
  • Barriers prevent metadata corruption, not user data corruption. And the ext file systems have no way of detecting user data corruption. – Chris S Jul 02 '12 at 00:46