There are many articles about how ZFS protects data, detects bit rot, and so on. There is even a question about ZFS on a single drive: "zfs on a single device: what happens when a file is corrupted?" But the answer there is to use copies=2. What I am wondering is what exactly happens if one has no redundancy.

However, I could not find a description of what exactly happens when there is no redundant copy of the data. People who spread FUD seem to imply that a single bit fault will cause all data to be lost, but it seems unlikely that ZFS is so badly designed.

So what really happens, from the point of view of the OS, when ZFS finds bit rot and there are no redundant copies of the data? Does the OS report an I/O error when that specific part of the filesystem or file is accessed?

Does everything else continue working? If the part with the error is overwritten, does the error disappear? Or something else?

Is there any clear documentation about these failure modes that you can point to?

Thanks!

yurtesen

1 Answer

When ZFS detects data corruption, it reports the corruption and lists the affected files in the output of zpool status -v poolname. The other files in the pool are not lost.
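For monitoring or scripted cleanup, the list of affected files can be pulled out of that status output. The following is a minimal Python sketch, assuming the error section is introduced by the line "errors: Permanent errors have been detected in the following files:". The SAMPLE text is hand-written for illustration rather than captured from a real pool, and damaged_files is a hypothetical helper name.

```python
MARKER = "Permanent errors have been detected in the following files:"

# Hand-written sample in the shape of `zpool status -v` output (illustrative only).
SAMPLE = """\
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
errors: Permanent errors have been detected in the following files:

        /tank/data/report.db
        tank/data@snap1:/old.log
"""


def damaged_files(status_text):
    """Return the entries listed under the permanent-errors marker."""
    entries = []
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if MARKER in line:
            in_list = True
        elif in_list and stripped.startswith("pool:"):
            in_list = False  # the report for the next pool begins here
        elif in_list and stripped:
            entries.append(stripped)
    return entries


print(damaged_files(SAMPLE))
```

Entries are not always plain paths: files inside snapshots are listed with the snapshot name in front of the path, and objects whose path cannot be resolved may show up as hexadecimal identifiers, so a script should not assume every entry can be opened directly.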

If more than one copy is available, ZFS returns the good data to the OS when the file is read, but the damaged copy still needs repair: run zpool scrub poolname to repair it, then clear the error manually with zpool clear poolname.

If no other copy is available, then ZFS will return the corrupt data to the OS when the file is read. Recovery consists of reading what you can of the file (optional), deleting the file and clearing the error with zpool clear poolname.
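The "reading what you can of the file" step could be sketched in Python as below. This assumes that reading a damaged block fails with an I/O error (EIO) rather than returning bad bytes; if reads never fail, the loop simply copies the whole file. The salvage function name is hypothetical, and the 128 KiB default block size is an assumption chosen to match the default ZFS recordsize, so adjust it if the dataset uses a different value.

```python
import errno
import os


def salvage(src_path, dst_path, block_size=128 * 1024):
    """Copy src to dst block by block, zero-filling blocks that fail with EIO.

    Returns a list of (offset, length) ranges that could not be read.
    """
    unreadable = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                # Stay aligned to block boundaries even after a short read.
                want = min(block_size - offset % block_size, size - offset)
                try:
                    chunk = os.pread(fd, want, offset)
                except OSError as exc:
                    if exc.errno != errno.EIO:
                        raise  # some other failure; do not hide it
                    dst.write(b"\0" * want)  # placeholder for the lost block
                    unreadable.append((offset, want))
                    offset += want
                    continue
                if not chunk:
                    break  # file shrank while it was being read
                dst.write(chunk)
                offset += len(chunk)
    finally:
        os.close(fd)
    return unreadable
```

Calling salvage("/tank/data/report.db", "/backup/report.db.partial") with hypothetical paths would leave a copy of the same length as the original, with zeros where blocks were unreadable, plus a list of the damaged ranges. Whether a partially recovered file is of any use depends on its format.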

Note that pool scrubbing should be done as a routine scheduled task, in part to minimize the impact of such errors, and the pool status should be monitored by your monitoring system.
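As one possible shape for the monitoring side, here is a small Python sketch built around zpool status -x, which prints a short "healthy" message when nothing is wrong and a full report otherwise. The function names are hypothetical, and the exact healthy messages matched here ("all pools are healthy", "pool 'NAME' is healthy") are assumptions that should be checked against the output of your ZFS version.

```python
import subprocess


def pools_healthy(status_x_output):
    """Interpret the text printed by `zpool status -x`."""
    text = status_x_output.strip()
    if text == "all pools are healthy":
        return True
    # Form assumed when a single pool name is passed on the command line.
    return text.startswith("pool '") and text.endswith("' is healthy")


def check_pools(zpool="zpool"):
    """Run `zpool status -x` and report whether every pool looks healthy."""
    result = subprocess.run(
        [zpool, "status", "-x"], capture_output=True, text=True, check=False
    )
    return pools_healthy(result.stdout)
```

A cron job or monitoring plugin could call check_pools() and raise an alert whenever it returns False. Anything other than the healthy message, including an empty output, is treated as unhealthy on purpose so that unexpected states get looked at.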

The Oracle ZFS documentation on repairing corrupted ZFS data may be helpful to you here. ZFS on Linux and on FreeBSD work virtually identically.

Michael Hampton
  • So if no copy is available, ZFS will silently return the corrupt data? The link you provided says corrupted data is not accessible. It is not clear to me whether it is literally inaccessible or whether it just means you get incorrect data when you access it... – yurtesen Feb 23 '21 at 17:56
  • @yurtesen You're right, it might return an I/O error. It's been a very long time since I've had one of these personally. And there's no good way to simulate this. Of course, they could just mean you can't have the original data because it's corrupt, not because it returns an error. – Michael Hampton Feb 23 '21 at 18:57
  • I actually found out that one can disable checksum checking and clear the error using `zfs online` as well. So I guess it would be possible to make ZFS work just like any other FS, ignoring checksum errors. However, the documentation says that if metadata is corrupt, the pool may become inaccessible. I will mark your answer as correct. I learned something reading it and it was very useful. – yurtesen Feb 24 '21 at 15:21
  • Yes, metadata corruption is a whole other beast, and if that happens you're probably screwed. Either way, backups are still important. – Michael Hampton Feb 24 '21 at 15:29