Do ext3/4 file systems accumulate errors naturally (as reported by fsck)?

2

I run a number of CentOS 6 64-bit servers with ext3/ext4 file systems. As far as I can tell, none of them has been shut down improperly, but all of them have accumulated some file system errors that fsck now reports.

Now, a few drives (not file systems) have IO errors, which are going to lead to hard drive failures (we run raid1). Is that what is leading to the file system errors? I wouldn't have thought those errors would be allowed to propagate up to the file system.

At least one doesn't show any signs of hard drive failure but has fsck errors.

So, do ext3/4 file systems accumulate errors naturally over time or is something bad going on?

Shovas

Posted 2017-01-16T16:09:17.257

Reputation: 63

Why would you think an I/O error wouldn't interact with a file system error? If the I/O error occurs while reading a file, what do you think the file system will do? It's going to return an error if it can't read the file, no matter the cause. – djsmiley2k TMW – 2017-01-16T16:16:56.923

Without more details it's difficult to say exactly what happened. ext3 is quite mature; I haven't seen an actual FS accumulate errors naturally through use in years. Unrecoverable I/O errors (unlikely for RAID 1) will lead to FS errors if they happen inside the FS structure. If RAID 1 somehow screws up error recovery (I don't have personal experience with that), that could also lead to FS errors. I'd look closely at which blocks had errors, how the RAID behaved, and which blocks led to FS errors. – dirkt – 2017-01-16T16:19:37.633

Thanks for the replies, @djsmiley2k, @dirkt. The IO errors reported by dmesg are at the device level, and only on one device, so I figured raid1 would do the right thing from the good device. Also, at least one server doesn't have any drive errors but does have file system errors. – Shovas – 2017-01-16T16:39:24.657

So I presume you're using mdadm or some software raid, not hardware raid? – djsmiley2k TMW – 2017-01-16T16:43:08.660

@djsmiley2k Yes, mdadm software raid1 mirror. – Shovas – 2017-01-16T16:50:33.740

Answers

2

File system errors do not cause I/O errors, and I/O errors do not cause hard drive failures. In fact, you have the causality completely reversed: hard drive failures cause I/O errors, which in turn lead to file system corruptions.

I/O errors will be reported as errors to user space. In some cases it may cause file system corruptions (which can be fixed by fsck), but in some cases it may only result in data block corruptions.
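As a rough illustration of how an I/O error surfaces in user space, here is a minimal sketch (not part of the original answer) that reads a file end to end and records the offsets where the kernel returned `EIO`. The function name `scan_for_io_errors` and the chunk size are assumptions made for this example. Note that reads may be served from the page cache, so a clean pass does not prove the underlying device is healthy.

```python
import errno
import os


def scan_for_io_errors(path, chunk_size=1 << 20):
    """Read `path` in chunks and return (bytes_read, bad_offsets).

    bad_offsets lists the starting offsets of chunks whose read failed
    with EIO, which is how a device-level I/O error typically reaches
    a user-space program. Any other OSError is re-raised.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                # Remember the failing chunk and skip past it.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # file shrank while we were reading
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Running this over a file that sits on a failing block would be expected to report that chunk's offset, while the same failure inside file system metadata is the kind of damage fsck would report instead.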

So in general, it is not "normal" for file system corruptions to accumulate in ext3/ext4 file systems. Corruption generally means you have some kind of hardware problem: it could be bad memory, a failing hard drive, etc. In fact, if you are seeing I/O errors, you need to fix them first. Software bugs in general do not cause hardware failures!
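Since the setup in the question is an mdadm RAID1 mirror, one way to start looking for the hardware problem is to check whether any array is running degraded. The following is a minimal sketch (not from the original answer) that parses text in the format of `/proc/mdstat`. The helper name `degraded_arrays` and the sample text are assumptions for illustration; the sample is made up, not output from the poster's servers.

```python
import re

# An array header line looks like: "md1 : active raid1 sdb2[1](F) sda2[0]"
ARRAY_LINE = re.compile(r"^(md\S+)\s*:")
# The status line contains e.g. "[2/1] [U_]"; "_" marks a missing member.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Illustrative sample only (hypothetical device names and sizes).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            if "_" in status.group(3):
                degraded[current] = status.group(3)
            current = None
    return degraded


if __name__ == "__main__":
    # On a real system, pass open("/proc/mdstat").read() instead of SAMPLE.
    print(degraded_arrays(SAMPLE))
```

An array that shows up here is running on fewer members than configured, so an unreadable sector on the remaining drive can no longer be repaired from the mirror.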

Theodore Ts'o

Posted 2017-01-16T16:09:17.257

Reputation: 401

Thank you for responding, @Theodore. I recognize your name from reading up on file systems :) I clarified my question to make clear I wasn't thinking FS errors lead to drive failures. I meant: would drive errors lead to FS errors in an mdadm raid1 setup where one drive is good? We definitely need to get those bad drives replaced, but in real-world dedicated server hosting (e.g. 1and1.com) they don't seem eager to replace drives for mirrors that are still intact :/. – Shovas – 2017-01-18T19:49:47.233

Marking as answer for confirming that physical device IO errors can lead to FS errors: "I/O errors will be reported as errors to user space. In some cases it may cause file system corruptions (which can be fixed by fsck), but in some cases it may only result in data block corruptions." I must have been hoping for more of an answer at that time but this answers the question. Thanks – Shovas – 2018-03-25T16:43:18.630

0

Ext3 is a completely reliable filesystem, which is not true for Ext4 (it depends more on the kernel version).

However, some errors can be caused by loose data cables/connectors, or even by vibrations or shocks to the hard drive (kicking the PC case, moving your laptop, etc.).

X.LINK

Posted 2017-01-16T16:09:17.257

Reputation: 1 935

How many bugs are in a particular file system codebase is going to be dependent on the kernel version, but in general ext4 is just as reliable, if not more reliable, than ext3. In fact when we put ext4 into production use in Google, the fact that it was running on so many machines, and we could look for correlated failures, meant that we found and fixed a bug that was in ext3; but it was so rare that it survived multiple enterprise Linux certification test processes. (It almost certainly triggered on ext3, but it was probably written off as a hardware failure.) – Theodore Ts'o – 2017-01-17T04:49:17.937

Well, that's an unexpected answer, since you're the ext3 maintainer and one of the ext4 creators... On the other hand, the same would certainly apply to ext4: there will always be bugs that take years to spot, as with any software... But despite keeping myself informed about the Linux world for years, how come I, along with a lot of people on the internet, never became aware that ext4's main problem was solved back in the 2.6.30 kernel?!? Anyway, I'll still stick with ext3 because of its maturity, and will probably switch to ext4 when people jump to btrfs... – X.LINK – 2017-01-17T08:53:38.777