1

So I just swapped out a RAID HDD on a server. I got the "hasn't been checked for X days, fsck forced" message, and am patiently waiting right now.

This got me wondering: how can I responsibly avoid this situation? I know that I could skip the forced fsck, and that I can't really do a real fsck while the system is running.

So is there a way to do an online fsck on a server at night, just to check whether there are any potential problems? And if there aren't, to make the system NOT run the forced fsck?

Would an online read-only fsck detect (not fix) the same problems a full-blown fsck would?

This is on ext3.

4 Answers

2

Switch to a more modern journaled filesystem such as XFS or ext4. In these systems a full fsck isn't necessary if the filesystem is unclean; the journal is just replayed, which takes a second or so. Even if a full fsck is forced, ext4 is significantly faster at fsck than ext3.

You're going to need a Linux system from the last several years that supports ext4. In particular, you need kernel version 2.6.28 or higher, the release in which ext4 was declared stable. It sounds like your system is pretty ancient, so it might not even have support for ext4. If that's the case, it's almost certainly far past end of life anyway...

(Note: While ext3 is journaled, it is missing several optimizations present in ext4 that make fsck run much faster on ext4.)

Michael Hampton
  • 3
    ext3 is a journaled file system. – kasperd Jan 21 '18 at 00:36
  • They already had some days of mounting with just a journal replay. An occasional full check is still recommended by the tune2fs man page. – John Mahowald Jan 21 '18 at 13:00
  • As kasperd said: ext3 is journaled, and working well. IIRC, there were some initial problems with ext4, so I stuck to ext3 when I installed that server years ago. AFAIK, ext4 would also force a fsck if it detects that a filesystem hasn't been checked in a while? – Moritz von Schweinitz Jan 22 '18 at 20:04
  • 1
    @MoritzvonSchweinitz You're right. But ext4 is significantly faster at fsck than ext3. And I had thought that was because of journaling; it was actually something else. I've updated this answer accordingly, and I have to even more strongly recommend you switch to ext4 if you can. (And decommission those ancient beasts, but...) – Michael Hampton Jan 23 '18 at 16:21
2

You can use e2croncheck (Debian Bugreport 773267) if your filesystem is on LVM.
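
For readers who have not looked at the script, the idea behind it can be sketched roughly as follows. This is a simplified outline under stated assumptions, not the script itself: the volume group and volume names, the `-fsck-snap` suffix, and the 100m snapshot size are placeholders, and the `run` helper with its `DRY_RUN` switch is an addition for this sketch so that nothing touches a disk unless you explicitly ask for it.

```shell
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# snapshot_fsck VG LV [SNAPSHOT_SIZE]
# All names and the default size are hypothetical placeholders.
snapshot_fsck() {
    vg=$1
    lv=$2
    size=${3:-100m}
    snap="${lv}-fsck-snap"

    # 1. Take a point-in-time snapshot of the live volume.
    run lvcreate -s -L "$size" -n "$snap" "$vg/$lv" || return 1

    # 2. Check the snapshot, never the mounted volume itself.
    if run e2fsck -p "/dev/$vg/$snap" && run e2fsck -fy "/dev/$vg/$snap"; then
        # Clean: reset mount count and last-check time on the real volume,
        # so the next boot does not force a check.
        run tune2fs -C 0 -T now "/dev/$vg/$lv"
        status=0
    else
        # Problems found: make the counters look overdue so the next
        # boot forces a full offline check.
        run tune2fs -C 16000 -T 19000101 "/dev/$vg/$lv"
        status=1
    fi

    # 3. Drop the snapshot either way.
    run lvremove -f "$vg/$snap"
    return "$status"
}
```

Calling `snapshot_fsck vg0 root` only prints the commands it would run; `DRY_RUN=0 snapshot_fsck vg0 root` (as root, with free extents in the volume group) would actually run them. The snapshot needs enough space to absorb writes made to the live volume while the check is running.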

  • Thank you - this seems to be the 'official' way of doing pre-emptive fsck on a used filesystem. I always try to avoid the additional abstraction layer of LVM, but I guess it's time to change that. Although I do think it's strange that there's no way to do a basic sanity-check on a live filesystem. :-( – Moritz von Schweinitz Jan 22 '18 at 20:01
1

You can control whether the system forces an fsck on reboot in a few different ways:

  • Transient :: grub.conf

    Add fastboot to your grub.conf file at the end of your kernel line

  • Permanent :: fstab

    In the fstab entry for your mount, the last column (one of two numbered columns) can be switched to a 0. According to the fstab manpage:

    The sixth field (fs_passno).
          This field is used by fsck(8) to determine the order in which filesystem checks are done at boot
          time. The root filesystem should be specified with a fs_passno of 1. Other filesystems should have
          a fs_passno of 2. Filesystems within a drive will be checked sequentially, but filesystems on
          different drives will be checked at the same time to utilize parallelism available in the hardware.
          Defaults to zero (don't fsck) if not present.
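
To see which entries the boot-time fsck will currently look at, the sixth field can be listed with a few lines of awk. This is only a sketch: the function name `fstab_passno` is made up for illustration, and it assumes a plain whitespace-separated fstab with `#` comments.

```shell
# Print "mountpoint fs_passno checked|skipped" for every fstab entry.
# A missing sixth field is treated as 0, as the manpage describes.
# The path argument is optional and defaults to /etc/fstab.
fstab_passno() {
    awk 'NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
    {
        pass = (NF >= 6) ? $6 : 0
        printf "%s %s %s\n", $2, pass, (pass + 0 == 0 ? "skipped" : "checked")
    }' "${1:-/etc/fstab}"
}
```

Running `fstab_passno` with no argument reads the live `/etc/fstab`; passing a path lets you check an edited copy before putting it in place.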
    

As for checking a live/mounted filesystem, some articles suggest making everything read-only (remounting the filesystem in question read-only and running fsck in read-only mode as well), though most of those articles also recommend against the practice, mainly because the results are unreliable.

I found this suggestion that mentions a clever trick you might try if you happen to be using LVM for your partitions. However, if you don't happen to have such a luxury, you will most likely want to either A: let the fscks run their course (usually recommended); or B: plan downtime during that overnight period you mentioned where you run an offline fsck. Also, as Michael Hampton mentioned, it might be time to consider something a bit newer than ext3 :D

Adam V
  • I was aware of being able to change the forced fsck interval - but I guess the fs maintainers knew what they were doing when they set a default. my question was more about whether it's possible to run a basic sanity check on an online filesystem in order to know if there's a problem, and then fix it offline. Mario linked to a script from Theodore Ts'o that seems to do the LVM thing. – Moritz von Schweinitz Jan 22 '18 at 20:07
  • 1
    Ah, ok. Yeah, the script he linked looks to basically be automating the idea that was pitched in the link I provided - create a snapshot, fsck the snapshot, remove the snapshot, review results. A neat idea to be sure if you have LVM set up. I otherwise know of no filesystems that allow for (supported) live fsck-ing, though I'd assume you'd be rebooting at least once a year for updates and such, so you could just fsck during that maintenance window also. Hope that script works for you! – Adam V Jan 23 '18 at 13:41
0

One responsible approach is to make sure a backup exists that you have mounted read-write and run e2fsck on, such as an LVM snapshot that is archived to long-term media. (Be sure to do a backup restore test sometime!)

If the fsck of the backup copy comes back clean, and the copy is new enough to meet your recovery time objectives, then you have some confidence in not doing the forced fsck. If it comes back dirty, then ideally the check marks the primary volume for a full fsck, as e2croncheck does (see Mario's answer).

It is still a good idea to do a time-based fsck at least once or twice a year. Use tune2fs -i to set some number of days under your typical reboot frequency. You need to reboot anyway for updates to take effect, so allow enough time during planned maintenance for an fsck.
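
As a rough sketch of how that could look in practice: the first helper below filters the check-related fields out of `tune2fs -l` output, and the second sets the interval in days. The function names, the device, and the day count are placeholders, and the `DRY_RUN` switch is an addition for this sketch so the command is only printed by default.

```shell
# Keep only the check-related fields from `tune2fs -l` output on stdin.
check_fields() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval|Next check after):'
}

# set_interval DEVICE DAYS
# Sets the time-based check interval; the "d" suffix means days.
# Only prints the command unless DRY_RUN=0 is set (needs root then).
set_interval() {
    dev=$1
    days=$2
    echo "+ tune2fs -i ${days}d $dev"
    [ "${DRY_RUN:-1}" = "1" ] || tune2fs -i "${days}d" "$dev"
}
```

For example, `tune2fs -l /dev/sdXN | check_fields` shows when the filesystem was last checked and when the next check is due, and `DRY_RUN=0 set_interval /dev/sdXN 150` would set a 150-day interval on that (hypothetical) device.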

John Mahowald