Proper way to deal with corrupt XFS filesystems

Question

I recently had an XFS filesystem on a CentOS 7 system become corrupt after a power failure. The system wouldn't boot properly.

I booted from a rescue CD and tried xfs_repair, which told me to mount the partition so that the log could be replayed.

I mounted the partition and ran ls to verify that the contents were there. I then unmounted the partition, ran xfs_repair again, and got the same message.

What am I supposed to do in this situation? Is there something wrong with my rescue cd (System Rescue CD, version 4.7.1)? Is there some other procedure I should have used?

I ended up simply restoring the system from backups (it was quick and easy in this case), but I'd like to know what to do in the future.

score 27 · Accepted Answer · edited Feb 13 '19 at 16:07

If xfs_repair tells you to mount the filesystem to replay the log, and you still get the same message after mounting and unmounting it, you may need to force the repair by running xfs_repair with the -L flag. This option should be a last resort.
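To make the order of escalation concrete, here is a minimal sketch that only prints the commands, least destructive first, so they can be reviewed and run by hand. The device and mount point are the ones from my example below. Steps 1 and 4 are my own assumptions rather than part of the procedure described in this answer: xfs_repair -n is a no-modify dry run (with a dirty log it may report spurious inconsistencies), and mounting with ro,norecovery is one way to copy data off without replaying the log.

```shell
#!/bin/sh
# Sketch: print the escalation order for a damaged XFS device.
# Nothing is executed; the function only echoes the commands.
repair_plan() {
    dev=$1
    mnt=$2
    echo "xfs_repair -n $dev"                # 1. dry run, writes nothing (assumption)
    echo "mount $dev $mnt && umount $mnt"    # 2. let the kernel replay the log
    echo "xfs_repair $dev"                   # 3. normal repair
    echo "mount -o ro,norecovery $dev $mnt"  # 4. salvage files read-only (assumption)
    echo "xfs_repair -L $dev"                # 5. last resort: zero the log
}

repair_plan /dev/mapper/centos-root /mnt/centos-root
```

Only move to the next step when the previous one fails or is not enough.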

For example, I'll use a case where I had a corrupt root partition on my CentOS 7 install. When attempting to mount the partition, I continually received the below error message:

mount: mount /dev/mapper/centos-root on /mnt/centos-root failed: Structure needs cleaning

Unfortunately, forcing a repair means zeroing out (destroying) the log before the repair starts. Any metadata changes that were still in the log are lost, so you can end up with more corruption than you expected. However, the XFS tools let you see what kind of damage to expect before making any permanent changes.

Using xfs_metadump and xfs_mdrestore, you can create a metadata image of the affected partition and perform the forced repair on the image rather than on the partition itself. The benefit is that you can see the damage a forced repair causes before performing it on the partition.

To do this, you'll need a reasonably large USB drive or external hard drive. Start by mounting the USB drive. Mine was located at /dev/sdb1; yours may be named differently.

mkdir -p /mnt/usb
mount /dev/sdb1 /mnt/usb

Once the drive is mounted, run xfs_metadump to copy the partition's metadata to the USB drive. Again, your affected partition may be different; in this case, I had a corrupt root partition located at /dev/mapper/centos-root:

xfs_metadump /dev/mapper/centos-root /mnt/usb/centos-root.metadump
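A slightly more defensive variant of the dump step might look like the sketch below. The -g (show progress) and -o (disable file name obfuscation) options are my additions, not part of the command above; -o makes the later repair output easier to read, but leave it out if the dump will be shared. The run helper and the DRY_RUN variable are hypothetical names used only for illustration. Note that xfs_metadump is meant for filesystems that are unmounted, mounted read-only, or frozen.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the command;
# set DRY_RUN=0 to actually execute it as root.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = source device, $2 = dump file on the USB drive
dump_metadata() {
    run xfs_metadump -g -o "$1" "$2"
}

dump_metadata /dev/mapper/centos-root /mnt/usb/centos-root.metadump
```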

Next, restore the metadata into an image so that you can run the repair on it and measure the damage.

xfs_mdrestore /mnt/usb/centos-root.metadump /mnt/usb/centos-root.img

I found that xfs_mdrestore was not available in my rescue environment; I had to use the rescue mode of a live CentOS CD instead.
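The restored image has the apparent size of the original filesystem, but as far as I know it is written as a sparse file, so it should occupy far less space on the USB drive. The sketch below compares the two sizes; it demonstrates the idea on a throwaway sparse file, and show_sizes is a hypothetical helper name. It assumes GNU du (for --apparent-size) and a target filesystem that supports sparse files, which FAT does not.

```shell
#!/bin/sh
# Sketch: compare a file's apparent size with the space it really occupies.
show_sizes() {
    apparent=$(du -k --apparent-size "$1" | cut -f1)
    on_disk=$(du -k "$1" | cut -f1)
    echo "apparent=${apparent}K on_disk=${on_disk}K"
}

# Demonstration on a temporary sparse file (no data blocks are written).
demo=$(mktemp)
truncate -s 1G "$demo"
show_sizes "$demo"
rm -f "$demo"

# On the real image you would run:
#   show_sizes /mnt/usb/centos-root.img
```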

Finally, we can perform the repair on the image:

xfs_repair -L /mnt/usb/centos-root.img

After the repair has completed and you've assessed the output and the potential damage, you can decide whether to run the repair against the partition itself.
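One way to assess the result, beyond reading the xfs_repair output, is to loop-mount the repaired image read-only and look at what ended up in lost+found, which is where xfs_repair places files it could not reconnect to a directory. The sketch below is an assumption on my part rather than part of the original procedure; /mnt/img and count_orphans are hypothetical names. A metadump image holds no file contents, so only names and directory structure are meaningful.

```shell
#!/bin/sh
# Sketch: count the entries directly under <mountpoint>/lost+found.
count_orphans() {
    dir="$1/lost+found"
    if [ -d "$dir" ]; then
        find "$dir" -mindepth 1 -maxdepth 1 | wc -l
    else
        echo 0
    fi
}

# Typical use, as root, after running xfs_repair -L on the image:
#   mkdir -p /mnt/img
#   mount -o loop,ro /mnt/usb/centos-root.img /mnt/img
#   count_orphans /mnt/img
#   umount /mnt/img
```

A large count suggests that the forced repair will leave a lot of files detached from their directories.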

To run the repair against the partition, simply run:

xfs_repair -L /dev/mapper/centos-root

Don't forget to check the other partitions for corruption as well. After the repairs, reboot; the system should now boot successfully.
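Checking the other partitions can be sketched as a loop over every device that blkid reports as XFS, using the no-modify mode of xfs_repair (-n), which exits non-zero when it detects corruption. This is an illustration under my own assumptions, not a tested recovery script; check_all_xfs is a hypothetical name, and the filesystems should be unmounted when it runs.

```shell
#!/bin/sh
# Sketch: dry-run check of every block device that blkid identifies as XFS.
# xfs_repair -n never writes to the device.
check_all_xfs() {
    for dev in $(blkid -t TYPE=xfs -o device); do
        if xfs_repair -n "$dev" >/dev/null 2>&1; then
            echo "$dev: clean"
        else
            # Non-zero also covers "could not be checked", e.g. still mounted.
            echo "$dev: needs attention"
        fi
    done
}

# Run as root from the rescue environment:
#   check_all_xfs
```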

Remember that the -L flag should be used only as a last resort, when there are no other options for repair.


OK, so -L is a last resort, and these are EXCELLENT instructions on how to see just how bad it's going to be if we use -L. What other options do I have short of using -L? — Michael Kohne, May 18 '16 at 11:23
@MichaelKohne Restoring from backup, of course. You shouldn't get anywhere near this level of hell unless you haven't got backups. — Michael Hampton, May 19 '16 at 07:31
@MichaelHampton - OK, fair enough. But I don't think I've EVER lost a filesystem like this to ext4 errors on power failure - is XFS less resilient? Or did I just have really bad luck this time? — Michael Kohne, May 19 '16 at 11:36
@MichaelKohne I think you just got extraordinarily unlucky. XFS is quite a reliable filesystem. — Michael Hampton, May 19 '16 at 11:40
we used to simply be able to do that from initrd. what wonderful "progress" we made. — Florian Heigl, Oct 13 '16 at 22:54
I think I know the answer but just in case I'm wrong: is the image created by xfs_mdrestore of the entire volume? — pufferfish, Feb 28 '18 at 11:40
@pufferfish Do you mean `xfs_metadump`? Depends. In short, yes - the generated image contains all metadata and block indexes. If you're asking whether it creates a copy of the volume including the block data, or file contents, then no. You can use `xfs_metadump` to run diagnostics, volume checks, `xfs_repair`, etc. without touching the actual source volume itself, which is useful for inspecting what potential damage you're looking at when running `xfs_repair`. `xfs_mdrestore` is what can be used to write that metadata back to a volume (either virtual or physical). — brendonofficial, Mar 01 '18 at 04:05
Ah, ok that's good news. I thought the above technique required another storage space as large as the one I was trying to repair — pufferfish, Mar 01 '18 at 12:03

score 2 · Answer 2 · edited Jul 02 '21 at 14:32

I had this error after CentOS 7 shut down uncleanly inside a KVM virtual machine:

# metadata corruption detected at xfs...

When I looked at the log with journalctl -xe, I found an error mounting:

# /dev/mapper/root /sysroot

I solved it using:

# xfs_repair /dev/mapper/root

xfs_repair then completed its seven phases, and I rebooted using:

# ./shutdown

After that, the CentOS 7 virtual machine worked well.

Regards

Note: your /dev/mapper/root may have another name; check your error log with journalctl -xe to find the name of the device that failed to mount.
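If the log is long, a small filter can pull the device-mapper names out of it. This is only a sketch: mapper_devices is a hypothetical helper, and it assumes the device appears in the log as a /dev/mapper/... path followed by a space or the end of the line, as in the mount error above.

```shell
#!/bin/sh
# Sketch: list the unique /dev/mapper/* names found in text read from stdin.
mapper_devices() {
    grep -o '/dev/mapper/[^ ]*' | sort -u
}

# Typical use:
#   journalctl -xb --no-pager | mapper_devices
```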
