Finding files with BTRFS Uncorrectable Errors

17

6

I have a question concerning unrecoverable errors on a BTRFS file system. Specifically, I've run a BTRFS Scrub recently after experiencing a problem with one of my RAM sticks and it seems to have discovered 4 uncorrectable errors. This is the output:

scrub status for <UUID>
    scrub started at Thu Dec 25 15:19:22 2014 and was aborted after 89882 seconds
    total bytes scrubbed: 1.87TiB with 4 errors
    error details: csum=4
    corrected errors: 0, uncorrectable errors: 4, unverified errors: 0

Luckily, I have everything backed up in a tertiary backup, so I am not particularly concerned about losing the files. (I'm well aware of the issues associated with the experimental status of BTRFS, I keep multiple backups to protect my data, and I am determined to continue using it, so please no "Solution: don't use BTRFS" posts.)

I would like to know, however, how to determine which files are associated with the uncorrectable errors. I want to find them, delete them, and replace them with their backed-up copies.

If anyone has information on how to do this, I would love to hear from you.

Thank you in advance.

RedHack

Posted 2014-12-29T20:42:52.630

Reputation: 251

Answers

9

I have found the following method useful...

Run btrfs scrub on the volume.

It will report some number of csum errors, as you've shown above.
Take the count from your error details (csum=4 in your example) and use it as the argument to tail in the following command:

dmesg | grep "checksum error at" | tail -4 | cut -d\  -f24- | sed 's/.$//'

It is handy to redirect the output to a file (e.g. > csums.txt).

I've tried a number of the suggested inode search approaches, and they've all met with limited success, if any.
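The field offset in cut -f24- depends on the exact layout of the kernel message, which may not be the same on every kernel. Here is a minimal sketch that matches on the trailing "(path: ...)" part instead, assuming each message ends that way. The function name extract_paths is made up for illustration.

```shell
# Hypothetical helper: read kernel log lines on stdin and print each
# distinct file path reported in a "checksum error at" message.
# Assumes the message ends with "(path: <file>)".
extract_paths() {
    grep 'checksum error at' |
        sed -n 's/.*(path: \(.*\))$/\1/p' |
        sort -u
}

# Usage (typically as root):
#   dmesg | extract_paths > csums.txt
```

Because sort -u removes duplicates, there is no need to pass the error count to tail with this variant.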

Mark

Posted 2014-12-29T20:42:52.630

Reputation: 211

As far as I understand, you are using tail to limit the number of lines displayed and to ignore duplicates. I would recommend using sort | uniq to get rid of the duplicates, like so: dmesg | grep "checksum error at" | cut -d' ' -f24- | sed 's/.$//' | sort | uniq – niklasfi – 2018-09-02T20:21:02.160

3

Yes, mapping from an inode or block number back to a filename can be difficult. If you are really interested, you can try something like this and see which files fail to copy. After all, if a file is bad, it should throw an error during the copy. I have used this technique before.

 find /mount-point -type f -exec cp {} /dev/null \;

 where mount-point is the ROOT node/mount-point of the affected filesystem
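A variation on the same idea, as a rough sketch: read each file with cat and print only the names of the files that fail, so the result can be redirected to a list. The function name list_unreadable is hypothetical, and the sketch assumes a read of a damaged file returns an error, as described above.

```shell
# Hypothetical helper: read every regular file below the given directory
# and print the names of those that cannot be read to the end.
list_unreadable() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # Discard the data; only the exit status of the read matters.
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Usage:
#   list_unreadable /mount-point > unreadable.txt
```

Using -exec ... {} + starts one shell per batch of files rather than one cp process per file, which should be noticeably faster on a large filesystem.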

mdpc

Posted 2014-12-29T20:42:52.630

Reputation: 4 176

Running it now, hopefully it will turn something up.

Thank you for your advice, I will update you as to the result. – RedHack – 2014-12-29T22:00:14.627

Sorry to say, it does not seem to work =/ It found the first file causing the uncorrectable error, but then it spams the message "stale file handle" to the terminal until I terminate it.

Granted it found the file, but now I cannot figure out how to get rid of it. Gonna have to contact the BTRFS mailing list. – RedHack – 2014-12-30T20:04:13.813

You can move it to a special directory and then exclude it from a further search. – mdpc – 2014-12-30T20:12:29.410

It won't move or copy; it just keeps telling me that the file handle is stale. I cannot even ls. – RedHack – 2014-12-30T22:04:33.900

If you use cp -v, you can also monitor the progress: find / -type f -exec cp -v {} /dev/null \; 2> corrupted-files.txt. However, the /proc/kcore file might be huge (mine was 128TB), so the copy operation will likely hang. Since the /proc directory contains only special virtual files, there is no need to check them. Exclude everything under /proc: sudo find / -type f -and -not -path '/proc/*' -exec cp -v {} /dev/null \; 2> corrupted-files.txt – ceremcem – 2019-12-12T13:23:24.400

2

dmesg will give you details about the files involved in the uncorrectable checksum errors. The messages typically look like this: "BTRFS: checksum error at logical [...] on dev [...], sector [...], root [...], inode [...], offset [...], length [...], links [...] (path: [...])"; the last piece of information is the absolute path to the file that's corrupted.
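As a sketch of how those fields could be pulled out together, the following keeps the root (subvolume ID) and inode alongside the path, which helps when the same relative path exists in several subvolumes. It assumes the message layout quoted above; the function name parse_csum_errors and the output format are made up for illustration.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "root=<id> inode=<n> path=<file>" once per distinct corrupted file.
# Assumes the "checksum error at ... root N, inode N, ... (path: X)" layout.
parse_csum_errors() {
    sed -n -E \
        's/.*checksum error at .*, root ([0-9]+), inode ([0-9]+),.*\(path: (.*)\)$/root=\1 inode=\2 path=\3/p' |
        sort -u
}

# Usage (typically as root):
#   dmesg | parse_csum_errors
```

The root number can then be compared against the IDs printed by btrfs subvolume list for the mount point to see which subvolume the file belongs to.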

arrrr

Posted 2014-12-29T20:42:52.630

Reputation: 21

1

I came here looking for the "Uncorrectable error" from BTRFS too. The above grep didn't work for me; I had to use instead:

$ dmesg | sed -n -r 's#.*BTRFS.*i/o error.*path: (.*)\)#\1#p' | sort -u
somepath/somefile.txt

Note how the path is relative to the start of the subvolume - no indication of which subvolume it's in. This luckily wasn't a problem for me.

crusaderky

Posted 2014-12-29T20:42:52.630

Reputation: 111

What is somepath/somefile.txt?  It looks like you are typing it as a separate command — or is it the output from the command you typed?  If it is all supposed to be one command line, please don’t break command lines apart for display purposes — just put it into the answer as one long line.  But, what is it?  Are you providing two inputs to sort (a pipe and a file)?  Or is somepath/somefile.txt meant to be an output file?  (It’s not very helpful to specify output files, unless they are intermediate files that you are using again.  People know how to handle results; e.g., by piping.) – Scott – 2018-11-10T17:53:59.157

Does this answer the original question? I can't tell. – I say Reinstate Monica – 2018-11-10T17:55:26.427

@TwistyImpersonator Well, it’s (IMO) clearly meant to be an alternative to Mark’s answer, and that got eight votes (and is an expansion of arrrr’s answer). – Scott – 2018-11-10T19:28:18.790

@Scott The second line was sample output from the command. – crusaderky – 2018-11-11T18:21:20.550