I scrubbed my pool today, and after the scrub finished, I noticed there was an error that corrupted a file. I didn't care about the file, so I deleted it. Unfortunately, the error remains (now referenced by a hex ID and not a filename), and I don't know how to clear it.

  • Should I be worried? Am I not really free of this error just yet?
  • Can I clear the error? If the file is gone, I don't really want to see this error in the future.

For reference, here are the commands I issued and the output, with annotations:

Checking status

kevin@atlas:~$ sudo zpool status -v

pool: zstorage
state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
see: zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 1.81M in 7h19m with 1 errors on Wed Feb 19 10:04:44 2014
config:

    NAME                                          STATE     READ WRITE CKSUM
    zstorage                                      ONLINE       0     0     0
      raidz1-0                                    ONLINE       0     0     0
        ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T1735698  ONLINE       0     0     0
        ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0506289  ONLINE       0     0     0
        ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2711600  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause/03 Harmony One.mp3

Switching to root and deleting the file - I don't need it

kevin@atlas:~$ sudo -i

root@atlas:~# cd /zstorage/owncloud/kevin/files/Archives/Music/Kev\ Rev\ 7/graveyard/Old/Four\ Tet/Pause/

root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# rm 03\ Harmony\ One.mp3

Checking status again

root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# zpool status -v

pool: zstorage
state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
see: zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 1.81M in 7h19m with 1 errors on Wed Feb 19 10:04:44 2014
config:

    NAME                                          STATE     READ WRITE CKSUM
    zstorage                                      ONLINE       0     0     1
      raidz1-0                                    ONLINE       0     0     2
        ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T1735698  ONLINE       0     0     0
        ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0506289  ONLINE       0     0     0
        ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2711600  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zstorage:<0x9f115>

Uh oh. Maybe I can clear the error?

root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# zpool clear zstorage

root@atlas:/zstorage/owncloud/kevin/files/Archives/Music/Kev Rev 7/graveyard/Old/Four Tet/Pause# zpool status -v

pool: zstorage
state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
see: zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 1.81M in 7h19m with 1 errors on Wed Feb 19 10:04:44 2014
config:

    NAME                                          STATE     READ WRITE CKSUM
    zstorage                                      ONLINE       0     0     0
      raidz1-0                                    ONLINE       0     0     0
        ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T1735698  ONLINE       0     0     0
        ata-WDC_WD30EZRX-00DC0B0_WD-WMC1T0506289  ONLINE       0     0     0
        ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2711600  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zstorage:<0x9f115>

This doesn't look good!

Kevin Wood

2 Answers

Scrub your pool again (if you haven't already):

zpool scrub zstorage

That error is telling you that the object with ID 0x9f115 (roughly, the file's inode) is corrupt. Deleting the file broke the filename->inode mapping, so the status output can only report the ID now. Either something still has the file open, or the metadata just needs to be cleaned up (which a scrub should do).
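To check whether a repeat scrub actually removed the entry, it can help to pull just the error list out of zpool status -v. The following is a minimal sketch, not an official tool: the function name list_permanent_errors is made up for illustration, and the parsing assumes the output layout shown in the question (an "errors: Permanent errors ..." line followed by indented entries).

```shell
# Hypothetical helper: reads `zpool status -v` output on stdin and prints
# one permanent-error entry per line (file paths or dataset:<0x...> IDs).
# Assumes the layout shown in the question; other versions may differ.
list_permanent_errors() {
    awk '
        # A new "pool: name" header ends the previous pool error list.
        /^[ \t]*pool: / { in_list = 0 }
        # The entries start after this line.
        /^[ \t]*errors: Permanent errors/ { in_list = 1; next }
        # Print non-blank entries with the leading indentation removed.
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Example (run as root once the scrub has finished):
#   zpool status -v zstorage | list_permanent_errors
```

If the function prints nothing after the scrub completes, the pool no longer lists any permanent errors.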

To clear the error if a scrub won't, you need to get down and dirty with zdb, which is not publicly documented by Oracle (and poorly documented elsewhere). At any rate, needing it probably indicates something more fundamentally wrong.

quadruplebucky

I know I'm super late to the party, but I just wanted to add that if additional scrubs don't fix issues like this, then instead of reaching for zdb you can start a scrub, let it run for a couple of minutes, and then stop it with zpool scrub -s zstorage. That worked for me at clearing permanent errors for files when all the read/write/checksum errors were at zero.

http://unixetc.co.uk/2012/01/22/zfs-corruption-persists-in-unlinked-files/

EDIT: After having to do this a few times, I also realized that how long you let the scrub run affects whether it works (depending on which blocks it looks at first). So if it doesn't work at first, try a few more times and adjust when you stop it.
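The start-then-stop sequence described above could be wrapped roughly like this. It is only a sketch: scrub_briefly is a hypothetical name, and the 120-second default is simply a stand-in for "a couple of minutes" that you would adjust between attempts.

```shell
# Hypothetical wrapper: start a scrub, wait, then stop it with `zpool scrub -s`.
# Usage: scrub_briefly POOL [SECONDS]   (SECONDS defaults to 120, an assumption)
scrub_briefly() {
    pool=$1
    delay=${2:-120}

    # Start the scrub; give up if it cannot be started.
    zpool scrub "$pool" || return 1

    # Let it run for a while before interrupting it.
    sleep "$delay"

    # Stop the scrub. This reports an error if the scrub already finished.
    zpool scrub -s "$pool"
}

# Example (as root), trying a different delay on a second attempt:
#   scrub_briefly zstorage 120 && zpool status -v zstorage
#   scrub_briefly zstorage 300 && zpool status -v zstorage
```

Checking zpool status -v after each attempt shows whether the entry is gone. It is also worth running a full scrub later to see whether the error comes back.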

4oo4
  • For me this didn't work -- the permanent errors are still there, and I'll see them again if I do a full scrub. – William Stein Oct 26 '17 at 17:11
  • That did it for me. I had done a full scrub and the errors weren't cleared, but then I did the start->stop and it's clean. thanks. – Stu Jan 02 '18 at 13:26
  • Now it does seem to work for me (I'm using the latest version of ZFS on linux now). – William Stein Aug 02 '19 at 16:21
  • This does not work for me. Even though the errors are gone after `scrub -s`, they **reappear at the next full scrub**. Can you confirm whether you ever ran a full scrub after using your solution? – nh2 Oct 04 '20 at 01:15
  • @nh2 Yes, I do a monthly scrub and the errors didn't reappear. I stopped having the problem completely after finding some hardware issues with one of my drives. – 4oo4 Oct 25 '20 at 00:12