
I have a ZFS pool made up of seven RAIDZ1 vdevs. One of them is degraded after losing two disks close enough together that ZFS wasn't able to recover from the first failure before the second disk failed. Here is the output from "zpool status" shortly after a reboot:

  pool: pod2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h6m, 0.05% done, 237h17m to go
config:

        NAME                                                 STATE     READ WRITE CKSUM
        pod2                                                 DEGRADED     0     0 29.3K
          raidz1-0                                           ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F165XG    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F1660X    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F1678R    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F1689F    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F16AW9    ONLINE       0     0     0
          raidz1-1                                           ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F16C6E    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F16C9F    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F16FCD    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F16JDQ    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F17M6V    ONLINE       0     0     0
          raidz1-2                                           ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F17MSZ    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F17MXE    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F17XKB    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F17XMW    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F17ZHY    ONLINE       0     0     0
          raidz1-3                                           ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F18BM4    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F18BRF    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_W1F18XLP    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09880    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F098BE    ONLINE       0     0     0
          raidz1-4                                           DEGRADED     0     0 58.7K
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09B0M    ONLINE       0     0     0
            spare-1                                          DEGRADED     0     0     0
              disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09BEN  UNAVAIL      0     0     0  cannot open
              disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49M01  ONLINE       0     0     0  837K resilvered
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0D6LC    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0CWD1    ONLINE       0     0     0
            spare-4                                          DEGRADED     0     0     0
              disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09C8G  UNAVAIL      0     0     0  cannot open
              disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F4A7ZE  ONLINE       0     0     0  830K resilvered
          raidz1-5                                           ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-1CH_Z1F2KNQP    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BML0    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BPV4    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BPZP    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BQ78    ONLINE       0     0     0
          raidz1-6                                           ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BQ9G    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BQDF    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BQFQ    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0CW1A    ONLINE       0     0     0
            disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0BV7M    ONLINE       0     0     0
        spares
          disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49M01      INUSE     currently in use
          disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F4A7ZE      INUSE     currently in use
          disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49MB1      AVAIL   
          disk/by-id/scsi-SATA_ST3000DM001-1ER_Z5001SS2      AVAIL   
          disk/by-id/scsi-SATA_ST3000DM001-1ER_Z5001R0F      AVAIL   

errors: 37062187 data errors, use '-v' for a list

When the first disk failed I replaced it with a hot spare and it began to resilver. Before that resilver completed, a second disk in the same RAIDZ failed, so I replaced it with another hot spare. Since then the resilver starts, gets about 50% done, and then begins gobbling up memory until it exhausts it all and crashes the OS.
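
For reference, each spare was pulled in with an ordinary replace of the failed disk, roughly like this (commands reconstructed from memory; the device names are the ones shown in the status output above):

    # first failure: replace the dead disk with the first hot spare
    zpool replace pod2 disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09BEN \
        disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49M01

    # second failure (before the first resilver finished): replace it with another spare
    zpool replace pod2 disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09C8G \
        disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F4A7ZE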

Upgrading the RAM in the server isn't a straightforward option at this point, and it's not clear to me that doing so would actually solve the problem. I understand that there will be data loss at this stage; if I can sacrifice the contents of this one RAIDZ to preserve the rest of the pool, that is a perfectly acceptable outcome. I am in the process of backing up the contents of this server to another server, but the memory-consumption issue forces a reboot (or crash) every 48 hours or so. That interrupts my rsync backup, and restarting rsync takes time (it can resume once it figures out where it left off, but that takes a very long time).

I think ZFS trying to handle two spare-replacement operations at once is at the root of the memory-consumption issue, so I want to remove one of the hot spares and let ZFS work on one at a time. However, when I try to detach one of the spares, I get "cannot detach /dev/disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49M01: no valid replicas". Perhaps I could use the -f option to force the operation, but it's not clear to me exactly what the result would be, so I wanted to see if anyone has input before going forward.
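
Concretely, the attempt looks roughly like this:

    # try to detach the in-use spare from the spare-1 vdev
    zpool detach pod2 /dev/disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49M01
    # -> cannot detach /dev/disk/by-id/scsi-SATA_ST3000DM001-1CH_W1F49M01: no valid replicas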

If I can get the system into a stable state where it stays operational long enough for the backup to complete, I plan to take it down for an overhaul, but as things stand it's stuck in a bit of a recovery loop.

jasongullickson
  • You tagged this `zfs-fuse`. Is this *really* ZFS Fuse? Please provide OS details. – ewwhite Jul 14 '14 at 14:43
  • You bet ewwhite. Debian 6.0.6 – jasongullickson Jul 14 '14 at 14:48
  • How much RAM does this system have? How often did you scrub the array? – Chris S Jul 14 '14 at 14:49
  • Why were you using FUSE and not a real ZFS implementation? Especially given that there's a lot of hardware here. I'm thinking this array is done... – ewwhite Jul 14 '14 at 14:54
  • At the time the system was built there was no native ZFS implementation for Linux. – jasongullickson Jul 14 '14 at 17:26
  • Also ewwhite, just so I'm clear are you saying you think that force removing one of the spares will result in a faulted pool, or did you mean "done" in a more general sense? – jasongullickson Jul 14 '14 at 17:51
  • @jasongullickson I'm thinking faulted pool. I've asked an expert to provide his assessment. [ZFS Google Group](https://groups.google.com/a/zfsonlinux.org/forum/#!forum/zfs-discuss) would probably be a better resource. – ewwhite Jul 14 '14 at 20:44
  • Thanks @ewwhite, also thanks for the lead on the ZFS google group, I'll check that out. I'm also going to setup a test pool to do an experiment to simulate the removal and see what happens. – jasongullickson Jul 15 '14 at 21:10
  • does `zpool status -v` show damaged files? see http://serverfault.com/questions/523390/zfs-endless-resilvering – longneck Jul 17 '14 at 13:10
  • Ah yes, thank you @longneck for the pointer back; I forgot that it was necessary to replace/remove the corrupt files in order for the resilver to complete. – jasongullickson Jul 17 '14 at 15:27
  • Also @longneck, once the damaged files are deleted do I need to issue a "zpool clear" or will the resilver/scrub clear the errors once the files are gone? (Rough sequence sketched just after these comments.) – jasongullickson Jul 20 '14 at 12:02
  • Well there's a native port of ZFS on Linux now, take a look at http://zfsonlinux.org/ . – Marc Stürmer Aug 06 '14 at 19:37
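
Following up on the exchange with longneck above, the cleanup sequence I have in mind looks roughly like this (the file path is hypothetical, and whether the final zpool clear is actually needed is exactly my open question):

    # list the files ZFS has flagged as damaged
    zpool status -v pod2

    # delete (or restore from backup) each damaged file so the resilver can complete
    rm "/pod2/some/damaged/file"      # hypothetical path; repeat for every file listed

    # if error counters are still shown after the resilver/scrub finishes, reset them
    zpool clear pod2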

1 Answer

Right now you can detach the UNAVAIL disks; ZFS is not using them anymore anyway.
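
Using the by-id paths from your status output, that would be something along these lines:

    # detach the failed originals; the in-use spares should then become
    # permanent members of raidz1-4
    zpool detach pod2 disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09BEN
    zpool detach pod2 disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F09C8G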

You've got two failed disks in a RAIDZ-1 setup. It's very likely you are looking at some data loss and should be ready to restore from backup.

As a side note, RAIDZ has proven to be very flaky in my experience with OpenSolaris/Solaris 11. I would advise against using it in ANY kind of production workload.

Also, to reinforce what ewwhite said, FUSE is not your best option. I'd take this opportunity to migrate to something more stable (perhaps FreeBSD 10).

Giovanni Tirloni