I upgraded Ubuntu from 13.04 to 13.10, only to discover that the SATA disk IDs have changed and my ZFS pool is now unavailable.

On Ubuntu 13.10 the disk IDs are now `ata-*` instead of `scsi-SATA_*`.

This is the pool status after the update:

  pool: nestpool
 state: UNAVAIL
status: One or more devices could not be used because the label is missing 
    or invalid.  There are insufficient replicas for the pool to continue
    functioning.
action: Destroy and re-create the pool from
    a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

    NAME                                                STATE     READ WRITE CKSUM
    nestpool                                            UNAVAIL      0     0     0  insufficient replicas
      raidz2-0                                          UNAVAIL      0     0     0  insufficient replicas
        scsi-SATA_WDC_WD4000F9YZ-_WD-WCC1F0046946       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WCC4A0026423       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WMC1F0011145       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WMC1F0049294       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WMC1F0051143       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WMC1F0051756       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WMC1F0056625       UNAVAIL      0     0     0
        scsi-SATA_WDC_WD4000F9YZ-_WD-WMC1F0200560       UNAVAIL      0     0     0
    logs
      mirror-1                                          UNAVAIL      0     0     0  insufficient replicas
        scsi-SATA_Samsung_SSD_840S1ATNEAD707062H-part2  UNAVAIL      0     0     0
        scsi-SATA_Samsung_SSD_840S1ATNEAD707066K-part3  UNAVAIL      0     0     0

After a lot of research on the Internet, I tried the following procedure:

First, I exported the pool using: `zpool export nestpool`

Then I tried to import the pool again using: `zpool import -m -f -d /dev/disk/by-id nestpool`

But the import fails with the message: `cannot import 'nestpool': one or more devices is currently unavailable`

This is the current output of `zpool import`:

   pool: nestpool
     id: 3947768928242827823
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:

    nestpool                                                DEGRADED
      raidz2-0                                              ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WCC1F0046946          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WCC4A0026423          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WMC1F0011145          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WMC1F0049294          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WMC1F0051143          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WMC1F0051756          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WMC1F0056625          ONLINE
        ata-WDC_WD4000F9YZ-09N20L0_WD-WMC1F0200560          ONLINE
    cache
      ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707062H-part1
    logs
      mirror-1                                              UNAVAIL  insufficient replicas
        ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707062H      UNAVAIL  corrupted data
        ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707066K      UNAVAIL

The log partitions were not correctly identified: they should be ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707062H-part2 and ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707066K-part3. The `-m` argument to `zpool import` does not help either.
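To see which names currently exist for a given drive, the entries under `/dev/disk/by-id` can be listed by serial number. The following is a minimal sketch, assuming a POSIX shell; `ids_for_serial` and the `BY_ID` variable are hypothetical names introduced here for illustration, not part of any ZFS tooling.

```shell
# Directory holding the persistent device names; override BY_ID to point
# at a scratch directory when experimenting.
BY_ID="${BY_ID:-/dev/disk/by-id}"

# ids_for_serial SERIAL
#   Prints "name -> target" for every entry whose name contains SERIAL.
ids_for_serial() {
    for entry in "$BY_ID"/*"$1"*; do
        # Skip the literal pattern when nothing matches; keep dangling links.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        printf '%s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
    done
}

# Hypothetical usage with one of the SSD serials from the output above:
# ids_for_serial S1ATNEAD707062H
```

Comparing this listing with the names in the `zpool status` output shows which old names no longer exist.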

I cannot find a way to tell `zpool import` to use a different path or ID for the log devices. Any help or ideas for fixing this problem would be much appreciated. What else can I do to recover this pool?

Manolo
  • Well, this is a mess. Can you add the output of `fdisk -l` ? – ewwhite Jan 05 '14 at 17:12
  • Here it goes, gdisk output (fdisk reports only that it is GPT and not supported) gdisk -l /dev/sda Number Start (sector) End (sector) Size Code Name 1 2048 437618687 208.7 GiB BF01 ZFS L2ARC 2 437618688 500117503 29.8 GiB BF01 ZFS ZIL – Manolo Jan 05 '14 at 20:02
  • Can you just import the pool, remove the log devices and re-add the log devices with the current names? – ewwhite Jan 05 '14 at 20:06
  • [http://pastebin.com/bBec21WN](http://pastebin.com/bBec21WN) (Sorry had to use pastebin) – Manolo Jan 05 '14 at 20:08
  • This is what I tried to do and it is not working. The import fails with the message reported. – Manolo Jan 05 '14 at 20:10
  • Can you just do `zpool import nestpool`? I've never had issues with this. Perhaps the problem is using a mirror of your ZIL devices created from partitions of your OS disk. That's a little rough. – ewwhite Jan 05 '14 at 20:17
  • Yes, sorry I did not mention it, but I have already tried all these alternatives, with and without every combination of -m, -f and -d. I also think the problem is the mirror of ZIL partitions. According to the man page the -m parameter should help, but it does not. – Manolo Jan 05 '14 at 20:25
  • What zpool version is this? – ewwhite Jan 05 '14 at 21:27
  • I do not see a way to get the version now that the pool is only exported. Any command for this? But it should be version 5000, latest from zfsonlinux. – Manolo Jan 05 '14 at 22:04
  • zdb reports version 5000. – Manolo Jan 05 '14 at 22:25

1 Answer

Please try `zpool list`. This will show whether the pool is even available to the system.

Then try to import, perhaps with `zpool import -f nestpool`.

Perhaps try to remove the unavailable log devices via:

zpool remove nestpool mirror-1

From now on, use whole devices for L2ARC and ZIL...

Edit:

Your easiest fix is to temporarily create the symbolic links the pool expects in `/dev/disk/by-id` so that it can be imported. Here is an example listing from a RHEL system running the current ZFS:

[root@Davalan /dev/disk/by-id]# ll
total 0
lrwxrwxrwx 1 root root  9 Oct 27 05:29 ata-STEC_M8IOPS-50_STM000136649 -> ../../sdc
lrwxrwxrwx 1 root root 10 Oct 27 05:29 ata-STEC_M8IOPS-50_STM000136649-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Oct 27 05:29 ata-STEC_M8IOPS-50_STM000136649-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 Oct 27 05:29 scsi-35000c5003af99fa7 -> ../../sdd
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000c5003af99fa7-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000c5003af99fa7-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 Oct 27 05:29 scsi-35000cca0153ec2d0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000cca0153ec2d0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000cca0153ec2d0-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 Oct 27 05:29 scsi-35000cca01540e298 -> ../../sdf
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000cca01540e298-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000cca01540e298-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 Oct 27 05:29 scsi-35000cca01540e340 -> ../../sde
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000cca01540e340-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-35000cca01540e340-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 Oct 27 05:29 scsi-SATA_STEC_M8IOPS-50_STM000136649 -> ../../sdc
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-SATA_STEC_M8IOPS-50_STM000136649-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Oct 27 05:29 scsi-SATA_STEC_M8IOPS-50_STM000136649-part9 -> ../../sdc9
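
The linking step could be scripted roughly as follows. This is a minimal sketch, not a tested recovery procedure: `link_old_name` and the `BY_ID` variable are hypothetical helpers, and the device names in the comments are taken from the `zpool status` output in the question, on the assumption that those are the names the pool is looking for. Writing to the real `/dev/disk/by-id` needs root, so it may be worth rehearsing against a scratch directory first.

```shell
# Directory holding the persistent device names; override BY_ID to rehearse
# in a scratch directory before touching the real one.
BY_ID="${BY_ID:-/dev/disk/by-id}"

# link_old_name NEW OLD
#   NEW = entry that exists now (e.g. an ata-* name)
#   OLD = name the pool still records (e.g. a scsi-SATA_* name)
# Creates OLD as a relative symlink to its sibling NEW; refuses to overwrite.
link_old_name() {
    if [ -e "$BY_ID/$2" ] || [ -L "$BY_ID/$2" ]; then
        echo "refusing to overwrite $BY_ID/$2" >&2
        return 1
    fi
    ln -s "$1" "$BY_ID/$2"
}

# Hypothetical usage for the log partitions from the question:
# link_old_name ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707062H-part2 \
#               scsi-SATA_Samsung_SSD_840S1ATNEAD707062H-part2
# link_old_name ata-Samsung_SSD_840_PRO_Series_S1ATNEAD707066K-part3 \
#               scsi-SATA_Samsung_SSD_840S1ATNEAD707066K-part3
# zpool import -d "$BY_ID" nestpool
```

If the import then reports the devices under yet another set of names, the same helper can be applied to those names before retrying.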
ewwhite
  • Once I exported the pool, it was automatically removed from the system, so it is no longer available; this is actually why I need help. I think this is the normal behaviour. I am sorry if it is not; in that case I also forgot to mention it. So none of these commands apply. – Manolo Jan 05 '14 at 21:57
  • I wish there was a way to apply similar commands to exported pools. – Manolo Jan 05 '14 at 21:58
  • smart idea the one with the links, gonna try now... – Manolo Jan 05 '14 at 22:09
  • Cooooooool, it worked. It first changed the devices again to something else, like this: `mirror-1 UNAVAIL insufficient replicas wwn-0x50025385503e8531 UNAVAIL corrupted data wwn-0x50025385503e8535 UNAVAIL` but then I applied the linking trick to these new names... and it finally worked! – Manolo Jan 05 '14 at 22:18
  • Now, remove the ZIL devices and re-add them with the system-generated device names. – ewwhite Jan 05 '14 at 22:19
  • I am now reviewing everything, but it seems not necessary, with the linking trick, zpool was able to finally find the right device during import process, so it was imported with the correct devices. In my specific case the linking trick helped zpool find and set the right devices. – Manolo Jan 05 '14 at 22:28
  • And I will for sure no longer use partitions for anything related to ZFS! Lesson learned! – Manolo Jan 05 '14 at 22:29