
This one has really got me scratching my head.

First up, the data is safely backed up, although I've lost hours of migration work. But what has happened has left me concerned.

I was migrating a backup array on an older server with a 2.5" SATA backplane. The 1TB discs were too small, so I had begun switching to 2TB discs: Seagate FireCuda 2.5" 2TB drives, which are both SSHDs and SMR. This is my first experience with SMR discs.

Partway through the process, the ZFS pool vanished without a trace.

This array stores backups. The old setup was Linux MD RAID-1, and I was going to migrate to a ZFS mirror while changing discs.

The first disc to replace was /dev/sdc.

I'm using whole discs as ZFS devices (i.e. /dev/sdc, not /dev/sdc1).

The migration process was to:

1. back up to three external discs
2. break the RAID mirror
3. remove one 1TB disc
4. install the first SSHD
5. create the ZFS pool
6. copy the data
7. verify it
8. stop the MD array
9. replace the second disc
10. mirror the pool
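Steps 5 to 10 of that plan might look roughly like this as commands. This is a hypothetical dry-run sketch, not the procedure actually used: the pool name backup, the second disc /dev/sdd, the MD device /dev/md0 and its mount point /mnt/md0 are all assumptions. With the default DRY_RUN=1 it only prints each command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 5-10 of the migration plan.
# POOL, NEW2, MD and SRC are assumed names, not taken from the real server.
# DRY_RUN=1 (the default) only prints commands; DRY_RUN=0 runs them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
POOL="backup"      # hypothetical pool name
NEW1="/dev/sdc"    # first 2TB disc, used whole as in the post
NEW2="/dev/sdd"    # second 2TB disc (hypothetical device name)
MD="/dev/md0"      # degraded MD RAID-1 array (hypothetical)
SRC="/mnt/md0"     # mount point of the MD array (hypothetical)

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

migration_plan() {
    run zpool create "$POOL" "$NEW1"                               # (5) single-disc pool, mounted at /$POOL
    run rsync -aH "$SRC/" "/$POOL/"                                # (6) copy data
    run rsync -aHc --dry-run --itemize-changes "$SRC/" "/$POOL/"   # (7) list files whose checksums or attributes differ
    run zpool scrub "$POOL"                                        # (7) have ZFS verify its own checksums
    run umount "$SRC"
    run mdadm --stop "$MD"                                         # (8) stop the MD array
    # (9) physically replace the second disc here, then:
    run zpool attach "$POOL" "$NEW1" "$NEW2"                       # (10) turn the pool into a mirror
}

if [ "$#" -gt 0 ] && [ "$1" = plan ]; then
    migration_plan
fi
```

Running ./migrate.sh plan prints the plan; DRY_RUN=0 ./migrate.sh plan would execute it. Printing first makes it easy to check device names before anything destructive happens.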

But I never got past step 7. On rebooting the server, the pool (still a single disc at that point) disappeared, and I can find no trace of it despite having written 800GB of data to it.

zdb -l /dev/sdc returns:

--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
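zdb -l looks for the four 256 KiB vdev labels that ZFS writes to every vdev: L0 and L1 at offsets 0 and 256 KiB, and L2 and L3 in the last 512 KiB of the device (its size rounded down to 256 KiB). The hypothetical helper below computes those offsets for a given size and counts the non-zero bytes in each region. It can be pointed at /dev/sdc and at any partition device that still exists, because if the pool actually lived in a partition, the labels are relative to that partition's start rather than the disc's. It assumes the standard on-disk label layout and uses blockdev or stat to get the size.

```shell
#!/usr/bin/env bash
# Sketch: where would ZFS vdev labels L0-L3 be, and is anything there?
# Assumes the standard layout: four 256 KiB labels, L0/L1 at the start of the
# vdev, L2/L3 at the end of its size rounded down to a 256 KiB boundary.
set -eu

LABEL=262144   # 256 KiB, the size of one vdev label

# Print the byte offsets of L0..L3 for a vdev of SIZE bytes, one per line.
zfs_label_offsets() {
    local size=$(( $1 / LABEL * LABEL ))   # round down to a label boundary
    echo 0
    echo "$LABEL"
    echo $(( size - 2 * LABEL ))
    echo $(( size - LABEL ))
}

# Count the non-zero bytes in the 256 KiB region of DEV starting at OFFSET.
nonzero_bytes_at() {
    dd if="$1" bs="$LABEL" skip=$(( $2 / LABEL )) count=1 2>/dev/null \
        | tr -d '\000' | wc -c | tr -d ' '
}

# Usage: ./labels.sh /dev/sdc   (or a partition, or an image file)
if [ "$#" -gt 0 ]; then
    dev=$1
    if [ -b "$dev" ]; then
        size=$(blockdev --getsize64 "$dev")
    else
        size=$(stat -c %s "$dev")
    fi
    for off in $(zfs_label_offsets "$size"); do
        echo "offset $off: $(nonzero_bytes_at "$dev" "$off") non-zero bytes"
    done
fi
```

A count of 0 in all four regions would be consistent with the "failed to unpack label" output: there is simply nothing there for zdb to read.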

I took a hex dump of the first 1MB of /dev/sdc. This is what I got:

# dd if=/dev/sdc bs=1048576 count=1 | xxd
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
(...) 
00fff60: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fff70: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fff80: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fff90: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fffa0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fffb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fffc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fffd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00fffe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00ffff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
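As far as I know, ZFS on Linux writes a GPT when given a whole disc and puts the pool in partition 1 starting at 1 MiB, so an all-zero first MiB would suggest the partition table is gone as well, not just ZFS data. A minimal check for the primary GPT header's "EFI PART" signature follows, assuming 512-byte logical sectors (header at byte 512); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does DEV still carry a primary GPT header?
# Assumes 512-byte logical sectors, so the header sits at LBA 1 (byte 512);
# set SECTOR=4096 for a drive with 4K logical sectors.
set -eu

SECTOR="${SECTOR:-512}"

# Succeed if the 8-byte GPT signature "EFI PART" is present at LBA 1.
has_gpt_header() {
    local sig
    # tr strips NUL bytes so an all-zero sector compares as an empty string
    sig=$(dd if="$1" bs="$SECTOR" skip=1 count=1 2>/dev/null | head -c 8 | tr -d '\000')
    [ "$sig" = "EFI PART" ]
}

if [ "$#" -gt 0 ]; then
    if has_gpt_header "$1"; then
        echo "$1: primary GPT header found at byte $SECTOR"
    else
        echo "$1: no GPT signature at byte $SECTOR"
    fi
fi
```

GPT also keeps a backup header in the last sector of the disc, so a missing primary header does not by itself mean every copy is gone.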

And I don't see any data at all until I get past roughly 1GB of zeroes:

# dd if=/dev/sdc bs=1048576 count=1 skip=1032 | xxd
(...)
0074fb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0074fc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0074fd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0074fe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0074ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0075000: 7f4a d799 69ad bcac dd98 5393 afeb 2745  .J..i.....S...'E
0075010: 2d23 af2b fdd9 2a6d c950 dd8b 0c8f 268d  -#.+..*m.P....&.
0075020: 146b 2174 5ddd a757 e49e 2dfa 9e06 d0fc  .k!t]..W..-.....
0075030: dfda 1ee7 ac17 cb41 8246 a7de ff39 d362  .......A.F...9.b
0075040: 67a0 c1e1 7f9b 6a4a 6e45 e2e3 1726 93e2  g.....jJnE...&..
0075050: 8310 6f20 5644 2a2c 0609 1927 9c22 d676  ..o VD*,...'.".v
0075060: 5950 cae7 f14c 938b 39b9 041e 960e 871b  YP...L..9.......
0075070: 7dc6 54eb 5ee4 8cc9 836f adde 4aba dc3b  }.T.^....o..J..;
0075080: 49c7 db23 5d0f 557d 8f63 3e43 9c5e 59c4  I..#].U}.c>C.^Y.
(...)
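Instead of paging through xxd output in 1MB steps, the first non-zero byte can be located directly by comparing the device with /dev/zero: cmp stops at the first difference and reports its 1-based byte position. The sketch below wraps that in a hypothetical first_nonzero_offset function; it parses both the GNU wording ("differ: byte N") and the POSIX/BusyBox wording ("differ: char N").

```shell
#!/usr/bin/env bash
# Sketch: find the first non-zero byte of a device or image by comparing it
# with /dev/zero. cmp stops at the first differing byte, so only the leading
# zeroes plus one byte are read.
set -eu

# Print the 0-based offset of the first non-zero byte in FILE,
# or nothing if FILE is entirely zero.
first_nonzero_offset() {
    local out byte
    out=$(LC_ALL=C cmp "$1" /dev/zero 2>&1 || true)   # cmp exits 1 on a difference
    # cmp reports a 1-based position as "differ: byte N, line M" (GNU)
    # or "differ: char N, line M" (POSIX, BusyBox); an all-zero file gives "EOF".
    byte=$(printf '%s\n' "$out" | sed -nE 's/.* differ: (byte|char) ([0-9]+),.*/\2/p')
    if [ -n "$byte" ]; then echo $(( byte - 1 )); fi
}

if [ "$#" -gt 0 ]; then
    off=$(first_nonzero_offset "$1")
    if [ -n "$off" ]; then
        printf 'first non-zero byte at offset %d (%#x)\n' "$off" "$off"
    else
        echo "no non-zero bytes found"
    fi
fi
```

It only reads the device, so it is safe to run on /dev/sdc while avoiding further writes.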

/dev/sdc shows no SMART errors and passes its tests just fine. I have not attempted to write any data to it since this weirdness started. I'm definitely looking at the correct disc: it reports a capacity of 2TB, and it's the only 2TB disc installed.

Has anyone seen anything like this before? Am I using the right tools to establish that the first 1GB or so of the disc has somehow been completely zeroed? Could this be explained by a faulty flash cache on the SSHD, or by something odd with SMR write zones?

The point where the data starts looks like a significant number (dd's offset is 1048576 x 1032 bytes, then 0x75000 within that block, which looks like a round number), but I don't know how to make sense of it.
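Putting the two parts of that offset together is plain arithmetic; the sketch below does it in shell and expresses the result in a few units that may be worth comparing against drive geometry. The dd_offset name is hypothetical, and no interpretation of the numbers (SMR zones, cache size) is implied.

```shell
#!/usr/bin/env bash
# Worked arithmetic for where data first appears on /dev/sdc:
# dd skipped 1032 blocks of 1048576 bytes, and xxd then showed data at 0x75000.
set -eu

# Absolute byte offset = block size * blocks skipped + offset within the dump.
dd_offset() {
    echo $(( $1 * $2 + $3 ))
}

off=$(dd_offset 1048576 1032 0x75000)
printf 'bytes:         %d (%#x)\n' "$off" "$off"
printf 'MiB:           %d + %d bytes\n' $(( off / 1048576 )) $(( off % 1048576 ))
printf '512-byte LBA:  %d\n' $(( off / 512 ))
printf '4096-byte LBA: %d\n' $(( off / 4096 ))
printf 'past 1 GiB by: %d bytes (%#x)\n' $(( off - 1073741824 )) $(( off - 1073741824 ))
```

This gives 1082609664 bytes (0x40875000): LBA 2114472 in 512-byte sectors or LBA 264309 in 4096-byte sectors, i.e. 1 GiB plus 0x875000 bytes, and 4K-aligned.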

As a complication, although I started this work in the same room as the server, I'm now 4,000 km away for two weeks. Hopefully I can get someone else to swap discs if need be. The second FireCuda disc is still unopened.

edits: corrected terminology, cleaned up typos

Andrew W
