On Debian 8.7 I had a ZFS pool (obviously using ZFS on Linux, not Oracle or Solaris ZFS).
I needed to extend the pool from a mirror on 2 disks to a raidz on 4 disks. I made a backup (a single copy of the data - that was my first mistake).
I thought that zpool destroy would not work until I removed all datasets (volumes), so I ran zfs destroy (this was my second mistake).
After that I issued zpool destroy, repartitioned all 4 disks and then found out that the backup was damaged.
So I started my recovery adventure:
The first good thing about ZFS is that it is able to import destroyed pools.
After zpool destroy yourPoolName you can invoke zpool import -D to see the list of destroyed pools.
You can then import a pool using zpool import -D yourPoolName, or, if you have destroyed several pools with the same name, you can import it by the numeric id shown by zpool import -D.
zpool import -D requires the partitions to be in their original place. It has to be exact down to the sector.
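As a minimal sketch of that sequence (the pool name yourPoolName and the numeric id below are placeholders, not my real values):

    # list destroyed pools that are still detectable on the attached disks
    zpool import -D

    # import a destroyed pool read-only, either by name or by the id printed above
    zpool import -D -f -o readonly=on yourPoolName
    zpool import -D -f -o readonly=on 1234567890123456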
I used fdisk to recreate the partitions with the exact original start and end sector numbers.
I used cfdisk to set the partition type (because it's more user friendly :) ).
Then you should invoke partprobe so that the OS knows about the changed partitions.
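A rough non-interactive sketch of the same idea, assuming GPT labels and using sgdisk instead of the interactive fdisk/cfdisk I actually used (device, partition number, sector numbers and the BF01 "Solaris /usr & Apple ZFS" type code are example placeholders - yours must match the original layout exactly):

    # recreate partition 1 with the exact original start/end sectors and ZFS type
    sgdisk -n 1:2048:3907029134 -t 1:BF01 /dev/sdb

    # tell the kernel to re-read the partition table
    partprobe /dev/sdb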
zpool import -D worked like a charm and I had my pool online in perfect health again!..
But with the full consequences of zfs destroy - all the data was missing.
ZFS stores changes to files and to the file system in transactions, which are written to disk in transaction groups (TXGs). My further research showed that I had to roll back the last transaction groups.
There are 2 ways to roll back ZFS transaction groups:
- using a special zpool import with the -T option
- using the zfs_revert-0.1.py script
First of all you need to find the last good TXG. zpool history -il helped me.
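For example (the pool name is a placeholder, and the exact output format of the internal history differs between ZFS versions):

    # show the pool history including internal events in long format,
    # then keep only the lines that mention a transaction group number
    zpool history -il yourPoolName | grep -i txg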
Following the first way, you invoke something like: zpool import -o readonly=on -D -f -T <LAST-GOOD-TXG> poolName (with additional parameters if you like: -F, -m, -R).
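A concrete example of what I tried (the TXG number, pool name and altroot path are placeholders):

    # import the destroyed pool read-only, rolled back to an older TXG,
    # mounting everything under /mnt/recovery instead of the normal mountpoints
    zpool import -o readonly=on -D -f -T 123456 -R /mnt/recovery yourPoolName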
Unfortunately this command worked only with the current TXG.
Going back even to the previous TXG didn't work and produced error messages like "device is unavailable".
It looks like this feature works (or used to work) on Solaris only.
Pity.
I analyzed the code of zfs_revert-0.1.py; it looks clear and promising.
I used this tool, but it looks like I had to delete too many TXGs.
After that zpool import -D was unable to detect the pool anymore.
Currently I have restored one of the older backups, and I have dd dumps of the 2 disks that were mirrored, taken after zfs destroy and zpool destroy.
It looks like we will just live with the data from the older backup and stop the recovery process here.
Nevertheless I will be glad to try to recover the data if somebody suggests what to do in such a situation.
Further recovery would be done in VMware Workstation, so I will need to find a way to import the zpool inside a VM (the disk IDs will probably change).
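A rough sketch of what I have in mind for the VM, assuming the dd dumps are plain image files (the paths and pool name are placeholders):

    # expose the dd images as block devices, scanning their partition tables
    losetup -fP --show /recovery/disk1.img
    losetup -fP --show /recovery/disk2.img

    # look for the destroyed pool on the loop devices and import it read-only
    zpool import -d /dev -D
    zpool import -d /dev -D -f -o readonly=on yourPoolName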
Question: What can I try next?
Lessons learned:
- Always keep at least 2 copies of your data. When you are manipulating the main storage you need a backup of the backup.
- zfs destroy is not needed, and is very dangerous, if you are going to do zpool destroy anyway.
Comments: It's obvious that during recovery you should completely stop all writes to the disks where the damaged data was stored.
Useful commands:
- zpool import -D
- zpool import -o readonly=on -D -f originalPoolName newPoolName
- zpool status tank
- zpool online dozer c2t11d0
- zpool scrub tank
- zpool history -il
- zpool export tank
- zpool import dozer zeepool
Links:
- Tools
- Information about damaged ZFS
- ZFS on FreeBSD: recovery from data corruption
- Oracle Solaris ZFS Administration Guide
- Importing ZFS Storage Pools
- ZFS Data Recovery
- Need help to recover data from damaged ZFS pool
- Chapter 9. ZFS Troubleshooting and Data Recovery
- Bruning Questions: ZFS Forensics - Recovering Files From a Destroyed Zpool
- "Back in time - or: zpool import -T"
- All data gone. Accidental zfs destroy
- ZFS dataset restoring
- Need Help Invalidating Uberblock
- ZFS Import