This is the scenario.
The environment is Linux (Arch Linux actually).
An uncompressed 1.3TB tar file has been written to a freshly XFS-formatted 2TB disk as a backup.
Later on, a 586MB UEFI boot image has been written (by mistake) to the very same disk device with this very command: dd if=./bootimage of=/dev/sdd bs=4M
What I understand is, besides the idiocy of the action, that the disk has been neither reformatted nor wiped. "Just" its first 586MB (614465536 bytes) worth of sectors have been overwritten.
My first attempt was based on the assumption that XFS had allocated the blocks linearly and that I knew the exact size of the overwritten portion. The idea was to skip all those blocks and then pipe all subsequent blocks to the cpio tool, which can do its best to handle a damaged tar file (mine has been truncated at the head).
FILESIZE=614465536
SECTSIZE=$(( 2 * 1024 * 1024 )) # 2M
SKIPSIZE=$(( $FILESIZE / $SECTSIZE ))
dd if=/dev/sdd ibs=$SECTSIZE obs=$SECTSIZE skip=$SKIPSIZE | cpio -ivd -H ustar
(I switched to a 2M block size because the file size is a multiple of it but not of 4M.) No luck at all with the recovery. But now I know that the disk layout used by XFS isn't linear.
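The block-size arithmetic behind that choice can be checked with a short sketch. The helper name blocks_for is made up for illustration; the only figure taken from the scenario is the image size in bytes.

```shell
FILESIZE=614465536            # size of the boot image in bytes (from the scenario)

# Print how many whole blocks of the given size fit into FILESIZE,
# plus the number of bytes left over.
blocks_for() {  # usage: blocks_for BLOCKSIZE_IN_BYTES
    echo "$(( FILESIZE / $1 )) blocks, remainder $(( FILESIZE % $1 ))"
}

blocks_for $(( 2 * 1024 * 1024 ))   # 293 blocks, remainder 0
blocks_for $(( 4 * 1024 * 1024 ))   # 146 blocks, remainder 2097152
```

With 2M blocks, skip=293 lands exactly on the first byte that was not overwritten, whereas 4M blocks would leave half a block of overwritten data in the stream.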
The next step was to try to repair the file system (well, a copy of it) with xfs_repair once the partition table had been fixed with fdisk.
It found the "xfs signature" and I accessed that single partition with the help of a loop device. Unluckily, xfs_repair failed with a "read only 0 of 512 bytes" error.
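One possible reading of that message, and it is only a guess, is that xfs_repair tried to read past the end of the loop device because the recreated partition is smaller than the file system recorded in the superblock. The sketch below prints the size a superblock claims so that it can be compared with the size of the device or image. The function names be_uint and xfs_claimed_bytes are hypothetical. The field positions (magic "XFSB" at byte 0, block size as a 32-bit big-endian integer at byte 4, data block count as a 64-bit big-endian integer at byte 8) are taken from the XFS on-disk format as I understand it.

```shell
# Read NBYTES at OFFSET from FILE as one big-endian unsigned integer.
be_uint() {  # usage: be_uint FILE OFFSET NBYTES
    val=0
    for b in $(od -An -v -tu1 -j "$2" -N "$3" "$1"); do
        val=$(( val * 256 + b ))
    done
    echo "$val"
}

# Print the file system size in bytes that the superblock at SB_OFFSET claims.
# Returns non-zero if no XFS magic is found at that offset.
xfs_claimed_bytes() {  # usage: xfs_claimed_bytes FILE [SB_OFFSET]
    img=$1
    off=${2:-0}
    magic=$(be_uint "$img" "$off" 4)
    if [ "$magic" -ne $(( 0x58465342 )) ]; then   # 0x58465342 is "XFSB"
        echo "no XFSB magic at offset $off" >&2
        return 1
    fi
    bsize=$(be_uint "$img" $(( off + 4 )) 4)      # block size in bytes
    dblocks=$(be_uint "$img" $(( off + 8 )) 8)    # number of data blocks
    echo $(( bsize * dblocks ))
}
```

If the number printed for the loop device is larger than what blockdev --getsize64 reports for it, the partition boundaries would be the first thing to re-check. Since the primary superblock lies inside the overwritten area, SB_OFFSET would have to point at a surviving secondary superblock.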
Moreover, it seems there's no way to recover lost files under XFS.
The third attempt was made with the help of tools like foremost and testdisk. But my attempts have shown little success so far. They have actually been able to recover some files, mainly multimedia files (GIFs, JPGs, PNGs, WAVs and MP3s). But those are a fraction of the actual content of the backup: they cover about 15% of the 1.3TB of data. It looks like foremost has a focus on typical Windows files. There should also be lots of text files, LibreOffice files, and also gzip and bzip2 files. Still, 15% is better than 0%.
I have also searched through all the documentation I have at hand and "googled" for similar scenarios (also here on serverfault). The most relevant results were about sending the disk to data recovery firms. No similar task seems to be documented on the Internet.
What would be the best strategy to maximize file recovery?
The perfect one would aim at recovering the surviving part of that single tar file by recovering the remaining part of the i-node chain.
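Short of rebuilding the i-node chain, one thing I could try is locating the surviving tar member headers directly on the device. This is a minimal sketch, not a tested recovery procedure: the function name find_tar_headers is made up, and it assumes GNU grep (for -a, -b and -o) and that the file's data blocks are 512-byte aligned on the device. It relies on the ustar format placing the magic string "ustar" at byte 257 of every 512-byte header block.

```shell
# List the byte offsets of candidate tar headers in a file or device.
find_tar_headers() {  # usage: find_tar_headers PATH
    LC_ALL=C grep -a -b -o 'ustar' "$1" |
    while IFS=: read -r off rest; do
        hdr=$(( off - 257 ))      # start of the 512-byte block holding the hit
        # Keep only hits whose enclosing block is 512-byte aligned.
        if [ "$hdr" -ge 0 ] && [ $(( hdr % 512 )) -eq 0 ]; then
            echo "$hdr"
        fi
    done
}

# Possible follow-up for one reported offset N (a multiple of 512):
#   dd if=/dev/sdd bs=512 skip=$(( N / 512 )) | tar -tvf -
```

Each offset would only give a starting point inside one contiguous extent; wherever XFS placed the next extent elsewhere, the stream read from that point would run into unrelated data. On a 2TB device grep may also need to be run over smaller chunks, since long stretches without a newline are held in memory as a single line.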