
This is the scenario.

The environment is Linux (Arch Linux actually).

An uncompressed 1.3TB tar file had been written to a freshly XFS-formatted 2TB disk as a backup.

Later on, a 586MB UEFI boot image was written (by mistake) to the very same disk device with this exact command: `dd if=./bootimage of=/dev/sdd bs=4M`.

What I understand is that, the idiocy of the action aside, the disk has been neither reformatted nor wiped. "Just" its first 500+MB worth of sectors have been overwritten.

My first attempt was based on the assumption that XFS had allocated the blocks linearly and that I knew the exact size of the overwritten portion. The idea was to skip all those blocks and then pipe all subsequent blocks to the cpio tool, which does its best to handle a damaged tar file (mine has been truncated at the head).

FILESIZE=614465536                  # size of the 586MB boot image, in bytes
SECTSIZE=$(( 2 * 1024 * 1024 ))     # 2M transfer block
SKIPSIZE=$(( FILESIZE / SECTSIZE )) # 293 whole blocks to skip

dd if=/dev/sdd ibs=$SECTSIZE obs=$SECTSIZE skip=$SKIPSIZE | cpio -ivd -H ustar

(I switched to 2M transfer blocks because the file size is a multiple of 2M but not of 4M.) No luck at all for recovery. But now I know that the disk layout used by XFS isn't linear.

The next step was to try to repair the file system (well, a copy of it) with xfs_repair, once the partition table had been fixed with fdisk. It found the "XFS signature" and I accessed that single partition with the help of a loop device. Unluckily, xfs_repair failed with a "read only 0 of 512 bytes" error. Moreover, it seems there's no way to recover lost files under XFS.
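For the record, the loop-device setup looked roughly like this (the image name and partition offset below are illustrative, not my exact values):

# attach the XFS partition inside the image copy as a loop device;
# 2048*512 is a common partition start offset -- take yours from fdisk
losetup --find --show --offset $((2048 * 512)) ./disk-copy.img
# dry run first: -n reports problems without writing anything
xfs_repair -n /dev/loop0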

The third attempt was made with the help of tools like foremost and testdisk, but with little success so far. They have actually been able to recover some files, mainly multimedia ones (GIFs, JPGs, PNGs, WAVs and MP3s); it looks like foremost focuses on typical Windows file types. But those cover only about 15% of the 1.3TB of data. There should also be lots of text files, LibreOffice documents, and gzip and bzip2 archives. Still, 15% is better than 0%.
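For completeness, the carving run looked more or less like this (paths are illustrative):

# carve known file types out of the image copy; -t all enables
# every built-in signature, -o names the output directory
foremost -t all -i ./disk-copy.img -o ./carved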

I have also searched through all the documentation I have at hand and "googled" for similar scenarios (also here on Server Fault). The most relevant results were about sending the disk to data recovery firms. No similar task seems to be documented on the Internet.

What would be the best strategy to maximize file recovery?

The perfect one would aim at recovering the surviving part of that single tar file by recovering the remaining part of the i-node chain.
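The direction I have in mind, as a rough sketch: every POSIX tar header is a 512-byte block carrying the "ustar" magic at offset 257, so scanning the raw device for that string should reveal surviving header blocks (the device name is mine; interpreting the hits is the hard part):

# print the byte offset of every "ustar" occurrence; a hit at offset H
# marks a plausible header block when (H - 257) is a multiple of 512
grep -abo ustar /dev/sdd | head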

EnzoR
  • How important is the data? If you really need it, send it to a professional data recovery firm. While you wait for it to come back, implement a backup solution. If you've struck out with all the existing off-the-shelf things like photorec, then there's not likely much more the Internet can do for you. – Michael Hampton Dec 02 '18 at 17:02
  • @MichaelHampton Thanks for the hint. As the medium is perfectly working, I'd say there needs to be a software-only solution, which is what I'm trying to pursue. The backup solution had already been in place for years; the procedure just wasn't followed carefully, as the device hadn't been disconnected. – EnzoR Dec 02 '18 at 17:06

1 Answer


Many things can be recovered if time and an understanding of the details involved are available.

In this case I don't see an off-the-shelf solution, and one real problem is that the interesting parts of the filesystem and the tar file signatures have most likely been overwritten.

If you really want to get this done on your own, these are the steps I'd try:

  • Make a 1:1 backup of the initial broken filesystem.
  • Only ever perform write actions on copies of the broken filesystem, as it is highly likely that more than one attempt will be needed to succeed. Every write carries the risk of destroying more data.
  • Check how XFS writes data to the disk. Most likely this is done sequentially, but make sure that assumption holds, and find out what data endianness/layout is used (see the first sketch after this list).
  • Read and understand the tar file format; try to identify tar start/end signatures and repeating patterns (e.g. checksums) to gather information for search chunks. Dig into the tar source code to understand how the data is written to the filesystem.
  • Try to recover the tar file, and try to fix it so you can extract your files (see the second sketch after this list).
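As a sketch for the layout check: on a healthy XFS volume, xfs_bmap prints the extent map of a file, which hints at whether a large sequentially-written tar ends up in one linear run on disk (the mount point and file name below are placeholders):

# print the extent map of a large file on a working XFS volume;
# few, contiguous extents suggest a mostly linear on-disk layout
xfs_bmap -v /mnt/test/big-file.tar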
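And a sketch for the recovery step, assuming a 512-byte-aligned byte offset OFF of a surviving tar header has been found (the value below is made up): GNU tar can usually resynchronize from such a point.

OFF=1234567168   # hypothetical byte offset of a valid ustar header
dd if=/dev/sdd bs=512 skip=$((OFF / 512)) | tar --ignore-zeros -xvf -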

All the mentioned tools like photorec, testdisk, foremost, ... will only give you limited success on recovered files, and even if the amount recovered is quite good, one often ends up with problems like tons of false positives, missing filenames and no folder structure. With 1.3 TB of data, all of that will matter when judging the process as successful.

With the information you provided it looks like "only" 500 MB of data are really corrupted, and all the rest should be in a "good" condition, so it should be possible to get a good result; it just depends on how XFS & tar handled the data. As tar comes from the tape era, the data layout should be very straightforward. Still, that process will not be easy at all and will involve raw data handling to a certain point.

hargut
  • I think we are going in the same direction. My "only" problem is the "how to". I am not a junior programmer, but I am not an OS programmer either. "tar signatures" don't really exist, as the file format is more than just straightforward: each soft-sector starts with the file path and metadata, followed by a lot of zeros. Thus my wishful thinking. And there seems to be no tool at all to inspect an XFS file system, other than `od`. – EnzoR Dec 02 '18 at 20:50
  • Start by trying to find the first valid tar information. You know the starting block of the FS from the partition table. The exact size of the dd'd file should be available, as well as its content. – hargut Dec 03 '18 at 06:31
  • Your hint looks much like my current approach. I only hope those blocks were allocated linearly... The documentation about XFS isn't that easy to find. – EnzoR Dec 03 '18 at 06:36
  • E.g. take this information to copy a small portion of the disk with dd and its options bs, skip, count. Analyze that portion to find the last block of the written data; grep and hexdump should be helpful. Do the same with the end of the tar file. (As long as XFS aligns the data sequentially,) you then know the area where your data is located (see the sketch after these comments). – hargut Dec 03 '18 at 06:39
  • If you got that, I'd try to read up on what blocks are reserved in XFS and where they are located, and then on how tar handles split archives. All of this will be a lot of work, and hours drained, with no guarantee that it works out. – hargut Dec 03 '18 at 06:41
  • Yes, that's more or less the path I am looking for. Documentation is the problem. Thanks a lot for the hints. – EnzoR Dec 03 '18 at 07:29
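A sketch of the window-copy approach from the comments (the skip count reuses the 293 blocks computed in the question; the window size is arbitrary):

# copy a 32M window starting right after the overwritten area
dd if=/dev/sdd of=window.bin bs=2M skip=293 count=16
# byte offsets of tar header magic inside the window
grep -abo ustar window.bin | head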