Most free-to-use data recovery tools (Recuva, EaseUS, MiniTool) offer good recovery for files with well-known extensions for pictures, videos and documents (.png, .xlsx, etc.). But files with lesser-known extensions come back with unusual names or missing extensions, or are not recovered properly.

Is there a reason for this? Does data recovery really depend on file extensions? What I observed: an .xlsx file that was Shift+Deleted 3 months ago was easily found by a data recovery tool, but a file with a lesser-known extension, deleted 5 minutes ago, can't be found. Is there a way to easily and quickly recover a file that was Shift+Deleted a few minutes ago?

msinfo
  • I don't know the specific tools. But you can run a simple experiment: copy a file with a "known good" extension to one with some strange extension. According to your theory, the tool would lose the ability to recover this "strange" file much sooner than the file with the good extension. I suspect, though, that it is not the extension (file name) but the actual file format (file content) that matters. There are some common formats which are considered relevant enough to invest sufficient time in recovery. And there are less common formats, or formats without a clear structure, which are not supported. – Steffen Ullrich Jul 25 '21 at 04:25
  • Thanks @SteffenUllrich, for clearing up the doubt between file extension (name) and file format (content) and its relationship with data recovery. – msinfo Jul 25 '21 at 18:18

1 Answer

A file is basically data on the physical layer of a file system plus a reference to it on the logical layer. The file name, including its extension, is part of the logical layer. Once a file is removed from this layer, the physical blocks (now without a reference from the logical layer) are marked as available. At this stage, the contents of the file are still readable until they get overwritten by another file, but the name and extension are gone.

Therefore, data recovery of a removed file depends more on the data format than on the extension. Unless the file was fragmented, some formats are easier to recognize because they have clear and strictly defined beginnings and endings. Some examples include:

  • JPEG images (and derived formats) are sequences of segments, each beginning with a marker; the file starts with 0xFFD8 (Start of Image) and ends with 0xFFD9 (End of Image).
  • A ZIP archive consists of file entries and ends with a central directory referencing these files. It is like a small file system within a single file.
  • TAR archives have a 512-byte header for each file, followed by the file contents rounded up to a multiple of 512 bytes and padded (typically with zeroes).
  • Among text files, XML, for example, is easy to recognize, as it may start with <?xml and consists of a root element and child elements, each having a defined ending.
  • Modern Microsoft Office files, a.k.a. Office Open XML files, are ZIP-packed XML-based documents.
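
To illustrate how content alone can identify the formats listed above, here is a minimal sketch in Python. The function name `guess_format` is hypothetical and not taken from any recovery tool. It assumes the buffer starts at the beginning of the file, that a TAR archive uses the "ustar" header variant (older TAR files carry no magic string), and that an Office Open XML package stores `[Content_Types].xml` near the start, which is common but not guaranteed.

```python
def guess_format(data: bytes) -> str:
    """Guess a file format from its first bytes (hypothetical helper)."""
    # JPEG: Start Of Image marker 0xFFD8, followed by the next marker's 0xFF.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # ZIP: local file header signature "PK\x03\x04".
    if data.startswith(b"PK\x03\x04"):
        # Assumption: an Office Open XML package names "[Content_Types].xml"
        # within the first few KiB; entry names are stored uncompressed.
        if b"[Content_Types].xml" in data[:4096]:
            return "ooxml"
        return "zip"
    # TAR ("ustar" variant): magic string at offset 257 of the 512-byte header.
    if len(data) >= 262 and data[257:262] == b"ustar":
        return "tar"
    # XML: optional UTF-8 byte order mark or whitespace, then "<?xml".
    if data.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<?xml"):
        return "xml"
    return "unknown"
```

Note that the function never looks at a file name: a renamed .xlsx file would still be reported as "ooxml", while a format with no fixed signature falls through to "unknown".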

Because these files are so recognizable, they are faster to recover. The time depends on how many formats have to be defined, recognized and analyzed. For example, if you know you are only after JPEG images, you can narrow this process down to JPEG markers.
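
Narrowing the search to JPEG markers could look roughly like the sketch below. It is a deliberately naive illustration, not how any particular tool works: `carve_jpegs` is a hypothetical name, the input is assumed to be a raw dump already loaded into memory, and each image is assumed to be stored contiguously. A JPEG with an embedded thumbnail contains a nested pair of markers, so cutting at the first End of Image marker can truncate such a file.

```python
SOI = b"\xff\xd8\xff"  # Start Of Image marker plus the next marker's 0xFF
EOI = b"\xff\xd9"      # End Of Image marker


def carve_jpegs(raw: bytes) -> list[bytes]:
    """Return byte ranges that look like JPEG images (hypothetical helper)."""
    found = []
    pos = 0
    while True:
        start = raw.find(SOI, pos)
        if start == -1:
            break  # no further Start Of Image marker
        end = raw.find(EOI, start + len(SOI))
        if end == -1:
            break  # image start without an end: overwritten or fragmented
        found.append(raw[start:end + len(EOI)])
        pos = end + len(EOI)  # continue scanning after this image
    return found
```

Each returned item could then be written out under a generated name such as recovered_0001.jpg, which is consistent with recovered files showing up with unusual names: the original name was never part of the carved data.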

Esa Jokinen
  • As we are now in the "SSD age", I would reformulate the first part: data is usually erased not by overwriting but by TRIM releasing the blocks, and most SSDs then return zeros for the TRIMmed blocks. – Robert Jul 25 '21 at 17:55
  • Thank you, @Esa Jokinen, for the explanation and examples. It helped to clear up the relevant doubts. – msinfo Jul 25 '21 at 18:23