A file is basically data on the physical layer of a file system plus a reference to that data in the logical layer. The file name, including its extension, is part of the logical layer. Once a file is removed from this layer, its physical blocks (no longer referenced by the logical layer) are marked as available. At this stage, the contents of the file are still readable until the blocks are overwritten by another file, but the name and extension are gone.
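To make the logical/physical split concrete, here is a toy model. It is a deliberately simplified sketch, not how any real file system (FAT, NTFS, ext4) stores its metadata: the hypothetical `ToyFileSystem` uses a dict as the logical layer (name to block numbers) and a list of fixed-size blocks as the physical layer.

```python
BLOCK_SIZE = 4  # tiny blocks so the example stays readable


class ToyFileSystem:
    """Hypothetical model: a directory (logical layer) over raw blocks (physical layer)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b"\x00" * BLOCK_SIZE for _ in range(num_blocks)]
        self.free = set(range(num_blocks))
        self.directory: dict[str, list[int]] = {}

    def write(self, name: str, data: bytes) -> None:
        # Split the data into zero-padded blocks and place them in free slots.
        chunks = [data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
                  for i in range(0, len(data), BLOCK_SIZE)]
        allocated = sorted(self.free)[:len(chunks)]
        if len(allocated) < len(chunks):
            raise OSError("no space left")
        for block, chunk in zip(allocated, chunks):
            self.blocks[block] = chunk
            self.free.discard(block)
        self.directory[name] = allocated

    def delete(self, name: str) -> None:
        # Only the reference disappears; the block contents stay untouched
        # and the blocks are merely marked as available again.
        self.free.update(self.directory.pop(name))
```

After `fs.delete("photo.jpg")`, the bytes are still present in `fs.blocks` even though the name is gone; only a later `write` that reuses the freed blocks actually destroys them.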
Therefore, recovering a removed file depends more on its data format than on its extension. Some formats are easier to recognize because they have clear, strictly defined beginnings and endings, provided the file was not fragmented. Some examples include:
- JPEG images (and derived formats) are sequences of segments, each beginning with a two-byte marker; a file starts with `0xFFD8` Start Of Image and ends with `0xFFD9` End Of Image.
- A ZIP archive consists of file entries and ends with a central directory referencing these files. It is like a small file system within a single file.
- TAR archives have a 512-byte header for each file, followed by the file contents padded (typically with zeroes) to a multiple of 512 bytes.
- Among text files, XML is easy to recognize, as it may start with `<?xml` and consists of a root element and child elements, each having a defined ending.
- Modern Microsoft Office files, a.k.a. Office Open XML files (.docx, .xlsx, .pptx), are ZIP-packed XML-based documents.
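The JPEG case can be illustrated with a naive carver that cuts from each Start Of Image marker to the next End Of Image marker. This is a sketch under strong assumptions: the image is contiguous (not fragmented), and the first `0xFFD9` after a `0xFFD8` really ends the image. Real files can break the second assumption, for example when an embedded EXIF thumbnail carries its own SOI/EOI pair, so a practical carver would walk the segment lengths instead. The name `carve_jpegs` is hypothetical.

```python
SOI = b"\xff\xd8\xff"  # Start Of Image plus the 0xFF that opens the next marker
EOI = b"\xff\xd9"      # End Of Image


def carve_jpegs(raw: bytes) -> list[bytes]:
    """Naively cut every span from an SOI to the first following EOI."""
    found = []
    start = raw.find(SOI)
    while start != -1:
        end = raw.find(EOI, start + len(SOI))
        if end == -1:
            break  # image truncated or already overwritten
        found.append(raw[start:end + len(EOI)])
        start = raw.find(SOI, end + len(EOI))
    return found
```

Requiring the third `0xFF` byte is a cheap way to cut false positives, since SOI is always followed by another marker.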
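For ZIP, one way to sketch carving is to start at a local file header signature (`PK\x03\x04`) and stop after the End Of Central Directory record (`PK\x05\x06`), whose fixed part is 22 bytes followed by a variable-length comment. The helper names are hypothetical, and the sketch assumes a single archive whose stored data does not itself contain an EOCD signature (a nested, uncompressed ZIP would confuse it). The standard `zipfile` module is used only to validate a candidate.

```python
import io
import struct
import zipfile
from typing import Optional

LOCAL_HEADER = b"PK\x03\x04"  # signature of each local file header
EOCD = b"PK\x05\x06"          # End Of Central Directory signature
EOCD_FIXED_SIZE = 22          # EOCD length without its trailing comment


def carve_zip(raw: bytes, start: int = 0) -> Optional[bytes]:
    """Cut from the first local file header to the end of the following EOCD."""
    begin = raw.find(LOCAL_HEADER, start)
    if begin == -1:
        return None
    eocd = raw.find(EOCD, begin)
    if eocd == -1 or eocd + EOCD_FIXED_SIZE > len(raw):
        return None
    # Bytes 20-21 of the EOCD hold the comment length (little-endian).
    (comment_len,) = struct.unpack_from("<H", raw, eocd + 20)
    return raw[begin:eocd + EOCD_FIXED_SIZE + comment_len]


def looks_valid(data: bytes) -> bool:
    """Let the standard library parse the candidate and check every CRC."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None  # None: no member failed its CRC
    except zipfile.BadZipFile:
        return False
```

Splitting carving from validation keeps the carver cheap: it only proposes candidates, and the parser decides which ones are real.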
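The TAR layout lends itself to walking the archive header by header. This sketch assumes ustar-style headers (the `ustar` magic at offset 257, the octal size field at offset 124, the checksum at offset 148) and ignores both the ustar name prefix field and the base-256 size encoding GNU tar uses for very large files. The function names are hypothetical; `tarfile` is used only to cross-check the result.

```python
import io
import tarfile
from typing import Optional

BLOCK = 512  # tar works in 512-byte blocks


def parse_tar_header(block: bytes) -> Optional[tuple[str, int]]:
    """Return (name, size) if `block` looks like a ustar header, else None."""
    if len(block) < BLOCK or block[257:262] != b"ustar":
        return None
    try:
        stored = int(block[148:156].strip(b" \x00") or b"0", 8)
        size = int(block[124:136].strip(b" \x00") or b"0", 8)
    except ValueError:
        return None  # non-octal numeric fields: not a real header
    # The checksum is the byte sum of the header with its own 8-byte field
    # counted as spaces; a match makes a false positive unlikely.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:BLOCK])
    if stored != computed:
        return None
    # Name field only; the ustar prefix field (offset 345) is ignored here.
    name = block[:100].split(b"\x00", 1)[0].decode("utf-8", "replace")
    return name, size


def tar_member_span(size: int) -> int:
    """Header plus contents rounded up to a multiple of 512 bytes."""
    return BLOCK + -(-size // BLOCK) * BLOCK


def carve_tar(raw: bytes, start: int) -> bytes:
    """Follow consecutive headers from `start` until one fails to parse."""
    pos = start
    while pos + BLOCK <= len(raw):
        header = parse_tar_header(raw[pos:pos + BLOCK])
        if header is None:
            break
        pos += tar_member_span(header[1])
    return raw[start:pos]


def list_members(data: bytes) -> list[str]:
    """Cross-check a carved candidate with the standard tarfile module."""
    with tarfile.open(fileobj=io.BytesIO(data)) as tf:
        return tf.getnames()
```

Because every member is padded to whole blocks, the position of the next header follows directly from the size field, so the walk never has to search.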
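For XML, a rough approach is to look for `<?xml`, read the root element's name, and cut at the matching closing tag. This sketch assumes the root element directly follows the declaration (no comments or DOCTYPE in between), that it is not self-closing, and that its name is not reused by a nested element; `xml.etree.ElementTree` then rejects candidates that are not well-formed. `carve_xml` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# XML declaration, optional whitespace, then the root element's name.
ROOT_RE = re.compile(rb"<\?xml[^>]*\?>\s*<([A-Za-z_][\w.:-]*)")


def carve_xml(raw: bytes, start: int = 0) -> Optional[bytes]:
    """Cut from `<?xml` to the first closing tag of the root element."""
    m = ROOT_RE.search(raw, start)
    if m is None:
        return None
    closing = b"</" + m.group(1) + b">"
    end = raw.find(closing, m.end())
    if end == -1:
        return None
    candidate = raw[m.start():end + len(closing)]
    try:
        ET.fromstring(candidate)  # reject candidates that are not well-formed
    except ET.ParseError:
        return None
    return candidate
```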
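Because Office Open XML files are ZIP packages, a carved ZIP candidate can be classified by its part names. Every Open Packaging Conventions package contains `[Content_Types].xml`; the mapping to document types below relies on the conventional part names Office writes (`word/document.xml`, `xl/workbook.xml`, `ppt/presentation.xml`). Strictly, the main part is located through `_rels/.rels`, so treat this hypothetical `ooxml_kind` helper as a heuristic.

```python
import io
import zipfile
from typing import Optional

# Conventional main part names written by Word, Excel and PowerPoint.
MAIN_PARTS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def ooxml_kind(data: bytes) -> Optional[str]:
    """Guess the Office Open XML type of a carved ZIP, or None if it is not one."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return None
    if "[Content_Types].xml" not in names:
        return None  # a plain ZIP, not an OPC package
    for part, kind in MAIN_PARTS.items():
        if part in names:
            return kind
    return "unknown"
```

This also means recovering a lost .docx is mostly recovering a ZIP; the extension can be restored afterwards from the package contents.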
Because these formats are so recognizable, they are faster to recover. The time needed depends on how many formats have to be defined, recognized and analyzed. For example, if you know you are only after JPEG images, you can narrow this process down to JPEG markers.
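Narrowing the search can be sketched as a scan restricted to the start signatures of the formats you care about. The `SIGNATURES` table and `scan` function are hypothetical and cover only a few formats; in this simple version each selected format costs one more pass over the data and produces more candidates to validate, which is where restricting the set saves time.

```python
from collections.abc import Iterable

# Hypothetical table of start signatures ("magic numbers") for a few formats.
# TAR is left out because its "ustar" magic sits at offset 257, not at the start.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "zip": b"PK\x03\x04",  # also matches .docx/.xlsx/.pptx packages
    "xml": b"<?xml",
}


def scan(raw: bytes, wanted: Iterable[str]) -> list[tuple[int, str]]:
    """Return sorted (offset, format) hits for the wanted formats only."""
    hits = []
    for fmt in wanted:
        sig = SIGNATURES[fmt]
        pos = raw.find(sig)
        while pos != -1:
            hits.append((pos, fmt))
            pos = raw.find(sig, pos + 1)
    return sorted(hits)
```

For example, `scan(raw, ["jpeg"])` makes a single pass and returns only JPEG candidates, while `scan(raw, SIGNATURES)` checks every format in the table.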