A client sent us an external hard drive where at least half the files are corrupt. They are a broad mix of file types (images, documents, etc.) and there is no discernible pattern as to which are corrupt. They appear at their original size; however, when I open them in a hex editor they are filled with nothing but nulls. The data has been completely replaced with 00s.

What could cause this to happen? The files were likely copied onto the drive from another machine. Could this result from problems during a transfer or is it more likely the files are corrupt at the origin?

  • Have you checked the files at the origin? Are they corrupt? If not then the problem is probably during the transfer or at the destination. It seems like this would be fairly straightforward to narrow down. – joeqwerty Jun 24 '21 at 16:49
  • @joeqwerty unfortunately no, we cannot check them ourselves. We only have custody of the hard drive and have no knowledge of how the client placed the files on there. I'm trying to get a sense of how this happened before reaching out to them. – user354104 Jun 24 '21 at 17:27

1 Answer


It seems the metadata was written correctly, so the files appear in the directory tree and have names, access modes and so on, but the data itself is corrupt (it never reached the media).

How this is possible depends on the file system, mount options, caching modes for the drive and so on.

Let's take ext4 as an example, where it is relatively easy to make this happen. The default mount options journal metadata only, so the file system generally guarantees that the on-disk structures will be consistent in any case: everything will look either as if nothing was written to the drive or as if the operation was applied completely, just as in an ACID database. But the data itself isn't journalled by default, so it is possible that the system completed the system call, reported success to the application, and created all the necessary structures (in the journal only, for now) while the data was still sitting in the cache... and then the power is cut. When you power the system up again and mount the volume, the file system driver replays the journal and the files appear, but their data is whatever was left in those blocks from previous use. That garbage may well be zeros. In the end, cutting the power during a write is likely to produce zero-filled files. I'd expect the same result from unplugging the drive too early (for example, pulling out the USB cable).
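To illustrate the gap between "the write call returned" and "the data is on the media", here is a minimal Python sketch of what a copying program can do to narrow that window. The function name `write_durably` is made up for this example, and it assumes the drive honours flush requests, which not every external enclosure does.

```python
import os

def write_durably(path, data):
    """Write data to path and ask the OS to push it to the device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the OS page cache
        os.fsync(f.fileno())  # ask the kernel to write the cached data out to the device
```

Most file managers and copy tools do not do this for every file, which is why safely ejecting (or unmounting) the drive before unplugging it matters.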

The unplugging scenario is quite likely, given that this is an external drive. The same thing can certainly happen with other file systems too.
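If it helps to get a full list of affected files before contacting the client, something like the following Python sketch could walk the drive and report files that are non-empty but contain only null bytes. The function names and the 1 MiB chunk size are arbitrary choices for this example, not part of any existing tool.

```python
import os

CHUNK = 1024 * 1024  # read 1 MiB at a time so large files do not fill memory

def is_zero_filled(path):
    """Return True if the file is non-empty and every byte in it is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            seen_data = True
            if chunk.count(0) != len(chunk):
                return False  # found a non-null byte
    return seen_data

def find_zero_filled(root):
    """Yield paths under root whose contents are entirely null bytes."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if is_zero_filled(path):
                    yield path
            except OSError:
                # Unreadable file: skip it rather than abort the whole scan.
                continue
```

Calling `list(find_zero_filled("/mnt/client_drive"))` (the mount point is a placeholder) gives the affected paths. Comparing their modification times may show whether they cluster around one copy session, which would point at an interrupted transfer rather than corruption at the origin.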

Nikita Kipriyanov