0

With a dying drive (one that has more and more bad sectors), on an ext4-journaled partition, a cp operation will sometimes complete without error, but the written data will either be wrong or not readable at all (Invalid Argument errors after reading part of it), just after the file was created.

Is there a file-system that would prevent such events (ZFS maybe)?

Or are there any command-line applications I could use instead of cp, to check that the copied data is fine, before I remove the source?

I could just md5 both the source file, and the newly created file, but that sounds like a hack to me.

Thanks.

Update: I guess I didn't explain enough about why I'm asking this, and thus everyone assumed something that isn't the case.

I do not want to continue using this drive. This drive has been disconnected from the moment I noticed the problem. What I want is to prevent what happened here from happening again with other drives.

I have a script that uses cp to regularly copy some files from drive1 to drive2, then from drive2 to drive3 soon after. The problem I faced was that cp didn't complain when it copied from drive1 to drive2, even though the data on drive2 was unreadable when it came time to copy it to drive3. At that point, my copy on drive1 was already deleted (I needed the free space, and my script didn't report any error, which is why I assumed the data on drive2 was OK). So I lost files.

So my question is: what is the best way to stop that from happening again in the future? Should I just use a file-system with checksums, or use a copy tool that does checksums itself?

Guillaume Boudreau
  • 10
    Why on earth are you still using the drive? Replace it. – Michael Hampton Jan 05 '13 at 20:51
  • 2
    Hard drives do this sort of check internally and report that they are failing through SMART. Take the hint and replace the drive. – JamesRyan Jan 05 '13 at 21:27
  • Not sure why everyone assumed I wanted to keep using that drive... My questions were about how to prevent silent corruption (in the future), not about how to fix that already dead drive. Anyway, I added more details to my question. – Guillaume Boudreau Jan 09 '13 at 02:03
  • This question has been asked before; see [Is bit rot on hard drives a real problem? What can be done about it?](http://serverfault.com/q/77710/126632) – Michael Hampton Jan 09 '13 at 02:24

3 Answers

8

Remount it as READ-ONLY and copy your data off NOW.

That is the only sane thing to do with a bad drive.

Any attempt to keep using it is madness.

Tonny
  • If most of the drive still works fine, what's wrong with using the part that is still working? Why not just read the data back shortly after writing it, to make sure the data is still correct, and if it isn't, mark the sector bad? – enigmaticPhysicist Oct 25 '16 at 00:08
3

No file system will prevent a failing hard drive from failing. At best, a file system can tell you that data is wrong by means of checksums. ZFS and Btrfs have checksum support, and I believe ext4 is working on adding checksums.

The only thing you can do is get a new drive and copy the data from the old drive before it fails completely and you lose all of your data.

Regarding cp, you could use rsync to copy the files and, after it has finished, run it again with the --checksum option (without it, rsync only compares size and modification time, not file contents). If the data has not changed, nothing will be copied and you will know the copy was fine. If it has, rsync will only copy the parts of the file that differ.
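If you would rather have the script do the check itself, here is a minimal sketch of a copy-then-verify step in Python. The function names `sha256_of` and `copy_and_verify` are made up for illustration, and the sketch assumes that reading the destination back and comparing hashes is an acceptable check for your setup.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then read both back and compare their hashes.

    Returns True only if dst could be read in full and matches src.
    A read error on dst (for example EINVAL or EIO) counts as a failure.
    """
    shutil.copy2(src, dst)
    try:
        return sha256_of(src) == sha256_of(dst)
    except OSError:
        return False
```

The script would then delete the source only when `copy_and_verify(src, dst)` returns True. One caveat: right after a copy, the read-back may be served from the operating system's page cache rather than from the disk, so a match is weaker evidence than a check done later or after the cache has been dropped.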

I have no idea why you would try to keep using the drive; you would save a lot of time and messing around by just replacing it.

Epaphus
3

First: stop mounting your drive writable. This is very dangerous, because a confused driver can do a lot of damage to your data.

Second: Try one of these two tools: myrescue and ddrescue. You will need another, healthy device as the target. The tools try to copy as much as possible, skipping bad blocks and retrying them later. This might take a lot of time, depending on how damaged your device is. But if you're lucky, you will get most of your data back.

Daniel Alder