Bad sectors will eventually occur, but how should I deal with them? If a bad sector occurs, does that mean that the data in that sector is irrecoverably lost, and I should restore it from backup? Is there any way to automate finding out which file belonged to that sector and at which offset, and to automate that recovery? Is there anything I can do on the filesystem level to make my life easier? (ECC?)
You have backups, and you replace the disks when failures occur. You can set up RAID to help deal with media failures. You can use filesystems with checksums to detect failures (ZFS, BTRFS) – Zoredache May 09 '13 at 09:20
2 Answers
You do not deal with bad sectors. Your hardware, server configuration, and internal procedures protect you from their effects.
Every modern hard drive anticipates a certain number of bad sectors, and internally remaps them. This process is completely transparent to the user/OS until the remapping space is all used up (at which point you start seeing bad sectors).
Long before you see bad sectors your drive will start crying - SMART or equivalent technology causes the drive to report faults to the operating system (which you are of course monitoring for, right?).
If you love your data (and who doesn't) then you don't just trust it to one hard drive.
All your important data is on RAID volumes (hardware or software - makes no difference for the purposes of this discussion).
RAID gives you two or more redundant hard drives, so that when one disk fails you have the opportunity to replace it without losing any data.
Because you know that RAID Is Not A Backup, you also make regular backups (and periodically verify that you can restore them successfully), so that even if you lose enough drives that your RAID array is trashed you can still get your data back.
As with all good strategies, this is Defense In Depth:
The hard drives do their best to safeguard your data by handling errors/bad sectors gracefully.
Should the hard drive fail, RAID keeps your data safe until you can fix the hardware problem.
If the RAID fails to protect you, your backups are a final chance to save your data.
Ideally you use all of these techniques all of the time (at least for important data), but you always have at least one layer of the onion (even laptop hard drives are S.M.A.R.T. these days).
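As a rough illustration of the SMART monitoring mentioned above, here is a minimal Python sketch that scans an attribute table in the style printed by `smartctl -A` and flags non-zero counters that commonly accompany failing media. The sample text is invented for the example, the column layout is assumed to match the usual ATA attribute table, and the set of watched attributes (`WATCHED`) is my own choice, not something the drive or smartmontools prescribes.

```python
# Attributes often associated with failing media (an assumption for this sketch).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from a `smartctl -A` style table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def media_warnings(attrs):
    """Names of watched attributes whose raw value is above zero."""
    return sorted(name for name in WATCHED if attrs.get(name, 0) > 0)


# Invented sample output, used only to exercise the parser.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8123
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)
warnings = media_warnings(attrs)
print(warnings)
```

In practice you would feed this the real output for each disk and alert on any non-empty result; a monitoring daemon such as smartd is the more usual way to do the same job.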
Alright, I didn't know that a remapped sector is not the same as a bad sector. I thought bad sectors get remapped. So when the disk is degraded so much that a sector cannot be remapped then it becomes a bad sector? And if it *was* remapped then no data was lost at that point in time? – Hongli Lai May 09 '13 at 21:27
A remapped sector ***IS*** a bad sector - it's just one the hardware has dealt with for you transparently (most of the time with no data loss). When you start *seeing* bad sectors on a modern drive (with a tool like `badblocks`, or in your logs) it means all the remapping area has been used -- at that point you should consider the disk dead and replace it. – voretaq7 May 10 '13 at 15:20
There is a purpose, because I like to use my old clunker drives in a raidz2 configuration, and when one of the drives goes faulted because its reallocated sector count is too high, I like to ensure those sectors are marked bad, then resilver it again. Rinse and repeat until it turns itself to shrapnel. One caveat: if you use up too much of your drive with unusable sectors, it won't be large enough for the stripe. – Brian Thomas Jul 11 '19 at 23:22
Every time a hard drive writes a sector, it also updates a checksum (stored immediately after the sector data). When a sector is read from your hard drive, the sector checksum is expected to match the sector data. If it does not, something went wrong (typically during the write operation), and the sector is called a bad sector.
There are two common reasons for bad sectors:
- Power failure during write.
- Hard drive is malfunctioning.
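The checksum mismatch described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: real drives use error-correcting codes in firmware rather than a plain CRC, and `ToyDisk`, `SECTOR_SIZE`, and the `power_fails` flag are hypothetical names invented for this example. It shows how a write that is interrupted before the checksum is updated produces a sector that fails on read, and how rewriting the sector clears the error.

```python
import zlib

SECTOR_SIZE = 512  # bytes per sector in this toy model


class ToyDisk:
    """In-memory model: each sector stores its data plus a CRC32 checksum."""

    def __init__(self, sectors):
        blank = bytes(SECTOR_SIZE)
        self.data = [blank] * sectors
        self.sums = [zlib.crc32(blank)] * sectors

    def write(self, n, payload, power_fails=False):
        # Pad or truncate the payload to exactly one sector.
        payload = payload.ljust(SECTOR_SIZE, b"\0")[:SECTOR_SIZE]
        self.data[n] = payload
        if power_fails:
            return  # power lost: data changed, checksum never updated
        self.sums[n] = zlib.crc32(payload)

    def read(self, n):
        # A mismatch between data and stored checksum is reported as an error.
        if zlib.crc32(self.data[n]) != self.sums[n]:
            raise OSError(f"bad sector {n}: checksum mismatch")
        return self.data[n]

    def scan(self):
        """Return the numbers of all sectors that fail to read."""
        bad = []
        for n in range(len(self.data)):
            try:
                self.read(n)
            except OSError:
                bad.append(n)
        return bad


disk = ToyDisk(4)
disk.write(1, b"data")
disk.write(2, b"data", power_fails=True)  # simulated power failure mid-write
bad_before = disk.scan()

disk.write(2, b"data")  # rewriting ("wiping") the sector restores consistency
bad_after = disk.scan()
print(bad_before, bad_after)
```

The second case from the list, a malfunctioning drive, is different: rewriting does not help for long, because the medium itself keeps producing mismatches.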
I have published a free program that lets you test your disk for bad sectors and see whether you should replace your hard drive or simply wipe the bad sectors of a healthy drive. You're welcome to download it here.
As for your second question, I usually store an MD5 checksum of each of my important files in an NTFS alternate data stream. I have written a program that helps me hash and verify my files, and it has helped me on more than one occasion; check it out here.
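The general idea of hashing files and verifying them later can be sketched in a few lines of Python. This is not the program mentioned above, and it deliberately swaps one detail: instead of an NTFS alternate data stream, the checksums go into a JSON sidecar file so the sketch runs on any filesystem. The function names (`md5_of`, `write_manifest`, `verify_manifest`) and the file names are made up for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    """Record the current checksum of each file in a JSON manifest."""
    manifest = {str(p): md5_of(p) for p in paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(manifest_path):
    """Return the files that are missing or whose checksum has changed."""
    manifest = json.loads(Path(manifest_path).read_text())
    return sorted(
        name
        for name, expected in manifest.items()
        if not Path(name).exists() or md5_of(name) != expected
    )


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "important.bin"
    sample.write_bytes(b"payload")
    manifest_file = Path(tmp) / "manifest.json"
    write_manifest([sample], manifest_file)

    before = verify_manifest(manifest_file)  # nothing damaged yet
    sample.write_bytes(b"pAyload")           # simulate silent corruption
    after = verify_manifest(manifest_file)

print(before, after)
```

A check like this tells you which files to restore from backup, though it cannot tell you which sector went bad. MD5 is adequate for spotting accidental corruption; it is not a defence against deliberate tampering.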
P.S. RAID will not save you from bad sectors caused by a power failure (unless you have battery backup); I know this from first-hand experience. Moreover, you may need to wipe the bad sectors to allow the array to be rebuilt successfully.
I would have voted this up for the information about the checksum if you had not included links to your programs, which I am not sure are working or correct. – bomben Feb 10 '19 at 18:23