After 3 years of 24x7 service, a 1TB Seagate Barracuda ES.2 enterprise drive is showing signs of failure: the S.M.A.R.T. reallocated sector count is high.

A Wikipedia article suggests that the drive can still be used for less sensitive purposes, like scratch storage outside of an array, if the remapped sectors are left unused:

A workaround which will preserve drive speed at the expense of capacity is to create a disk partition over the region which contains remaps and instruct the operating system to not use that partition.

To create such a partition, I need the list of remapped sectors. However, no bad blocks are visible to the operating system: badblocks returns an empty list.
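When badblocks does report something (for example sectors that are still pending rather than already remapped), its output can be turned into the range of 512-byte LBAs to keep outside the partition. It cannot reveal sectors the firmware has already remapped. A minimal bash sketch; bad_lba_range is a hypothetical helper, and it assumes you pass it the same block size you gave badblocks with -b (1024 bytes is the badblocks default):

```bash
# bad_lba_range BLOCKSIZE < badblocks-output
# Hypothetical helper: reads block numbers (one per line, as written by
# `badblocks -b BLOCKSIZE -o FILE`) and prints the first and last 512-byte
# LBA spanned by the bad blocks, so a partition can be placed outside them.
bad_lba_range() {
  local per=$(( ${1:-1024} / 512 ))   # 512-byte sectors per badblocks block
  local blk lo hi min='' max=''
  while read -r blk; do
    [[ $blk =~ ^[0-9]+$ ]] || continue  # skip blank or non-numeric lines
    lo=$(( 10#$blk * per ))             # 10# prevents octal parsing of leading zeros
    hi=$(( (10#$blk + 1) * per - 1 ))
    if [[ -z $min ]] || (( lo < min )); then min=$lo; fi
    if [[ -z $max ]] || (( hi > max )); then max=$hi; fi
  done
  if [[ -n $min ]]; then echo "$min $max"; fi
}
# Example (read-only scan; /dev/sdX is a placeholder):
#   badblocks -b 4096 -o bad.txt /dev/sdX
#   bad_lba_range 4096 < bad.txt
```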

Is there a way to recover the list of reallocated sectors?

Edit: This drive is from an array. We get a few of them failing every year and just throwing them away seems to be a waste. I am thinking of giving a second chance to the better parts of the platters.

Here is what the S.M.A.R.T. report looks like now:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST31000340NS
Serial Number:    **********
Firmware Version: SN05
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   056   054   044    Pre-fail  Always       -       164293299
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   005   005   036    Pre-fail  Always   FAILING_NOW 1955
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       8677183434
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       24893
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       14
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always       -       3
190 Airflow_Temperature_Cel 0x0022   050   043   045    Old_age   Always   In_the_past 50 (0 6 50 32)
194 Temperature_Celsius     0x0022   050   057   000    Old_age   Always       -       50 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   021   010   000    Old_age   Always       -       164293299
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       21
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       21
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
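To pull out the attributes smartctl flags in a report like this one (here attribute 5 is FAILING_NOW and 190 is In_the_past), a short awk filter over the WHEN_FAILED column is enough. A sketch; failing_attrs is a hypothetical helper name, and it assumes the standard `smartctl -A` column layout shown above, where WHEN_FAILED is the ninth column:

```bash
# failing_attrs < smartctl-A-output
# Hypothetical helper: prints "ID NAME WHEN_FAILED" for every attribute row
# whose WHEN_FAILED column is not "-". Header and banner lines are skipped
# because their first field is not a number.
failing_attrs() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $1, $2, $9 }'
}
# Example (/dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | failing_attrs
```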
Dmitri Chubarov
  • It's not clear why you think you need to do something special. The excerpt from the Wikipedia article is about how to preserve drive speed. Is that what you're trying to do? Is your issue with these drives performance? – David Schwartz May 10 '12 at 08:26
  • Thank you. I did not think about performance much. My intention is to keep using the drive, that is, to avoid the parts of the drive that are more likely to fail in the future and to avoid the performance penalty that would otherwise occur. Interestingly, it might be possible to locate remapped areas by measuring the time it takes to read or write short runs of sectors. – Dmitri Chubarov May 10 '12 at 09:00
  • Not likely, because modern drives distribute the spares so the performance effect isn't usually noticeable. There's nothing special you need to do. If you want to keep using the drives, run a few full read/write passes over them to shake out any about-to-fail sectors and hope for the best. – David Schwartz May 10 '12 at 09:04
  • The best way to use a drive with a high sector relocation count is as a paperweight. Anything else is false economy. – John Gardeniers Aug 26 '12 at 02:26
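If you want to try the timing idea from the comments anyway, a crude probe is to read a short run of sectors at regular offsets and record how long each read takes; remapped regions might show up as slower samples, although, as David Schwartz points out, spares are usually spread out, so the effect may well be invisible. A bash sketch under those assumptions (512-byte logical sectors, GNU dd and date; scan_latency is a hypothetical name):

```bash
# scan_latency DEVICE STEP_MIB [COUNT]
# Hypothetical helper: reads COUNT 512-byte sectors every STEP_MIB MiB and
# prints "<lba> <microseconds>" for each sample. Block devices are read with
# iflag=direct so the page cache does not hide the disk's real latency.
scan_latency() {
  local dev=$1 step_mib=$2 count=${3:-256}
  local size total step lba t0 t1 iflag=''
  if [[ -b $dev ]]; then
    size=$(blockdev --getsize64 "$dev")   # block device: size in bytes
    iflag=direct                          # bypass the page cache
  else
    size=$(stat -c %s "$dev")             # regular file, e.g. for a dry test
  fi
  total=$(( size / 512 ))                 # length in 512-byte sectors
  step=$(( step_mib * 2048 ))             # 1 MiB = 2048 sectors
  (( step > 0 )) || return 1
  for (( lba = 0; lba + count <= total; lba += step )); do
    t0=$(date +%s%N)
    dd if="$dev" of=/dev/null bs=512 skip="$lba" count="$count" status=none \
      ${iflag:+"iflag=$iflag"} || return 1
    t1=$(date +%s%N)
    printf '%d %d\n' "$lba" $(( (t1 - t0) / 1000 ))
  done
}
# Example (read-only; /dev/sdX is a placeholder): sample every 100 MiB
#   scan_latency /dev/sdX 100 > latency.txt
```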

7 Answers

You don't.

You go buy another disk to replace it unless you just really like losing data.

SpacemanSpiff
  • and I only say it like that because 5 bad sectors with only 72 power on hours? That doesn't sound like a good trend. – SpacemanSpiff May 10 '12 at 05:12
  • Thank you for your reply. The disk has 24893 power on hours. And it is a disk from an array. We have a few of them and throwing them away seems like a waste. I'll update the question to reflect this motivation. – Dmitri Chubarov May 10 '12 at 05:15
  • Yes, I see your point, but is this your data or someone else's? Let them justify the cost to risk ratios. – SpacemanSpiff May 10 '12 at 18:48
  • 3
    Generally when I encounter bad sectors, the plan is like this: if it's got only a few that's no biggie, could be just manufacturing flaws, check again after a while. If it's got about 4-10, drive is risky and needs to be monitored. If it's got over 10, the drive should be removed and marked/destroyed/cleaned+returned before it ruins everyone's day. – Sašo May 11 '12 at 19:19
  • 3
    After playing more with the disk I have seen the firmware simply block the disk when the number of reallocated sectors is high. IMHO that's dirty playing from the firmware, but there is nothing I can do to change its behavior. Detailed report posted below. – Dmitri Chubarov May 14 '12 at 14:09
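Sašo's rule of thumb is easy to encode if you want to apply it to a batch of drives. A bash sketch; triage_realloc is a hypothetical name, the thresholds come straight from that comment (reading "only a few" as 1-3, which is an assumption), and they are a heuristic rather than any vendor specification:

```bash
# triage_realloc COUNT
# Hypothetical helper: maps a raw reallocated-sector count to the action
# suggested in the comment: 0 = ok, 1-3 = recheck later, 4-10 = monitor,
# more than 10 = replace.
triage_realloc() {
  local n=$1
  if (( n > 10 )); then echo replace
  elif (( n >= 4 )); then echo monitor
  elif (( n > 0 )); then echo recheck
  else echo ok
  fi
}
# Example: the drive in the question has a raw count of 1955
#   triage_realloc 1955
```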

I'd like to thank you for the advice and share some of the details that I've got from experiments.

In short, there is no easy way to get the list of reallocated sectors, and even statistical methods of mapping the disk are heavily hampered by the need to work against the firmware's logic.

To test the drive, I ran badblocks -wv with the default block size and monitored the reallocated sector count in the process. I made several observations.
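The monitoring described here can be scripted by polling smartctl while badblocks runs. A bash sketch; attr_raw and watch_realloc are hypothetical helper names, and the parser assumes the usual `smartctl -A` table layout in which RAW_VALUE is the tenth column:

```bash
# attr_raw ID < smartctl-A-output
# Prints the RAW_VALUE (first token of column 10) of SMART attribute ID.
attr_raw() {
  awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}

# watch_realloc DEVICE [SECONDS]
# Hypothetical helper: logs the raw Reallocated_Sector_Ct (5) and
# Current_Pending_Sector (197) every SECONDS. Stop it with Ctrl-C.
watch_realloc() {
  local dev=$1 interval=${2:-60} out
  while true; do
    # smartctl sets non-zero exit bits on a failing disk, so ignore its status.
    out=$(smartctl -A "$dev") || true
    printf '%s realloc=%s pending=%s\n' "$(date +%T)" \
      "$(attr_raw 5 <<<"$out")" "$(attr_raw 197 <<<"$out")"
    sleep "$interval"
  done
}
# Example (destructive write test in one terminal, monitor in another;
# /dev/sdX is a placeholder):
#   badblocks -wsv /dev/sdX
#   watch_realloc /dev/sdX 60
```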

  1. There was a sharp rise in the number of reallocated sectors while writing to the beginning of the disk. Then, from the first 10 GB up to 700 GB, there was no change. This can be explained by the fact that some RAID housekeeping data was stored at the beginning of the disk, so the wear in the low-address area was higher than on the rest of the disk.

  2. Then, after a single error, the disk switched itself into a blocked mode: every ATA command, even IDENTIFY DEVICE, returned ABRT, even though the normalized reallocated sector value was still positive. To explain this behaviour, following David Schwartz's suggestion, I assumed that the reserved sectors are distributed over the address space of the drive. This means that the drive as a whole may still have reserved sectors while one region has run out of sectors to remap to. In this situation the firmware simply blocks the drive.

  3. The drive leaves the blocked mode only after a power cycle. While older drives let software keep track of bad blocks and avoid using them, modern drives do not offer this opportunity: when the firmware decides it cannot cope with the errors, it makes the drive unusable.

  4. By running the normalized reallocated sector value down to 02, I concluded that there are 2048 reserved sectors on this drive.

  5. So-called low-level formatting, i.e. writing zeros to every accessible sector so that sectors in the less reliable parts of the disk get reallocated, would not work. Once the drive runs out of reserved sectors, it changes the way it handles errors, which makes it much less convenient to use than a traditional drive that does no predictive failure analysis and simply reports an error.
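The 2048 figure from observation 4 can be cross-checked against the S.M.A.R.T. report in the question. Assume (this is an assumption, not a documented Seagate formula) that the normalized value V falls linearly from 100 to 0 as the N spare sectors are used up, with R the raw reallocated count:

```latex
V \approx 100\left(1 - \frac{R}{N}\right)
\quad\Longrightarrow\quad
N \approx \frac{100\,R}{100 - V}
     = \frac{100 \cdot 1955}{100 - 5} \approx 2058
```

Since V is reported as a rounded integer, this agrees with N = 2048, which would give V = 100(1 - 1955/2048) ≈ 4.5.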

Dmitri Chubarov

If you have business data that is worth less than the cost of the drive, then use them for that; if not, throw them away or give them to people in the department who understand the risks. Contact the manufacturer and see whether they offer recycling.

user9517

If the drive is still under warranty, you can return it to the manufacturer via their RMA process for a free replacement, after sanitizing it first. (Secure Erase will wipe the entire drive, including reallocated or otherwise inaccessible sectors.) (I'm quite surprised nobody suggested this.) Otherwise, you do what @SpacemanSpiff said and buy a new drive.
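On Linux, Secure Erase is usually issued with hdparm: set a temporary user password, then send the erase command with that password. A minimal sketch, assuming the drive is not in the "frozen" security state (check the Security section of `hdparm -I /dev/sdX`) and that you really intend to destroy everything on it. secure_erase and run_or_echo are hypothetical helpers; DRY_RUN=1 only prints the commands, and passing `enhanced` switches to the enhanced variant recommended in another answer:

```bash
# run_or_echo CMD...: print the command instead of running it when DRY_RUN=1.
run_or_echo() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# secure_erase DEVICE [PASSWORD] [enhanced]
# Hypothetical helper: ATA Secure Erase through hdparm. DESTROYS ALL DATA.
secure_erase() {
  local dev=$1 pass=${2:-p} erase=--security-erase
  if [[ ${3:-} == enhanced ]]; then erase=--security-erase-enhanced; fi
  # Step 1: set a temporary user password (required before an erase).
  run_or_echo hdparm --user-master u --security-set-pass "$pass" "$dev" || return 1
  # Step 2: issue the (enhanced) erase with the same password.
  run_or_echo hdparm --user-master u "$erase" "$pass" "$dev"
}
# Example (/dev/sdX is a placeholder): preview first, then run for real
#   DRY_RUN=1 secure_erase /dev/sdX p enhanced
```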

Michael Hampton

Actually, an enhanced secure erase is better, as it covers the reserved blocks as well.

However, if there are really that many bad sectors, the disk is a paperweight. Ditto if it won't reallocate them or declare them OK. (Pending sectors occur when there's a read issue. Most of them are "soft" errors, usually caused by external vibration.)

Stoat

I've had many drives like that. I low-level formatted (LLF) them with the manufacturer's tools after moving the start position, if that's where most of the bad sectors are, and took 5-10% off the drive capacity. If it's a decent controller and software, it'll use the unallocated space as spares. I ran a WD 1800 cut down to 160 GB for 5 years without trouble until its controller was torched by a bad power supply. I am currently using a Samsung the same way for TV captures, with 100 GB removed from a 2 TB drive; a transport stream already has more errors than the drive could hope to introduce, so it's not an issue for a while.
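Without vendor LLF tools, the same idea of giving up the first few percent of the disk, where most of the bad sectors are, can be approximated with an ordinary partition that starts past that region. Unlike an LLF, this only keeps the filesystem away from that area; the firmware's spare pool is unchanged. A bash sketch of the arithmetic; part_start_sector is a hypothetical helper, and it assumes 512-byte sectors and 1 MiB (2048-sector) partition alignment:

```bash
# part_start_sector TOTAL_SECTORS SKIP_PERCENT
# Hypothetical helper: first 1 MiB-aligned sector after leaving SKIP_PERCENT
# of the disk unused at the start.
part_start_sector() {
  local total=$1 pct=$2 align=2048       # 2048 x 512-byte sectors = 1 MiB
  local skip=$(( total * pct / 100 ))    # sectors to leave unused at the start
  local start=$(( (skip + align - 1) / align * align ))  # round up to alignment
  if (( start < align )); then start=$align; fi  # keep room for the partition table
  echo "$start"
}
# Example (destructive; /dev/sdX is a placeholder): skip the first 5%
#   start=$(part_start_sector "$(blockdev --getsz /dev/sdX)" 5)
#   parted -s /dev/sdX mklabel gpt mkpart scratch "${start}s" 100%
```

For a 1 TB drive with 1953525168 sectors, skipping 5% gives a start sector of 97677312.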

Hitachi, Samsung and WD LLF tools seem to do a good job of remapping. I don't know about Seagate yet, as those drives have either gone into disuse or suffered immediate catastrophic failure.

Doing these things is a lot easier now with the ultimate boot disk.

Ty2010
  • I found a free utility which claims to partition the drive automatically so that bad blocks are kept in unused space: http://www.dposoft.net/rbd.html and I'll give it a try on my MAXTOR 500 GB, which has one visible bad block at the beginning of the drive. Hopefully it will work for unimportant stuff like movies. – JustAMartin Sep 27 '15 at 14:08

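If you really want to risk your data on this disk (I wouldn't), then use dd to overwrite the entire disk with zeros:

dd if=/dev/zero of=/dev/sdX bs=1M conv=fsync

dd will stop with "No space left on device" when it reaches the end of the disk; that is expected. This will make the drive reallocate (or, if they turn out to be fine, simply rewrite) the pending sectors, and the whole surface of the disk will be usable. For a while ;-)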
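After the dd pass, reading the whole disk back is a cheap way to confirm that it really holds zeros and that every sector is readable again; an unreadable sector makes cmp fail with an I/O error. A bash sketch; verify_zeroed is a hypothetical name, and it assumes GNU cmp for the -n byte limit:

```bash
# verify_zeroed DEVICE
# Hypothetical helper: succeeds only if every byte of DEVICE reads back as
# zero. A read error also makes cmp exit non-zero.
verify_zeroed() {
  local dev=$1 size
  if [[ -b $dev ]]; then
    size=$(blockdev --getsize64 "$dev")   # block device: size in bytes
  else
    size=$(stat -c %s "$dev")             # regular file, e.g. for a dry test
  fi
  cmp -n "$size" "$dev" /dev/zero         # compare only the first $size bytes
}
# Example (/dev/sdX is a placeholder):
#   verify_zeroed /dev/sdX && echo "all zeros"
```

Recently written data may still be in the page cache, so for a meaningful read-back on a freshly zeroed drive it helps to drop caches first (`echo 3 > /proc/sys/vm/drop_caches` as root).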