How to determine how dead an HDD is from a smartctl report

11

3

I got a random message in an open terminal today saying that my hard drive is about to fail!

I ran some tests and I'm not sure how bad this is. The computer is acting fine, but I went ahead and pushed all my work to GitHub. I don't want to continue working on this computer if there is a chance it'll just crash and I'll lose everything.

The drive is an HDD - Western Digital Caviar SE Serial ATA

What should normal smartctl output look like for a healthy drive?

SMART Attributes:

[screenshot of the smartctl SMART attribute table, image not preserved]

Test Fails:

[screenshot of the failed tests, image not preserved]

new Objekt

Posted 2017-01-26T10:58:34.770

Reputation: 113

Are you really talking about an SSD? The Spin_Up_Time attribute in particular surprises me in the context of SSDs. – mpy – 2017-01-31T19:09:44.753

Yeah it's apparently not. I got the drive lettering confused at the time. I have 5+ drives connected to this machine and they all have similar sizes. – new Objekt – 2017-02-01T19:55:17.540

Please update your question accordingly; then DavidPostill's answer will also fit the question. – mpy – 2017-02-01T20:49:54.733

@mpy Done. David's answer seems to be fine without any edits. – new Objekt – 2017-02-02T08:33:29.503

Answers

11

I did some tests and I'm not sure how bad this is

Short Answer:

Back up this drive and replace it immediately.

Long Answer:

A company called Backblaze has collected data on hard drive failures. It has released that data on its company blog, highlighting which manufacturers' drives failed more often than others.

In a recent blog post, it published data indicating which 5 SMART attributes signal imminent drive failure:

From experience, we have found the following 5 SMART metrics indicate impending disk drive failure:

  • SMART 5 – Reallocated_Sector_Count.
  • SMART 187 – Reported_Uncorrectable_Errors.
  • SMART 188 – Command_Timeout.
  • SMART 197 – Current_Pending_Sector_Count.
  • SMART 198 – Offline_Uncorrectable.

We chose these 5 stats based on our experience and input from others in the industry because they are consistent across manufacturers and they are good predictors of failure.
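To pull just these five attributes out of a report, something like the following could work. It is a minimal sketch that assumes the usual ten-column attribute table printed by `smartctl -A`; the function name `parse_smart_raw` and the sample text are made up for illustration, and only the three raw values for attributes 5, 197 and 198 in the sample come from the question.

```python
# IDs of the five attributes named in the Backblaze quote.
WATCHED_IDS = {5, 187, 188, 197, 198}


def parse_smart_raw(report_text):
    """Return {attribute_id: raw_value} for the watched attributes.

    Assumes each attribute row has the ten columns
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    raw_values = {}
    for line in report_text.splitlines():
        fields = line.split(None, 9)
        # Skip the header and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id not in WATCHED_IDS:
            continue
        # The raw column may carry trailing notes; keep the leading integer.
        first_token = fields[9].split()[0]
        if first_token.isdigit():
            raw_values[attr_id] = int(first_token)
    return raw_values


# Hypothetical sample: every column except the raw values of
# attributes 5, 197 and 198 is invented for illustration.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       12345
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       484
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       371
"""

print(parse_smart_raw(SAMPLE))
```

Attributes that the drive does not report (187 and 188 in this sample) are simply absent from the result rather than reported as zero.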

The article goes on to suggest:

SMART 5: Reallocated_Sector_Count
1-4 keep an eye on it, more than 4 replace

SMART 187: Reported_Uncorrect
1 or more replace

SMART 188: Command_Timeout
1-13 keep an eye on it, more than 13 replace

SMART 197: Current_Pending_Sector_Count
1 or more replace

SMART 198: Offline_Uncorrectable
1 or more replace

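In your case, attribute 5 (raw value 2) is in the "keep an eye on it" range, while 197 (raw value 484) and 198 (raw value 371) are far past the "1 or more replace" threshold, so the drive is showing clear signs of failure.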
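The thresholds above can be written down as a small lookup. This is only a sketch of the rule of thumb quoted from the article, not a tool Backblaze or smartmontools provides; the names `THRESHOLDS`, `classify` and `assess` are made up here, and the readings are the three raw values from the question.

```python
# (watch_from, replace_from) per SMART attribute ID, taken from the
# thresholds listed above.
THRESHOLDS = {
    5: (1, 5),     # 1-4 keep an eye on it, more than 4 replace
    187: (1, 1),   # 1 or more replace
    188: (1, 14),  # 1-13 keep an eye on it, more than 13 replace
    197: (1, 1),   # 1 or more replace
    198: (1, 1),   # 1 or more replace
}


def classify(attr_id, raw):
    """Return 'ok', 'watch' or 'replace' for one raw attribute value."""
    watch_from, replace_from = THRESHOLDS[attr_id]
    if raw >= replace_from:
        return "replace"
    if raw >= watch_from:
        return "watch"
    return "ok"


def assess(raw_values):
    """Classify every reading whose attribute ID has a known threshold."""
    return {
        attr_id: classify(attr_id, raw)
        for attr_id, raw in raw_values.items()
        if attr_id in THRESHOLDS
    }


readings = {5: 2, 197: 484, 198: 371}  # raw values from the question
print(assess(readings))  # {5: 'watch', 197: 'replace', 198: 'replace'}
```

A single "replace" verdict is enough to justify the short answer; the lookup just makes it explicit which attributes crossed which line.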

DavidPostill

Posted 2017-01-26T10:58:34.770

Reputation: 118 938