Is my SMART test failing or not?

The self-test log shows that the tests ended with a read failure, but the overall health result is PASSED. So what is the state of this hard disk?

root@master:~# smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-24-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 (SATA 6Gb/s)
Device Model:     WDC WD2000FYYZ-01UL1B1
Serial Number:    WD-WMC1P0385424
LU WWN Device Id: 5 0014ee 0ae6ce8de
Firmware Version: 01.01K02
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Nov 25 02:04:28 2017 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@master:~# smartctl -H /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-24-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

root@master:~# smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-24-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     19675         69299
# 2  Short offline       Completed: read failure       90%     19675         52526
# 3  Short offline       Completed: read failure       90%     19675         52526
# 4  Short offline       Completed without error       00%      5505         -

Update:

root@master:~/chef-usability# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-24-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   166   164   021    Pre-fail  Always       -       6658
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   074   073   000    Old_age   Always       -       19698
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       77
194 Temperature_Celsius     0x0022   122   107   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       13

Poma

Posted 2017-11-24T23:06:02.293

Reputation: 1 267

I wouldn’t trust a drive that fails both the short and extended offline self-test for anything – Ramhound – 2017-11-24T23:11:33.283

Answers

Your log says that the short offline test you ran when your hard disk had run for 5505 hours in total passed completely.

However, the three tests you ran when your hard disk had run for a total of 19675 hours all stopped at a bad sector after covering only about 10% of the test (90% remaining). The first two tests found this bad sector at LBA 52526; it was then probably reallocated, and the most recent test found a second bad sector at LBA 69299.

There may be more bad sectors in the remaining 90%.
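To pull the failing addresses out of the self-test log without reading it by eye, something like the following could work. It is only a sketch: `failed_lbas` is a made-up helper name, and it assumes the log layout shown in the question, where the LBA is the last column of each entry.

```shell
# Print the distinct LBA_of_first_error values from `smartctl -l selftest`
# output read on stdin. Assumes the column layout shown in the question,
# where the LBA is the last field of each "# N ..." entry.
failed_lbas() {
    awk '/^# *[0-9]+/ && /read failure/ { print $NF }' | sort -n -u
}

# Usage on a live system (needs root; /dev/sda is the device from the question):
#   smartctl -l selftest /dev/sda | failed_lbas
```

Entries that completed without error are skipped, so only the addresses worth probing are printed.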

Now a few bad sectors are expected to appear over the lifetime of the harddisk (that's why the harddisk can reallocate them), but that's enough bad sectors in your case that I would have a close look at all SMART properties (smartctl -A) to see if I wanted to keep using that disk.

Edit

The properties look good (reading guidance: the normalized values start at 100 or 200 depending on the attribute, and lower is worse), though I'm a bit confused that Reallocated_Sector_Ct is (raw) zero. From the properties alone, the disk looks healthy. The next thing I'd try is to read both flagged sectors with dd (to another hard disk) using the indicated LBA, then write them back if you can read them, or write zeroes (/dev/zero) back if you can't. Then see if the short and/or extended test proceeds further.
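As a rough sketch of that dd step: the two helpers below only print the commands, so nothing touches the disk until you copy one and run it yourself. The function names are invented for illustration, `iflag=direct` and `oflag=direct` assume GNU dd, and the 512-byte block size comes from the "Sector Size" line in the `smartctl -i` output. I am also assuming the LBAs in the self-test log count from the start of the whole device, so the target is /dev/sda and not a partition.

```shell
# Print (do not run) a dd command that reads one 512-byte sector at a given LBA.
# Arguments: device, LBA. iflag=direct (GNU dd) bypasses the page cache.
read_sector_cmd() {
    dev=$1
    lba=$2
    printf 'dd if=%s of=/dev/null bs=512 skip=%s count=1 iflag=direct\n' "$dev" "$lba"
}

# Print (do not run) a dd command that overwrites one 512-byte sector with zeroes.
# DESTRUCTIVE if executed: whatever was stored in that sector is lost.
zero_sector_cmd() {
    dev=$1
    lba=$2
    printf 'dd if=/dev/zero of=%s bs=512 seek=%s count=1 oflag=direct\n' "$dev" "$lba"
}

# Example: show the commands for the two LBAs from the self-test log.
for lba in 52526 69299; do
    read_sector_cmd /dev/sda "$lba"
    zero_sector_cmd /dev/sda "$lba"
done
```

If the read command ends with an I/O error, the sector really is unreadable and only the zeroing variant will give the drive a chance to remap it.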

dirkt

Posted 2017-11-24T23:06:02.293

Reputation: 11 627

added results of smartctl -A /dev/sda to my question – Poma – 2017-11-25T14:22:53.097

You have 6 sectors that are pending remapping. They will only be remapped, however, when written to.
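To watch that counter over time, a small filter over the attribute table is enough. This is a sketch that assumes the ten-column layout printed by `smartctl -A` in the question; `smart_raw` is a hypothetical helper name.

```shell
# Print the RAW_VALUE of one attribute from `smartctl -A` output read on stdin.
# Assumes the layout from the question: ATTRIBUTE_NAME is field 2, RAW_VALUE field 10.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage on a live system (needs root):
#   smartctl -A /dev/sda | smart_raw Current_Pending_Sector
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
```

After the affected sectors have been written to, the pending count should drop, and the reallocated count rises if the drive had to remap them.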

So, the easiest “solution” is to just wipe the disk and then reinstall whatever was on it.

If there are only files, you need to move them elsewhere. This will fail for damaged files.

If you have software installed (like the operating system or applications), you only need to move the user data away; the rest can simply be reinstalled.

Instead of looking only at parts of smartctl’s information, please use -a. It includes everything you provided, plus the error log.

Daniel B

Posted 2017-11-24T23:06:02.293

Reputation: 40 502

I recently had an HDD fail a SMART test on me. It did not even show up as PASSED, but said FAILED instead. The HDD kept working though, so if yours says PASSED, it surely will be OK.

Diego Sánchez Mairena

Posted 2017-11-24T23:06:02.293

Reputation: 1

It was knocked out of the mdadm array as failed though. But the one that is left in the array (raid1) has the same stats. – Poma – 2017-11-25T00:22:44.567

What does smartctl -A /dev/sda advise? I'd imagine that the drive has not failed (in as much as it has not lost data), but it is on its way out, and should be replaced. (That drive has TLER, so it should not fall out of the array.) Although still available, I think the RE drives are no longer being made (they are now WD Gold drives), so this does indicate the drive might be getting on in years. – davidgo – 2017-11-25T04:52:35.940

added results of smartctl -A /dev/sda to my question – Poma – 2017-11-25T14:20:50.073