I have a 3 TB WD drive that started giving me problems: I could not access any file on it, and even an ls command on it hung indefinitely.

Here are some details on the disk:

Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: EZRX-00D8PB0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt

Device     Start        End    Sectors  Size Type
/dev/sda1   2048 5860532223 5860530176  2.7T Linux filesystem
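
As a quick sanity check on the numbers above, the reported sizes can be recomputed from the sector counts. This is only a sketch: the constants are copied from the fdisk output, and the variable names are my own.

```python
# Values copied from the fdisk output above; the names are illustrative only.
SECTOR_BYTES = 512          # logical sector size
PHYSICAL_BYTES = 4096       # physical sector size
TOTAL_SECTORS = 5860533168
PART_START = 2048           # first sector of /dev/sda1
PART_END = 5860532223       # last sector of /dev/sda1 (inclusive)

# Whole-disk size in bytes and in TiB.
total_bytes = TOTAL_SECTORS * SECTOR_BYTES
total_tib = total_bytes / 2**40

# Partition length in sectors (the end sector is inclusive).
part_sectors = PART_END - PART_START + 1

# The partition is aligned if its byte offset is a multiple of the physical sector size.
part_aligned = (PART_START * SECTOR_BYTES) % PHYSICAL_BYTES == 0

print(total_bytes)          # 3000592982016
print(round(total_tib, 1))  # 2.7
print(part_sectors)         # 5860530176
print(part_aligned)         # True
```

The results agree with what fdisk prints, so the partition table itself looks consistent.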

After this problem I ran some tests to find out what is wrong with the drive. As a first step I ran a short self-test with sudo smartctl -t short /dev/sda, which reported the following error:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17480         8467144

Then I read the drive's attributes with sudo smartctl -a /dev/sda, as described in this other post, Understanding smartctl -a output. Below are the attribute table and the five most recent entries of the error log:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       71
  3 Spin_Up_Time            0x0027   174   161   021    Pre-fail  Always       -       6266
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       695
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       17481
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       457
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   179   179   000    Old_age   Always       -       64193
194 Temperature_Celsius     0x0022   122   101   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       356
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       1691
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   196   196   000    Old_age   Offline      -       1691
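
To make the table easier to read, here is a minimal sketch that parses a few of the rows above and reports two things: sector-related counters with a non-zero raw value, and attributes whose normalized VALUE has dropped to or below THRESH. The choice of attributes to watch (5, 196, 197, 198) is my own assumption, based on them being the counters usually associated with bad sectors; the function names are hypothetical, and the parser assumes the raw value is a plain integer, as it is in these rows.

```python
# A few rows copied verbatim from the smartctl -a output above.
SMART_TABLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       71
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       356
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       1691
"""

# ASSUMPTION: these are the sector-health counters worth watching.
WATCHED = {5, 196, 197, 198}


def parse_attributes(text):
    """Return {id: {...}} for every line that looks like an attribute row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        rows[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),
        }
    return rows


def nonzero_watched(rows):
    """Watched counters whose raw value is greater than zero."""
    return {i: r["raw"] for i, r in rows.items() if i in WATCHED and r["raw"] > 0}


def below_threshold(rows):
    """Attributes whose normalized value is at or below a non-zero threshold."""
    return [i for i, r in rows.items() if r["thresh"] > 0 and r["value"] <= r["thresh"]]


rows = parse_attributes(SMART_TABLE)
print(nonzero_watched(rows))  # {197: 356, 198: 1691}
print(below_threshold(rows))  # []
```

So none of the attributes has formally crossed its threshold, but the pending and offline-uncorrectable sector counters are both non-zero.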

SMART Error Log Version: 1
ATA Error Count: 47 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 47 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0a 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  e0 00 0a 00 00 00 00 00      04:00:17.522  STANDBY IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:16.815  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:16.815  IDENTIFY DEVICE

Error 46 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00      04:00:16.815  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:16.815  IDENTIFY DEVICE
  e1 00 0f 00 00 00 00 00      04:00:15.095  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE

Error 45 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0f 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  e1 00 0f 00 00 00 00 00      04:00:15.095  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE

Error 44 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      04:00:12.170  SET FEATURES [Set transfer mode]

Error 43 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00      04:00:12.170  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:12.170  IDENTIFY DEVICE
  e1 00 0f 00 00 00 00 00      04:00:10.445  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:09.925  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:09.925  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17480         8467144

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
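
The ER and ST bytes in the error log entries above can be decoded bit by bit. The sketch below only names the ATA register bits I am confident about and prints any other set bit as bitN; the dictionaries and the decode function are my own illustrative names, not part of smartmontools.

```python
# Subset of the ATA Status register bits (mask -> mnemonic).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# Subset of the ATA Error register bits (mask -> mnemonic).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(value, names):
    """List the set bits of an 8-bit register, most significant bit first."""
    flags = []
    for bit in range(7, -1, -1):
        mask = 1 << bit
        if value & mask:
            # Bits not in the table are reported by position instead of guessed.
            flags.append(names.get(mask, f"bit{bit}"))
    return flags


# ER = 04 and ST = 61, as logged for errors 43 to 47.
print(decode(0x04, ERROR_BITS))   # ['ABRT']
print(decode(0x61, STATUS_BITS))  # ['DRDY', 'DF', 'ERR']
```

This matches the "Device Fault; Error: ABRT" annotation that smartctl prints next to some of the entries.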

Next I inspected the LBA_of_first_error (8467144). Following part of this guide, I ran sudo sg_verify --lba=8467144 /dev/sda and got the following output, which confirms that there is a hardware failure:

verify(10):
Fixed format, current; Sense key: Medium Error
Additional sense: Id CRC or ECC error
VERIFY(10) medium or hardware error near lba=0x8132c8
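
The address in the sg_verify message is the same LBA written in hexadecimal. The following sketch checks that and converts the LBA into a byte offset on the disk and a block number inside /dev/sda1. The 4096-byte filesystem block size is an assumption on my part (I have not checked the actual filesystem), and lba_to_fs_block is a hypothetical helper name.

```python
SECTOR_BYTES = 512      # logical sector size, from the fdisk output
PART_START = 2048       # first sector of /dev/sda1, from the fdisk output
FS_BLOCK_BYTES = 4096   # ASSUMPTION: filesystem block size, not verified

failing_lba = 8467144   # LBA_of_first_error from the self-test log


def lba_to_fs_block(lba, part_start=PART_START,
                    sector=SECTOR_BYTES, block=FS_BLOCK_BYTES):
    """Map a whole-disk LBA to a filesystem block number within the partition."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * sector // block


print(hex(failing_lba))              # 0x8132c8
print(failing_lba * SECTOR_BYTES)    # 4335177728
print(lba_to_fs_block(failing_lba))  # 1058137
```

Under that assumption the first failing sector sits roughly 4 GiB into the disk, in filesystem block 1058137 of the partition.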

As a final step I tried to reassign the block with sudo sg_reassign --address=8467144 /dev/sda, without success:

REASSIGN BLOCKS: Illegal request, Invalid opcode
sg_reassign failed: Illegal request, Invalid opcode

So, to summarize: did I miss a step in this investigation? Is my drive dead, or can it still be used? I cannot tell whether the SMART attribute list shows any serious errors. Can you help me understand whether the drive has further problems?

Timmy