20 Million Raw_Read_Error_Rate per minute

1

0

My computer is going a little heavy on the raw reads....

I decided to check the SMART status of my hard drive, and I saw it had 125239624 raw read errors. Just a minute later I checked again for comparison and I was up to 127315512.

Should I be concerned? This laptop (an HP-Pavilion) might still be under warranty... should I send it in?

This is the full output of smartctl -data -a /dev/sda:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   118   100   006    Pre-fail  Always       -       193153912
  3 Spin_Up_Time            0x0023   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       289
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   076   060   030    Pre-fail  Always       -       42002234
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2039
 10 Spin_Retry_Count        0x0033   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       285
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   059   052   045    Old_age   Always       -       41 (Min/Max 20/42)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0022   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   038   038   000    Old_age   Always       -       125873
194 Temperature_Celsius     0x0022   041   048   000    Old_age   Always       -       41 (0 17 0 0 0)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Questionmark

Posted 2015-02-13T13:00:56.663

Reputation: 421

can you post the report? the raw numbers are often misleading – meatspace – 2015-02-13T13:07:51.707

@meatspace everything else looks perfect... – Questionmark – 2015-02-13T13:09:12.167

@meatspace, added that... – Questionmark – 2015-02-13T13:12:09.643

You have multiple "pre-fail" conditions; that isn't "perfect" – Ramhound – 2015-02-13T13:18:06.833

Answers

4

The SMART results format is kinda garbage for this reason (well, it's confusing, at least). Modern disks are so packed with data that the raw error rate is usually fairly high - after applying error correction, no problems arise with data access/reliability.

I would focus on the below:

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0

These are the numbers of sectors reallocated, waiting to be reallocated, and unable to be reallocated, respectively.

When the head hits a bad sector and reading fails, it becomes a Current_Pending_Sector. The next time you try to write to it, it either works (everything goes back to normal and the sector is no longer pending) or it fails again. If it fails and there is reallocation space available from the pool, the sector is reallocated (Reallocated_Event_Count + 1). If the pool is used up, the sector becomes Offline_Uncorrectable and no further reads or writes are possible.
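The lifecycle described above can be sketched as a toy model. This is only an illustration of the description in this answer, not of real drive firmware: the class, its method names, and the `spare_sectors` pool size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DriveCounters:
    """Toy model of the three counters discussed above (names are hypothetical)."""
    spare_sectors: int       # assumed size of the reallocation pool
    pending: int = 0         # stands in for Current_Pending_Sector
    reallocated: int = 0     # stands in for Reallocated_Event_Count
    uncorrectable: int = 0   # stands in for Offline_Uncorrectable

    def read_failed(self) -> None:
        # A failed read marks the sector as pending.
        self.pending += 1

    def write_to_pending(self, write_ok: bool) -> None:
        # The next write to a pending sector decides what happens to it.
        if self.pending == 0:
            return
        self.pending -= 1
        if write_ok:
            return  # the write worked, nothing else changes
        if self.spare_sectors > 0:
            self.spare_sectors -= 1
            self.reallocated += 1
        else:
            self.uncorrectable += 1


drive = DriveCounters(spare_sectors=1)
for write_ok in (True, False, False):
    drive.read_failed()
    drive.write_to_pending(write_ok)
print(drive)
```

With a pool of one spare sector, the first failed write consumes the spare and the second has nowhere to go, which is the "pool is used up" case.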

Since your drive is not having any issues with sectors, only the standard, modern, data-density Raw_Read_Error_Rate, I think you are fine. Standard advice about having backups always applies, but not more so here than in any other case, I think.

meatspace

Posted 2015-02-13T13:00:56.663

Reputation: 1 093

Actually, 196 is the number of reallocation attempts, successful or not. I think you meant 5, Reallocated Sectors Count. – Pedro Werneck – 2015-02-21T15:24:18.183

0

When you have a datacenter with thousands of HDDs, and it's easier and cheaper to replace a bad one than deal with a catastrophic failure, some SMART stats can be used to predict failure. For domestic users it's not reliable enough and usually not worth it. Sometimes drives die without any warning, and sometimes they survive for months or years despite critical conditions.

Right now I'm using a computer with an HDD that's more than 4 years old and has been working for months with this warning from smartctl:

...
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
...
5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail  Always   FAILING_NOW 4015
...

Statistically, this drive is already more than 21 times more likely to fail than yours, so don't worry too much about it. Just keep your backups up to date, as you should be doing anyway.

Pedro Werneck

Posted 2015-02-13T13:00:56.663

Reputation: 147

0

The SMART attributes 1 Raw_Read_Error_Rate and 7 Seek_Error_Rate are NOT counters; they are error rates. Their raw values are defined by the manufacturer and are not meaningful to us. The Raw_Read_Error_Rate raw value is not reported for any hard disk except those made by Seagate, so you have a Seagate. The important number for it is the VALUE of 118, which you can consider as 118%, that is, better than 100% good (it's a statistically relative value). You have nothing at all to worry about.
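For the curious: a commonly repeated community interpretation, which nothing in this thread confirms and which should be treated as an assumption, is that Seagate packs two numbers into these raw values, with the low 32 bits counting operations and the bits above them counting errors. Under that assumption only, a minimal sketch of splitting the raw values from the question (the helper name `split_raw` is made up for illustration):

```python
def split_raw(raw: int) -> tuple[int, int]:
    """Split a raw value into (high part, low 32 bits).

    ASSUMPTION: high part = error count, low 32 bits = operation count.
    This layout is a community interpretation, not something the
    source thread or the vendor confirms.
    """
    high = raw >> 32
    low = raw & 0xFFFFFFFF
    return high, low


# Raw values copied from the smartctl output in the question.
for name, raw in [("Raw_Read_Error_Rate", 193153912),
                  ("Seek_Error_Rate", 42002234)]:
    high, low = split_raw(raw)
    print(f"{name}: high={high} low={low}")
```

Both raw values in the question are below 2^32, so the high part is zero for each of them. If the assumption holds, the rapidly growing number would be the operation count rather than the error count.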

The Pre-fail flag just indicates which attributes are considered critical, for determining SMART PASS/FAIL status. If the WORST for a Pre-fail attribute reaches THRESH, then the drive is considered FAILED.
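As a rough sketch of that rule, the check below compares WORST against THRESH for the Pre-fail rows copied from the question. The function name and the tuple layout are just for illustration, and this follows the rule as worded here rather than whatever smartctl does internally.

```python
# (id, name, value, worst, thresh) for the Pre-fail rows in the question.
PRE_FAIL_ROWS = [
    (1, "Raw_Read_Error_Rate", 118, 100, 6),
    (3, "Spin_Up_Time", 99, 99, 0),
    (5, "Reallocated_Sector_Ct", 100, 100, 36),
    (7, "Seek_Error_Rate", 76, 60, 30),
    (10, "Spin_Retry_Count", 100, 100, 97),
    (184, "End-to-End_Error", 100, 100, 97),
]


def failed_attributes(rows):
    """Return names of Pre-fail attributes whose WORST has reached THRESH."""
    return [name for _id, name, _value, worst, thresh in rows
            if worst <= thresh]


print(failed_attributes(PRE_FAIL_ROWS))

# The failing drive quoted in the other answer: WORST 002 against THRESH 036.
print(failed_attributes([(5, "Reallocated_Sector_Ct", 2, 2, 36)]))
```

None of the Pre-fail rows from the question trip the check, while the Reallocated_Sector_Ct row from the drive reported as FAILING_NOW does.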

5 Reallocated_Sector_Ct is a critical attribute, 196 Reallocated_Event_Count is not.

RobJ

Posted 2015-02-13T13:00:56.663

Reputation: 56