I am running e2fsck on one of my disk partitions (ext4), but it seems to be taking forever. It has been running for almost 10 hours and is still at 42%. The partition is around 800 GB, and the disk it resides on is around 1 TB.

Running iostat shows the following output:

iostat -xzhcd  /dev/sdc 2 5
Linux 3.13.0-37-generic (divick-desktop)    Monday 03 April 2017    _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.97    0.00    0.41   50.22    0.00   46.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
                 49.12     0.00    6.87    0.00   223.95     0.02    65.20     1.01  147.22  145.40 4611.03 143.47  98.57

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.25    0.00    9.63   71.67    0.00   14.45

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
                  0.00     0.00    1.50    0.00     6.00     0.00     8.00     1.00  592.00  592.00    0.00 665.33  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.71    0.00    6.63   59.34    0.00   31.33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
                  0.00     0.00    1.50    0.00     6.00     0.00     8.00     1.00  592.00  592.00    0.00 666.67 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.76    0.00    9.25   56.94    0.00   30.06

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
                  0.00     0.00    3.50    0.00    14.00     0.00     8.00     1.00  508.00  508.00    0.00 285.71 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.39    0.00    7.63   73.73    0.00   15.25

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
                  0.00     0.00    1.50    0.00     6.00     0.00     8.00     1.00  593.33  593.33    0.00 666.67 100.00

Why are the r_await times so high (~0.5 s)? Is this a sign that the disk is failing, or is something else causing it?

Interpreting the results of the SMART tests on the disk is a bit confusing. I see the following line in the SMART output:

SMART overall-health self-assessment test result: PASSED

But looking at the detailed output I see:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f   192   192   051    Pre-fail  Always       -       13824
  3 Spin_Up_Time            0x0027   119   111   021    Pre-fail  Always       -       7008
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       515
  5 Reallocated_Sector_Ct   0x0033   165   165   140    Pre-fail  Always       -       671
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10561
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       511
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       182
193 Load_Cycle_Count        0x0032   128   128   000    Old_age   Always       -       218580
194 Temperature_Celsius     0x0022   101   080   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   018   018   000    Old_age   Always       -       182
197 Current_Pending_Sector  0x0032   198   197   000    Old_age   Always       -       480
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       35
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       210

It is not clear to me whether the disk is really failing.


3 Answers


The SMART output listed in the question seems to indicate a dying drive, in particular these two attributes:

197 Current_Pending_Sector  0x0032   198   197   000    Old_age   Always       -       480
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       35

When the RAW_VALUE of either of these two attributes is non-zero, I would recommend replacing the drive immediately.
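
As a rough illustration of that check, here is a minimal Python sketch that scans attribute lines in the format shown in the question and reports which of the two attributes have a non-zero raw value. The names `WATCHED` and `nonzero_sector_attrs` are made up for this example, and the sketch assumes the raw value is a plain integer in the tenth column, which holds for the output above but not necessarily for every drive.

```python
# IDs of the two attributes discussed above (197 and 198).
WATCHED = {197, 198}

def nonzero_sector_attrs(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw_value > 0."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

# Lines copied from the SMART output in the question.
sample = """\
197 Current_Pending_Sector  0x0032   198   197   000    Old_age   Always       -       480
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       35
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""
print(nonzero_sector_attrs(sample))
```

For the drive in the question, both watched attributes are reported, which is the condition under which replacement is advised.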


The Raw_Read_Error_Rate raw value of 13824 in the SMART output suggests that the drive is failing read requests, which could explain the high r_await and %iowait in the iostat output. The drive is likely taking a long time on read requests, which then fail or abort once they time out. I would also check the dmesg output for driver or device errors for further confirmation.
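
To put the iostat figures from the question in perspective, here is a small Python sketch of the arithmetic, using the values from the later samples (1.50 r/s and an avgrq-sz of 8.00). It assumes avgrq-sz is reported in 512-byte sectors, as the iostat man page describes for that column; the function names are invented for the example.

```python
SECTOR_BYTES = 512  # assumption: avgrq-sz is given in 512-byte sectors

def read_throughput_kb(reads_per_s, avg_request_sectors):
    """Approximate read throughput in kB/s from request rate and request size."""
    return reads_per_s * avg_request_sectors * SECTOR_BYTES / 1024

def hours_to_read_gib(gib, kb_per_s):
    """Hours needed to read `gib` GiB at a steady rate of kb_per_s."""
    return gib * 1024 * 1024 / kb_per_s / 3600

rate = read_throughput_kb(1.50, 8.00)
print(rate)                        # 6.0, matching the rkB/s column
print(hours_to_read_gib(1, rate))  # roughly 48.5 hours per GiB at this rate
```

e2fsck reads file system metadata rather than every block, so the second figure only illustrates the scale of the slowdown; it is not an estimate of the remaining run time.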


First, you should check whether the problem is caused by e2fsck or not. You can do this by running the top command.

Here's the man page for top.

    What output from top do you think would help here? fsck doesn't show up in top's %CPU utilization because it is waiting for I/O. Instead, running ps shows the state of fsck (D implies waiting for I/O): D 3578 e2fsck -f -y -b 32768 /dev/ root. But that doesn't reveal anything more than what I already logged in my original comment. – DivKis01 Apr 03 '17 at 01:28
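
The process-state check mentioned in the comment can also be approximated without ps. The following is a minimal Python sketch, assuming a Linux system with /proc mounted, that lists processes currently in state D (uninterruptible sleep, usually waiting for I/O). The function names are hypothetical, and the sketch relies on the documented /proc/[pid]/stat layout of `pid (comm) state ...`.

```python
import os

def parse_state(stat_line):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]

def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in state D."""
    result = []
    if not os.path.isdir(proc_root):
        return result  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file was unreadable
        if state == "D":
            result.append((pid, comm))
    return result

if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Like the ps output, this only confirms that e2fsck is blocked on I/O; it does not say why the disk is slow.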