input/output error at clean ext3 partition - how to check what wrong with data block

Question

I have a problem with a file at a ext3 partition on CentOS 5 server (kernel version 2.6.18-164.15.1.el5) with a HP Raid Controller:

hpacucli ctrl all show detail

Smart Array P410 in Slot 1
   Bus Interface: PCI
   ...

HP tool doesn't report any problems.

It's normal partition ext3 with the block size set to 2k, and it is fine - fsck output:

fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

File inode is OK as well:

File: `name.xxx'
Size: 3126962       Blocks: 6124       IO Block: 4096   regular file
Device: 6851h/26705d    Inode: 64579729    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-28 09:02:59.000000000 -0400
Modify: 2014-07-28 09:02:59.000000000 -0400
Change: 2014-07-28 09:02:59.000000000 -0400

One of the operation I cannot performance is file copy:

> cp /long_path/name.xxx .
 cp: reading `/long_path.name.xxx': Input/output error

To pinpoint where is the problem I run dd to copy file:

> dd if=/long_path/name.xxx bs=2048 of=test
 dd: reading `/long_path/name.xxx': Input/output error
 222+0 records in
 222+0 records out
 454656 bytes (455 kB) copied, 0.042867 seconds, 10.6 MB/s

So I guess that problem is in the 223 file block.

Debugfs should help with locating that block on the disk

debugfs  -R "stat name.xxx" /dev/sdf
debugfs 1.39 (29-May-2006)
Inode: 64579729   Type: regular    Mode:  0644   Flags: 0x0   Generation: 2900468317
User:     0   Group:     0   Size: 3126962
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 6124
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
atime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
mtime: 0x53d64a03 -- Mon Jul 28 09:02:59 2014
BLOCKS:
(0):130402311, (1-4):130402844-130402847, (5-6):130484033-130484034, (7):130484036,
(8-10):130484049-130484051, (11):130484055, (IND):130761221, (12-13):130761222-130761223,   
(14):130763791, (15):130763942, (16):130765268, (17-23):130838937-130838943,  
(24-46):130853946-130853968, (47-48):130855126-130855127, (49):130855215, 
(50-53):130856428-130856431, (54-104):130856533-130856583, (105-341):130856748-130856984, 
...
[MORE BLOCKS]     
....
TOTAL: 1531

So I guess the problematic data are in the block 130856866.

How can I get more information about that block? I ran badblocks, and have a list of bad blocks. My guess is that I have to multiply above block number by 2 (file system block size is 2K and badblocks uses 1K by default). Also badblocks checks a disk, not a partition, so maybe I should to add some offset (there is one partition on that disk, so probably no).

> fdisk -l /dev/sdf

Disk /dev/sdf: 2000.3 GB, 2000365379584 bytes
255 heads, 63 sectors/track, 243197 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
       Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d5p1   *       1      243197  1953479871   83  Linux

I also thought of using smartd. What should I look for?

Error counter log:
       Errors Corrected by           Total   Correction     Gigabytes    Total
           ECC          rereads/    errors   algorithm      processed    uncorrected
       fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     1457         0  2887405961          0      65948.712          18
write:         0        0         0         0          0      15056.493           0
verify:        0        1         0  361901613          0       3591.720           0

Non-medium error count:      226

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
   Description                              number   (hours)
# 1  Background long   Failed in segment -->       -   34479          16845361 [0x3 0x11 0x0]
# 2  Background short  Completed                   -      44                 - [-   -    -]
# 3  Background short  Completed                   -      39                 - [-   -    -]
# 4  Background long   Completed                   -       6                 - [-   -    -]

Long (extended) Self Test duration: 18500 seconds [308.3 minutes]

Background scan results log
Status: scan is active
  Accumulated power on time, hours:minutes 34541:56 [2072516 minutes]
  Number of background scans performed: 1139,  scan progress: 38.18%
  Number of background medium scans performed: 1139

 #  when        lba(hex)    [sk,asc,ascq]    reassign_status
 1 19215:06  0000000000014c61  [3,11,0]   Recovered via rewrite in-place
 2 19215:07  0000000000014c66  [3,11,0]   Recovered via rewrite in-place
 3 19413:28  0000000001010a31  [3,11,0]   Require Write or Reassign Blocks command
 4 19943:24  000000000001ea99  [3,11,0]   Recovered via rewrite in-place
 5 20152:23  00000000000232b8  [3,11,0]   Recovered via rewrite in-place
 6 31229:34  810000004087f984  [3,11,0]   Require Write or Reassign Blocks command
 7 33021:51  810000004087ba85  [3,11,0]   Require Write or Reassign Blocks command
 8 33021:51  000000004087ba9f  [3,11,0]   Require Write or Reassign Blocks command
 9 33021:52  000000004087bad6  [3,11,0]   Require Write or Reassign Blocks command
10 33029:43  000000004087baa5  [3,11,0]   Require Write or Reassign Blocks command
11 33055:27  000000004087bac3  [3,11,0]   Require Write or Reassign Blocks command
12 33244:40  810000004087f9d6  [3,11,0]   Require Write or Reassign Blocks command
13 33431:58  990000004087f105  [0,0,0]   Reassignment by disk failed
14 33480:17  00000000463d7713  [3,11,0]   Require Write or Reassign Blocks command
15 33480:19  00000000463d7723  [3,11,0]   Require Write or Reassign Blocks command
16 33480:20  00000000463d7725  [3,11,0]   Require Write or Reassign Blocks command
17 33480:28  81000000463d774e  [3,11,0]   Require Write or Reassign Blocks command
18 33686:17  8100000044e50edc  [3,11,0]   Require Write or Reassign Blocks command
19 34154:17  81000000432bef27  [3,11,0]   Require Write or Reassign Blocks command
20 34463:43  810000001f32decd  [3,11,0]   Require Write or Reassign Blocks command
21 34463:43  0000000030080000  [3,11,0]   Require Write or Reassign Blocks command

How should I marry above smartctl output (or any other output from smartd run) with my initial problem.

Also shouldn't such issue be address by HDD software?

BTW. I found following link useful to understand 'debugs -R' output. Maybe that link will be useful for one else.

UPDATE

Doing further research I found that action related to problematic inodes (like above cp command) trigger following line in kernel log:

kernel: cciss: cmd ffff810037e00000 has CHECK CONDITION sense key = 0x3

'sense key' is a 'status' and part of SCSI standard (list here and more description here).

Following @Peter answer I noticed following lines in kernel log: `Buffer I/O error on device sdf, logical block 147304164`, but not after my copy attempt. On the hand following line seems to coincident with my read try: `cciss: cmd ffff810037e00000 has CHECK CONDITION sense key = 0x3` — Wawrzek, Aug 01 '14 at 14:29
Below or up to that there is the actual sector problem as well, but this is what you really need. — peterh, Aug 01 '14 at 14:34
Nothing interesting there: `2014-07-29T18:28:45.091544-04:00 host kernel: Buffer I/O error on device cciss/c0d5, logical block 135329616 2014-07-29T18:28:45.152892-04:00 host kernel: cciss: cmd ffff810037e00000 has CHECK CONDITION sense key = 0x3 2014-07-29T18:28:45.153056-04:00 host kernel: Buffer I/O error on device cciss/c0d5, logical block 135329616 2014-07-29T18:28:45.194790-04:00 host kernel: cciss: cmd ffff810037e00000 has CHECK CONDITION sense key = 0x3 2014-07-29T18:28:45.194879-04:00 host kernel: Buffer I/O error on device cciss/c0d5, logical block 135329616` — Wawrzek, Aug 01 '14 at 14:38
You had to get filesystem errors as well. These seem to be block errors. A solution were, if you run `e2fsck -c -f -y -C0` for the weekend (or at least for the night) in a loop. This calls the `badblocks` tool, and calculates the sector -> filesystemblock conversion automatically. — peterh, Aug 01 '14 at 14:52
Another help: your fs has 4k blocks (nearly surely), and begins on sector 1... thus a sector number -> logical block number conversion is very simple: (n-1)/4 (rounded always down). — peterh, Aug 01 '14 at 14:53
As I wrote in the main message fs at 2k blocks. Also I have a list of badblocks from previous run and following command (bad_block-1)/4+1=logical_bad_block seems to work (not sure about that +1). The list of kernel is subset of block from badblock - what make sense. — Wawrzek, Aug 01 '14 at 15:22
Uhm, sorry: in this case, 1 fs-block is 4 lowlevel block, thus the (sector_no-1)/4 stays. If you had 4k block fs, (sector_no-1)/8 were the correct formula. — peterh, Aug 01 '14 at 15:23

ewwhite · Answer 1 · 2014-08-08T19:35:21.050

3

That's a lot to process... But a few things jump out at me.

Your kernel version is: 2.6.18-164.15.1.el5 - That places your kernel revision at the EL5.4 level, or circa March 2010.

I had persistent ext3 filesystem stability and corruption issues in EL5. Things weren't fully fixed until mid-2012. In my worst situation, I was working with a cloud infrastructure firm that never updated kernels from their base release. So I began to see these problems at scale across thousands of EL5 servers.

Is there any chance you can update your OS/kernel/e2fsprogs, fsck and try again?

In addition, if the kernel is circa 2010, your system's BIOS and Smart Array P410 firmware is probably very outdated. What model server is this?

Edit:

Your cciss CHECK_CONDITION errors are the giveaway. No need to even deal with SMART at this point. Run the HP Array Diagnostic Utility and it will distill the error information into a report. Either way, I really hope this isn't a RAID5 array.

Can you post the output of hpacucli ctrl all show config detail ?

edited Aug 08 '14 at 19:35

answered Aug 01 '14 at 15:06

ewwhite

194,921
91
434
799

It doesn't seems ext problem, there are lowlevel disk read problems. – peterh Aug 01 '14 at 15:29
The hardware is old and in process of refresh, there is also process of upgrade to CentOS6. That actual problem is not so big. For that disk we will replace it and refresh data. My investigation is more to establish rule, so we can cope with these kind of issues more actively. (Un)Luckily I see similar issues in 'few' other places and I think I should be able to compare different kernel/OS combination. – Wawrzek Aug 01 '14 at 15:30
Rule out RAID controller and hardware issues. What specific server and controller firmware are you on? If you have `hpacucli` installed, post the output of `hpacucli ctrl all show config detail` – ewwhite Aug 01 '14 at 15:32
hpacucli doesn't show any issues. I tried it. The version `hpacucli version ACU CLI Version: 9.0-24.0 SoulAPI Version: 5.2.102.0` – Wawrzek Aug 01 '14 at 15:37
@Wawrzek No, I want your CONTROLLER's firmware revision! – ewwhite Aug 01 '14 at 15:39
`Firmware Version: 5.14` in this case, but I see the same problem on another machine with `Firmware Version: 7.22`. – Wawrzek Aug 01 '14 at 15:50
@Wawrzek And the same kernel revision on the box with firmware 7.22? – ewwhite Aug 01 '14 at 15:52
No, that one is on older version: 2.6.18-128.1.10.el5 and it different hardware: "P400" rather than "P410". – Wawrzek Aug 01 '14 at 15:55
But in the end, you have a storage controller/array error, right? – ewwhite Aug 15 '14 at 15:04

score 3 · Accepted Answer · answered Aug 08 '14 at 19:18

So, to figure this out I did the following.

Take your block number, multiply by four and add one

(130856866 * 4) + 1 = 523427465

This represents the sector reported as producing an I/O error. The block size being 2k, sectors being 512 bytes. The additional one extra accounts for the starting sector offset for the partition.

To correlate with SMART, we'll need to convert the value we now have into hexadecimal.

$ printf "0x%x\n" 523427465
0x1f32de89

Now, when you correlate that with what SMART shows, a line comes up suspiciously close..

20 34463:43  810000001f32decd  [3,11,0]   Require Write or Reassign Blocks command

How far away?

$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=16
ibase=16
1F32DECD-1F32DE89
44

That works out as being only between 34816 and 32768 bytes away but we cant say which sector is damaged out of the four that comprise the block.

If I had to hazard a guess, I'd say that theres probably a whole slew of blocks around the same address which will report I/O errors (assuming the raid striping is say 32k in size or whatever).

Additionally a read may not pick up the problem if the RAID is fetching the block chunk from a different disk. A write must propagate to all the disks in a RAID1 setup anyhow so this could make writes fail but reads succeed. Additionally, if we assume that the chunk size of the RAID card is 32k we can also assume that the damaged block plus the one reported by SMART are both damaged by whatever happened on that platter. Its just the SMART test read from the good disk for the first 32k and the bad disk for the next 32k.

Modern hard disks keep 'reserve sectors' to replace damaged sectors like this with a new sector location. Seeing as you are now getting this, and the that Reassign by disk failed message from smart I'd say a disk has ran out.

In terms of doing something about it; thats a bit more trickier. The LBA addressing is an abstraction against the real disk underneath. You'd need to identify which disk it is that is causing this issue, fail it in the RAID array and replace it.

In any case, you have a bad disk and you should look to replace it ASAP.

peterh · Answer 3 · 2014-08-01T15:28:51.817

The actually failed blocks are readable from the kernel log, which you can read somewhere below /var/log (probably /var/log/kernel.log), or from the output of the dmesg command.

Beware: what you need is not the disk sector number, but the partition- and filesystem-specific block number. Kernels since around 2.4.x are saying both of them in the dmesg.

Giving an -L flag to e2fsck can give this list of blocks to the bad blocks list of the filesystem. Thus the correct steps are the following:

First, check the list of the bad blocks from a dmesg.

Second, put them into a simple text file, so

cat >badblockfile.txt
34252345
3452345
23452345

(ctrl/d)

e2fsck -f -y -C0 /dev/diskname -L badblockfile.txt

If you can't interpret dmesg, put the relevant parts here as comments or as extension of your question.

EXTENSION

Your filesystem has 2k-blocks, and starts on the first sector of your hard disk (which has 512byte-sectors). Thus the formula between the filesystem-blocks (which could be given to e2fsck), and the disk-block (which are in the dmesg output) if very simple:

filesystem_block=(serctor_no-1)/4

If you don't have filesystem-level blocks in your messages, you could use this formula as well.

ALTERNATE TIP

There is also an additional tip: e2fsck has a flag -c. This calls the tool badblocks before a check, and marks the newly found bad blocks as bad. As I experienced, it doesn't really okay, in most cases it can't find all of the bad blocks. In your place I runned this for the weekend (or at least for the night) in an infinite loop:

while true; do e2fsck -f -y -C0 -c /dev/sdf;done

input/output error at clean ext3 partition - how to check what wrong with data block

3 Answers3