ZFS is reporting read errors on one disk, which would suggest it is failing, even though none of the failure conditions described in the ZFS-8000-9P document have occurred as far as we are aware. The disks are fairly new; the only recent issue was the ZFS pool running full.
The pool runs on top of an LSI MegaRAID 9271-8i, with every disk configured as a single-drive "RAID 0". I am not very familiar with this RAID card, so I found a script that returns data derived from the megacli command-line tool. I have added one drive below to show the setup; they are all set up the same (the system disks are different).
The zpool status output:
pool: data
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: none requested
config:
NAME          STATE     READ WRITE CKSUM
data          ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    br0c2     ONLINE       0     0     0
    br1c2     ONLINE       0     0     0
    br2c2     ONLINE       0     0     0
    br0c3     ONLINE       0     0     0
    br1c3     ONLINE       0     0     0
    br2c3     ONLINE       0     0     0
    r2c1      ONLINE       0     0     0
    r1c2      ONLINE       0     0     0
    r5c3      ONLINE       0     0     0
    sdb       ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sdd       ONLINE       0     0     0
    sde       ONLINE       0     0     0
    sdf       ONLINE       0     0     0
    sdg       ONLINE       0     0     0
    r3c1      ONLINE       0     0     0
    r4c1      ONLINE       2     0     0
... cut raidz2-1 ...
errors: No known data errors
The output of the LSI script:
Virtual Drive: 32 (Target Id: 32)
Name :
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 3.637 TB
Sector Size : 512
Is VD emulated : No
Parity Size : 0
State : Optimal
Strip Size : 512 KB
Number Of Drives : 1
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
PI type: No PI
Is VD Cached: No
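I assume the physical drive error counters behind that virtual drive could also be checked directly with megacli; something along these lines (the adapter number and the binary name are assumptions for my setup, it may be installed as MegaCli/MegaCli64):

# List the physical drives on adapter 0 and show only their error counters
megacli -PDList -a0 | egrep -i "slot|media error|other error|predictive failure"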
The script doesn't report any faulty disk, nor does the RAID controller mark the drive as faulty. I found some other topics on zpool errors that gave the advice to clear the error and run a scrub.
Now my questions: at what threshold should I run a scrub, and how long would it take (assuming this ZFS array will take a performance hit while the scrub runs)? Also, if this disk really is faulty, will hot-swapping it initialize a "rebuild" (resilver)? All the disks are "Western Digital RE 4TB, SAS II, 32MB, 7200rpm, enterprise 24/7/365". Is there a system that will check for ZFS errors automatically, since this was just a routine manual check?
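For reference, this is roughly what I understood the advice from those topics (and the "action" line above) to mean, using the device name from the zpool status output:

# Clear the error counters on the affected vdev, then scrub the whole pool
zpool clear data r4c1
zpool scrub data

# Check on the scrub progress / final result
zpool status data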
ZFS version: 0.6.4.1 (zfsonlinux)
I know 2 read errors are not a lot, but I'd prefer to replace disks too early rather than too late.
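As for automatic checking, for now I'm considering a simple cron check along these lines (just a sketch; I understand the ZED daemon that ships with zfsonlinux would be the more proper way to react to events):

#!/bin/sh
# Minimal sketch: mail a warning when any pool is not healthy.
# 'zpool status -x' prints "all pools are healthy" when there is nothing to report.
STATUS=$(zpool status -x)
if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | mail -s "ZFS pool problem on $(hostname)" root
fi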