I've got a pool in raidz1-0 with 5 drives in it. I'm not sure exactly when, but all of a sudden all the drives went from always being ONLINE with no read, write or checksum errors to randomly spitting out all sorts of errors.
NAME                                            STATE     READ WRITE CKSUM
Data                                            DEGRADED     0     0     0
  raidz1-0                                      DEGRADED   149   185     0
    gptid/905fe084-a003-11e9-9d12-000c29c8a62a  DEGRADED    57   127     5  too many errors
    gptid/2b75693a-9f09-11e9-8310-000c29c8a62a  ONLINE       7     5     5
    gptid/b8b4dd8f-82e9-11eb-b23f-000c29c8a62a  DEGRADED    70   171     5  too many errors
    gptid/b88beac0-e1f3-11e7-aeb0-000c29c8a62a  DEGRADED    51     6    14  too many errors
    gptid/4eb702b3-e2c3-11e7-9896-000c29c8a62a  FAULTED      8    13     2  too many errors
I've done some basic troubleshooting:
- SMART reports that everything is fine (apart from temperatures a bit warmer than I'd like, around 40C). So the drives look like they're in good shape. No bad sectors, no pending sectors, nothing out of the ordinary. All of the drives have been spinning for ~3 years at this point.
- Each of the drives is connected directly to the motherboard via its own SATA connection. I've reseated and replaced the SATA cables with no success.
At some point, I replaced the 3rd disk in the pool. At the time, it was spitting out the most errors and would always be the first to go into a DEGRADED state. I replaced it with a brand-new drive, which has been running for months now and immediately picked up the same issue as the rest of the pool.
Even after a zpool clear, about 5 hours later I had the following status.
NAME                                            STATE     READ WRITE CKSUM
Data                                            DEGRADED     0     0     0
  raidz1-0                                      DEGRADED     1     0     0
    gptid/905fe084-a003-11e9-9d12-000c29c8a62a  ONLINE       2     4     0
    gptid/2b75693a-9f09-11e9-8310-000c29c8a62a  ONLINE       0     0     0
    gptid/b8b4dd8f-82e9-11eb-b23f-000c29c8a62a  FAULTED      1    11     0  too many errors
    gptid/b88beac0-e1f3-11e7-aeb0-000c29c8a62a  ONLINE       1     1     0
    gptid/4eb702b3-e2c3-11e7-9896-000c29c8a62a  ONLINE       1     6     0
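In case it helps anyone compare snapshots like the two above, here is a minimal Python sketch that pulls the per-device counters out of pasted status text. It assumes the NAME STATE READ WRITE CKSUM column layout shown here and plain integer counters (abbreviated values such as "1.2K" would not match); the function name parse_status is hypothetical, not part of any ZFS tooling.

```python
import re

# One device row: name, state, then the READ / WRITE / CKSUM counters.
# Assumption: counters are plain integers, as in the output above.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)

def parse_status(text):
    """Return {device: (state, read, write, cksum)} for gptid/ leaf devices."""
    devices = {}
    for line in text.splitlines():
        m = ROW.match(line)
        # Skip the pool and vdev rows; keep only the individual disks.
        if m and m.group(1).startswith("gptid/"):
            name, state, read, write, cksum = m.groups()
            devices[name] = (state, int(read), int(write), int(cksum))
    return devices

sample = """\
NAME STATE READ WRITE CKSUM
Data DEGRADED 0 0 0
raidz1-0 DEGRADED 1 0 0
gptid/905fe084-a003-11e9-9d12-000c29c8a62a ONLINE 2 4 0
gptid/b8b4dd8f-82e9-11eb-b23f-000c29c8a62a FAULTED 1 11 0 too many errors
"""

for name, (state, read, write, cksum) in parse_status(sample).items():
    print(name, state, "read", read, "write", write, "cksum", cksum)
```

Running this on each snapshot makes it easier to see whether the errors stay spread across all disks or concentrate on one.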
I'm not exactly sure what's going on here or where else to look.
I don't know if it's a coincidence, but I noticed this started to happen after upgrading the ZFS pool as part of one of FreeNAS's updates (I think it was 11.2U - also yeah, I'm running FreeNAS).
The only other thing I can think of is a bad SATA controller. But before I get to that, is there anything else I can troubleshoot? This is for a hobby home server, and replacing the controller essentially means a whole new server, so I'd like to avoid that if possible. Unfortunately, there aren't any free PCIe slots left to install an external controller.
Thanks in advance!