
On our FreeNAS server, zpool status gives me:

  pool: raid2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    raid2                                           ONLINE       0     0     0
      raidz1                                        ONLINE       0     0     0
        gptid/5f3c0517-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
        gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca  ONLINE       3 1.13M     0
        gptid/60570005-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
        gptid/60ebeaa5-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
        gptid/61925b86-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0

errors: No known data errors

What should I do? scrub the pool?

Dan

4 Answers


Run zpool clear raid2 to clear the errors, then start a scrub with zpool scrub raid2 (clearing the errors does not start a scrub by itself).

If the errors persist following that, replace the disk.
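As a rough sketch of that sequence, the three commands can be wrapped in a small shell function. The function name `clear_and_scrub` is made up for illustration; `zpool clear`, `zpool scrub` and `zpool status` are the standard subcommands. Note that `zpool scrub` returns immediately, so the final status only shows the scrub as in progress.

```shell
# Clear the error counters, start a scrub, then show the pool state.
# $1 is the pool name; the chain stops at the first command that fails.
clear_and_scrub() {
    pool=$1
    zpool clear "$pool" &&
    zpool scrub "$pool" &&
    zpool status "$pool"
}

# Usage (pool name from the question):
#   clear_and_scrub raid2
```

Re-run `zpool status raid2` once the scrub has finished to see whether the counters have started climbing again.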

More details about the hardware would help, so this is generic advice. My recommendations for a bunch of consumer disks connected to a PC motherboard are different from what I'd suggest for enterprise-level gear.

ewwhite
  • uh oh ... after `zpool clear raid2`, `zpool status` gave `DEGRADED` and that disk is `UNAVAIL`. No point in scrubbing now, right? Need to replace disk? But ... not sure how to identify it. Is there a way to get serial number for `gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca`? – Dan Apr 04 '14 at 18:55
  • `zdb raid2` will give the GUID for the disk, but I don't think it will give the serial number. – Andreas Mattisson Sep 03 '14 at 06:35

The tool tells you what you need to do: "Determine if the device needs to be replaced".

The tools are only so intelligent and need you, as the human administrator, to figure some things out. The steps required are specific to your hardware and your setup, so you will need to make some decisions based on your knowledge of the system.

Take a look at the output from the command. It looks like device gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca is experiencing 'WRITE' errors. '1.13M' is a very high error count and I suspect the problem has been occurring for a while without you noticing. See if you can figure out why and then replace the disk.
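To pick the affected device out of a longer listing, something like the following could help. It is only a sketch: the function name `zpool_error_devices` is hypothetical, and it assumes the config table has the same five columns (NAME, STATE, READ, WRITE, CKSUM) as the output in the question.

```shell
# List vdevs whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` text on stdin; prints "name read write cksum".
zpool_error_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row of the config table
        /^errors:/                    { in_config = 0 }        # the table ends before this line
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Usage (pool name from the question):
#   zpool status raid2 | zpool_error_devices
```

Fed the output from the question, this prints only the gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca line with its counters.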

If you have a hardware controller, that controller might have additional tools to help you determine the nature of the failure.

ZFS can deal with corrupt sectors, so there is no need to panic. But don't ignore the problem either.

As a preventative measure, you should also run a ZFS scrub regularly. See http://doc.freenas.org/index.php/ZFS_Scrubs . This will alert you when ZFS first encounters a problem, well before you hit the 1.13M mark.

Stefan Lasiewski

Use the following command, replacing /dev/ada0 with your own drives (/dev/adaX).

[blackout@freenas ~]# smartctl -a /dev/ada0 | grep "Serial"
Serial Number: WD-WCC4EXXXXXXXX

Another helpful command:

[blackout@freenas ~]# glabel status
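The two commands can be combined to go from the gptid label in zpool status to a serial number. This is a sketch under assumptions: the helper names `label_to_component` and `component_to_disk` are invented here, `glabel status` is assumed to print three columns (Name, Status, Components), and the partition is assumed to be named like ada1p2.

```shell
# Print the component (e.g. ada1p2) behind a label.
# Reads `glabel status` text on stdin; $1 is the label shown by zpool status.
label_to_component() {
    awk -v label="$1" '$1 == label { print $3 }'
}

# Strip the trailing partition suffix (ada1p2 -> ada1) to get the whole-disk name.
component_to_disk() {
    sed 's/p[0-9][0-9]*$//'
}

# Usage on the server (label taken from the question):
#   part=$(glabel status | label_to_component gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca)
#   disk=$(printf '%s\n' "$part" | component_to_disk)
#   smartctl -a "/dev/$disk" | grep "Serial"
```

If the disk has already dropped off the bus, it may no longer appear in glabel status; in that case, collecting the serial numbers of the disks that are still present and eliminating them is one way to find the failed one.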


Although the question is old, it might be looked at by other people.

If so, remember that the output of zpool status and zpool status -v covers all errors experienced. That includes errors caused by your motherboard SATA ports (if used), the HBA card (if used), and the SATA cables themselves, not just the disks.

Three quick diagnostic tests: check the disk using smartctl, check that the card is correctly seated and not loose, and try a different port or SATA cable (the cable is a common cause of read/write errors).
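For the smartctl check, one possible approach is to pull a few attributes out of `smartctl -A`. This is a sketch, not a complete health check: the function name `smart_key_attributes` is made up, and it assumes the usual ten-column ATA attribute table with the raw value in the last column. Attribute 199 (UDMA_CRC_Error_Count) tends to point at the cable or port, while 5, 197 and 198 relate to reallocated or pending sectors on the disk itself.

```shell
# Print "name raw_value" for attributes 5, 197, 198 and 199.
# Reads `smartctl -A` text on stdin.
smart_key_attributes() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 || $1 == 199 { print $2, $10 }'
}

# Usage (device name is a placeholder; substitute your own):
#   smartctl -H /dev/ada1                         # overall health verdict
#   smartctl -A /dev/ada1 | smart_key_attributes
```

A non-zero and growing CRC count with clean sector counters would be a reason to try another cable or port before replacing the disk.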

Stilez