I've had my trusty QNAP NAS (T869-RU) fail on me overnight, with (apparently) two of the eight disks in the RAID5 mdadm array having dropped out.

from /proc/mdstat:

[/etc] # cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : inactive sdb3[1](S) sdh3[7](S) sdg3[6](S) sdf3[5](S) sde3[4](S) sdd3[3](S) sdc3[2](S) sda3[0](S)
      15615564800 blocks

md8 : active raid1 sdh2[2](S) sdf2[3](S) sde2[4](S) sdd2[5](S) sdc2[1] sdb2[0]
      530048 blocks [2/2] [UU]

md13 : active raid1 sda4[0] sdd4[7] sde4[6] sdf4[5] sdg4[4] sdh4[3] sdc4[2] sdb4[1]
      458880 blocks [8/8] [UUUUUUUU]
      bitmap: 0/57 pages [0KB], 4KB chunk

md9 : active raid1 sdb1[1] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2]
      530048 blocks [8/7] [_UUUUUUU]
      bitmap: 65/65 pages [260KB], 4KB chunk

from dmesg:

[  329.450972] md: md0 stopped.
[  329.652404] md: bind<sda3>
[  329.653837] md: bind<sdc3>
[  329.655223] md: bind<sdd3>
[  329.656624] md: bind<sde3>
[  329.658021] md: bind<sdf3>
[  329.659409] md: bind<sdg3>
[  329.661720] md: bind<sdh3>
[  329.662977] md: bind<sdb3>
[  329.664083] md: kicking non-fresh sdg3 from array!
[  329.665110] md: bind<sde2>
[  329.665113] md: unbind<sdg3>
[  329.670009] md: export_rdev(sdg3)
[  329.671043] md: kicking non-fresh sda3 from array!
[  329.671981] md: unbind<sda3>
[  329.678009] md: export_rdev(sda3)
[  329.679473] md/raid:md0: not clean -- starting background reconstruction
[  329.680449] md/raid:md0: device sdb3 operational as raid disk 1
[  329.681408] md/raid:md0: device sdh3 operational as raid disk 7
[  329.682325] md/raid:md0: device sdf3 operational as raid disk 5
[  329.683219] md/raid:md0: device sde3 operational as raid disk 4
[  329.684085] md/raid:md0: device sdd3 operational as raid disk 3
[  329.684939] md/raid:md0: device sdc3 operational as raid disk 2
[  329.695598] md/raid:md0: allocated 136320kB
[  329.696599] md/raid:md0: not enough operational devices (2/8 failed)
[  329.697497] RAID conf printout:
[  329.697499]  --- level:5 rd:8 wd:6
[  329.697504]  disk 1, o:1, dev:sdb3
[  329.697507]  disk 2, o:1, dev:sdc3
[  329.697510]  disk 3, o:1, dev:sdd3
[  329.697514]  disk 4, o:1, dev:sde3
[  329.697516]  disk 5, o:1, dev:sdf3
[  329.697519]  disk 7, o:1, dev:sdh3
[  329.706658] md/raid:md0: failed to run raid set.
[  329.707554] md: pers->run() failed ...
[  330.713729] md: md0 stopped.
[  330.714629] md: unbind<sdb3>
[  330.719018] md: export_rdev(sdb3)
[  330.720035] md: unbind<sdh3>
[  330.724009] md: export_rdev(sdh3)
[  330.724860] md: unbind<sdf3>
[  330.736010] md: export_rdev(sdf3)
[  330.736851] md: unbind<sde3>
[  330.744010] md: export_rdev(sde3)
[  330.744792] md: unbind<sdd3>
[  330.752010] md: export_rdev(sdd3)
[  330.752760] md: unbind<sdc3>
[  330.760010] md: export_rdev(sdc3)
[  331.718141] md: bind<sdf2>
[  332.912023] md: md0 stopped.
[  333.932428] md: bind<sdh2>
[  336.469591] md: md0 stopped.
[  338.488805] md: md0 stopped.
[  338.713968] md: md8: recovery done.

All drives report themselves as clean despite two of them having been kicked from the array as non-fresh, e.g.:

[/etc] # mdadm -E /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : eac0dd4a:cf58c4fd:29cb9955:7f700b4b
  Creation Time : Wed Dec  1 12:09:20 2010
     Raid Level : raid5
  Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
     Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Sun Jan 20 03:00:38 2013
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 1ecb8590 - correct
         Events : 0.40

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       8       67        4      active sync   /dev/sde3
   5     5       8       83        5      active sync   /dev/sdf3
   6     6       8       99        6      active sync   /dev/sdg3
   7     7       8      115        7      active sync   /dev/sdh3
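
One thing I still plan to check (just a loop over the same mdadm -E fields shown above) is whether the Events counter and Update Time on each member line up; my understanding is that a member whose Events value lags the rest was dropped at some point and is stale, even though its own superblock still says clean:

[/etc] # for d in /dev/sd[a-h]3; do echo "== $d"; mdadm -E $d | grep -E 'Update Time|Events'; done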

Any ideas on what I can do here? I'd have assumed all was lost, given that two drives apparently aren't working, but they also appear clean/healthy, so maybe something else has happened.
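
From what I've read, if the kicked members are only slightly behind on their event counters (rather than physically dead), the usual route is to stop the half-assembled array and re-assemble it with --force so md accepts the stale superblocks again. This is only a sketch of what I'm considering, not something I've run yet, and I understand that forcing in a genuinely failing drive risks corruption, so I'd want a backup or a read-only mount first:

[/etc] # mdadm --stop /dev/md0
[/etc] # mdadm --assemble --force /dev/md0 /dev/sd[a-h]3
[/etc] # cat /proc/mdstat    # check whether md0 came back, possibly degraded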

It seems the web interface (which shows drive health and the RAID control panel) must be driven by the main array (/dev/md0) that's down, as I can't get to anything except SSH on the QNAP server.

Thanks

kwiksand

1 Answer

Disregard this question.

Ended up talking to QNAP support, who identified sda as a problem drive; the other drive was only marked unclean but was actually fine. I'm currently running the array on 7 of the 8 drives while awaiting delivery of two replacements.
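
For anyone hitting the same thing later: once the replacement disks arrive and are partitioned to match the existing members (QNAP's own tools may well handle this step), re-adding a new member to the degraded array should be roughly the following; the device name is just an example based on my layout:

[/etc] # mdadm /dev/md0 --add /dev/sda3
[/etc] # cat /proc/mdstat    # watch the rebuild progress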

kwiksand