mdadm: Reactivating RAID6 array after improper shutdown

I have a seven-disk RAID6 array on a file server. The server crashed and required a hard reset, so it shut down uncleanly. I/O activity was most likely occurring on the array at the time, so I'm assuming the array needs to be checked for inconsistencies. After rebooting, /proc/mdstat shows the array as inactive, but all of the drives appear in it with an (S) next to them. (Spares?) What is the appropriate way to reactivate and check the array? (Once the array itself is running and consistent again, I will check the file system on it; right now I just need help getting the array working again.)

/proc/mdstat

Personalities : 
md0 : inactive sdf1[4](S) sdh1[1](S) sdg1[3](S) sdc1[7](S) sdd1[8](S) sdb1[6](S) sda1[5](S)
      13674579968 blocks super 1.2

unused devices: <none>

mdadm --examine /dev/sd{a,b,c,d,f,g,h}1

/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : 14691340:39f90733:090f8d1d:992b9aa7

    Update Time : Sun Nov 23 18:20:24 2014
       Checksum : cd065d9e - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : .AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : 01c26529:75542d96:c966fe26:f580dcdf

    Update Time : Sun Nov 23 18:20:24 2014
       Checksum : 5b31bee5 - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : .AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : 24aa47a3:9f0a123e:f0ce78b2:774359bd

    Update Time : Sun Nov 23 18:20:24 2014
       Checksum : e5ef87dc - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : .AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : e2983a0c:0bc3b3d4:b8d018c7:fb547dff

    Update Time : Sun Nov 23 18:20:24 2014
       Checksum : 3c484254 - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : .AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : 82f35d80:31b62631:22102161:dda95f56

    Update Time : Sun Nov 23 18:18:13 2014
       Checksum : fdc823df - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : 2612c125:cb5d4712:4777122a:46b5e6c7

    Update Time : Sun Nov 23 18:20:24 2014
       Checksum : bec55d2b - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : .AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 05e69c50:388afb83:1418f18e:a393cb21
           Name : dende:0  (local to host dende)
  Creation Time : Sat May 26 17:14:56 2012
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907022848 (1863.01 GiB 2000.40 GB)
     Array Size : 9767554560 (9315.07 GiB 10001.98 GB)
  Used Dev Size : 3907021824 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=1024 sectors
          State : active
    Device UUID : 8cb08975:ff61e873:997d5d58:0559d0f9

    Update Time : Sun Nov 23 18:20:24 2014
       Checksum : d063a9d5 - correct
         Events : 63764

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : .AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

I see that /dev/sdf1 reports a different Array State (and an earlier Update Time) than the other devices, which stands out as something that must be meaningful, but I have no idea what that meaning is. Any help you kind ladies and gentlemen can offer is much appreciated. :-)
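(In case it helps anyone reading, here's a rough one-liner to pull just the fields that differ out of the --examine output above, rather than scanning the full dump:)

mdadm --examine /dev/sd{a,b,c,d,f,g,h}1 | grep -E '/dev/sd|Update Time|Events|Array State'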

EDIT: Tried the advice in the comment below about stopping and reassembling the array. mdadm reported that it successfully started /dev/md0 with 7 drives, and according to /proc/mdstat the array is now resyncing (which will obviously take a while). I'm guessing that means it saw one device was slightly out of date and is rebuilding it from the other devices? Does that mean things should be good on the RAID side now? (I will still check the file system before I do anything with the array.)
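(I'm keeping an eye on the resync via /proc/mdstat; something like the following also works if you want a periodic or more detailed view:)

watch -n 5 cat /proc/mdstat
mdadm --detail /dev/md0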

nonoitall

Try mdadm --stop md0; then mdadm --assemble --uuid=05e69c50:388afb83:1418f18e:a393cb21 to see whether it will now assemble the array correctly, even though one component (sdf1) will probably be left out; the update time on that disk is earlier than on the rest. Show the output this gives (edit your question to add it). EDIT: hmm, I just noticed your mdstat shows no personalities; you probably need to modprobe raid456 – wurtel – 2014-11-24T15:59:05.603

Thanks for the tip. Tried your advice and posted the results above. – nonoitall – 2014-11-25T01:54:04.967

Did you do the modprobe raid456 or was that not necessary? Does /proc/mdstat now show personalities? – wurtel – 2014-11-25T07:48:05.380

Yes, I did that before stopping the array, and RAID levels 4, 5 and 6 subsequently appeared under Personalities. – nonoitall – 2014-11-25T09:28:33.807

Answers

Apparently, at the time the array was assembled, the required RAID 6 personality was not yet available because the raid456 module had not been loaded.
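You can confirm that by checking the loaded modules and the Personalities line, e.g.:

lsmod | grep raid456
grep ^Personalities /proc/mdstat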

Stopping the incorrectly assembled array, loading the module, and assembling it again should help:

mdadm --stop md0
modprobe raid456
mdadm --assemble --uuid=05e69c50:388afb83:1418f18e:a393cb21

The UUID is what is listed as Array UUID in the mdadm --examine output.
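If you want to grab it without scrolling through the full output, something like this will do (any member device reports the same Array UUID):

mdadm --examine /dev/sda1 | grep 'Array UUID'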

Depending on how your system boots, you may need to ensure that the raid456 module is loaded before the md array is assembled.
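For example, on a systemd-based distribution something along these lines should work (the file name is just illustrative, and the initramfs rebuild is the Debian/Ubuntu variant; other distributions differ):

echo raid456 > /etc/modules-load.d/raid456.conf
update-initramfs -u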

Tip: on larger arrays it can be helpful to add a write-intent bitmap to the device, so that after a failure like the one you experienced only the out-of-date parts need to be resynced rather than the entire array. You can add one with the following command:

mdadm --grow --bitmap internal /dev/md0

The bitmap can also be specified at creation time; the command above adds one after the fact. A bitmap can be removed by specifying none instead of internal.
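To check whether a bitmap is active, and to remove it again later, something like this works (mirroring the option style used above):

mdadm --detail /dev/md0 | grep -i bitmap
mdadm --grow --bitmap none /dev/md0

When a bitmap is present, the --detail output gains an "Intent Bitmap" line and /proc/mdstat shows a bitmap: line under the array.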

wurtel

The system descended into sort of a rescue mode due to the array being unavailable, and I think that was the reason the module got unloaded. The array finished resyncing, I mounted it and ran a scrub (btrfs) and no errors came up, so I think all is well. Does the internal bitmap require more space? (If I simply add the bitmap with the command above, will the RAID device shrink slightly to make room for it?) I just want to make sure and shrink the underlying file system beforehand if that's necessary. – nonoitall – 2014-11-26T11:27:44.460

The internal bitmap uses spare space that's already available; the array content is not touched. It's also possible to use a separate bitmap on another device, but that doesn't give all that much advantage. Note also that such a bitmap is only useful on large arrays, as on smaller devices resyncing the whole array won't take much time, and using a bitmap does incur a (quite small) overhead. – wurtel – 2014-11-26T11:58:03.127