/dev/md0 Lost a Drive?


I have a system with 10 drives running Linux software RAID in a RAID 6 configuration. Today the system stopped responding and had to be hard power cycled. The filesystem on the RAID (note: not the root filesystem; that's on its own drive) is intact and the data is still there. But I noticed this during the boot sequence:

raid5: raid level 6 set md0 active with 9 out of 10 devices, algorithm 2
RAID5 conf printout:
 --- rd:10 wd:9
 disk 0, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
 disk 4, o:1, dev:sde1
 disk 5, o:1, dev:sdj1
 disk 6, o:1, dev:sdi1
 disk 7, o:1, dev:sdh1
 disk 8, o:1, dev:sdg1
 disk 9, o:1, dev:sdf1
md0: detected capacity change from 0 to 16003169779712

The first part didn't surprise me; it just looked like a drive had dropped out, and RAID is designed to handle exactly that. But that last line concerned me. I didn't like seeing "capacity change" reported for my RAID.
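For reference, something like the following should show which member is missing and what md thinks of it. I'm assuming /dev/md0 and sdk1 here only because sdk1 is the one device that doesn't appear in the printout above (it does show up in /proc/mdstat below):

# Full array status, including any failed/removed/rebuilding members
mdadm --detail /dev/md0

# Superblock info for the one drive missing from the boot printout
# (sdk1 -- the only member not listed above)
mdadm --examine /dev/sdk1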

As I said, the filesystem is fine, unchanged from before:

Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/root     ext4     73G  6.8G   63G  10% /
proc          proc       0     0     0   -  /proc
sysfs        sysfs       0     0     0   -  /sys
usbfs        usbfs       0     0     0   -  /proc/bus/usb
tmpfs        tmpfs    1.7G     0  1.7G   0% /dev/shm
/dev/md0       xfs     15T  9.5T  5.2T  65% /mnt/data

But /proc/mdstat says this:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] 
md0 : active raid6 sdk1[10] sdi1[6] sdh1[7] sdg1[8] sdf1[9] sdj1[5] sdd1[3] sde1[4] sdb1[0] sdc1[2]
      15628095488 blocks level 6, 64k chunk, algorithm 2 [10/9] [U_UUUUUUUU]
      [>....................]  recovery =  0.7% (15060864/1953511936) finish=2053.3min speed=15733K/sec

unused devices: <none>

Notice the [10/9]. I've seen it show [9/10] before, when a drive mistakenly dropped out, and after re-syncing it went back to [10/10] as expected. Does [10/9] mean something different? Is there anything that needs to be done besides letting this finish? Has the RAID somehow changed its shape?
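In the meantime, this is all I'm doing to keep an eye on the rebuild (the sysfs paths assume the usual md layout under /sys/block/md0):

# Refresh the rebuild progress every 30 seconds
watch -n 30 cat /proc/mdstat

# What md is doing right now ("recover" while rebuilding, "idle" when done)
cat /sys/block/md0/md/sync_action

# Sectors completed vs. total for the current sync/recovery
cat /sys/block/md0/md/sync_completed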

David


Answers


You get [10/9] because the disk that is still syncing is counted as a hot spare rather than an active member. So there are 9 active drives plus one spare, for ten drives total in the array. Once the syncing process finishes, it will show [10/10] again, because all ten drives will be active.
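If you want to double-check, mdadm --detail should show the syncing disk flagged as a rebuilding spare while the recovery runs, and as a normal active member once it's done (the exact wording of the state field varies a little between mdadm versions):

# Per-member state; the syncing disk appears as a rebuilding spare until recovery completes
mdadm --detail /dev/md0

# Number of degraded members: non-zero during the rebuild, 0 once the array is whole again
cat /sys/block/md0/md/degraded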

Turbo J
