I have a system with 10 drives running Linux software RAID using RAID 6. Today the system stopped responding and needed to be hard power cycled. The filesystem on the RAID (note, not the root filesystem, that's on its own drive) is intact and the data is still there. But I noticed this during the boot sequence:
raid5: raid level 6 set md0 active with 9 out of 10 devices, algorithm 2
RAID5 conf printout:
--- rd:10 wd:9
disk 0, o:1, dev:sdb1
disk 2, o:1, dev:sdc1
disk 3, o:1, dev:sdd1
disk 4, o:1, dev:sde1
disk 5, o:1, dev:sdj1
disk 6, o:1, dev:sdi1
disk 7, o:1, dev:sdh1
disk 8, o:1, dev:sdg1
disk 9, o:1, dev:sdf1
md0: detected capacity change from 0 to 16003169779712
The first part didn't surprise me; it just looked like a drive had dropped out. No big deal, RAID is designed to handle exactly that. But that last line concerned me. I didn't like seeing "capacity change" on my RAID.
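In case the exact layout matters for diagnosing this, mdadm can dump the array's geometry and each member's role; I can run something like the following (md0 and sdb1 are just the names from my setup above):

# Show array size, level, state, and which slot each device occupies
mdadm --detail /dev/md0

# Inspect the on-disk metadata of one member directly
mdadm --examine /dev/sdb1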
As I said above, the filesystem is fine; df shows no change from before:
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext4 73G 6.8G 63G 10% /
proc proc 0 0 0 - /proc
sysfs sysfs 0 0 0 - /sys
usbfs usbfs 0 0 0 - /proc/bus/usb
tmpfs tmpfs 1.7G 0 1.7G 0% /dev/shm
/dev/md0 xfs 15T 9.5T 5.2T 65% /mnt/data
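The XFS geometry also looks untouched. If it helps, xfs_info works on a mounted filesystem, so I can pull the block counts without taking anything offline:

# Print the filesystem geometry (block size, data blocks, AG count)
xfs_info /mnt/data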
But /proc/mdstat says this:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid6 sdk1[10] sdi1[6] sdh1[7] sdg1[8] sdf1[9] sdj1[5] sdd1[3] sde1[4] sdb1[0] sdc1[2]
15628095488 blocks level 6, 64k chunk, algorithm 2 [10/9] [U_UUUUUUUU]
[>....................] recovery = 0.7% (15060864/1953511936) finish=2053.3min speed=15733K/sec
unused devices: <none>
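For what it's worth, the sizes do still line up: /proc/mdstat counts 1 KiB blocks, so multiplying by 1024 gives exactly the "capacity change" byte value from the boot log, and dividing by the 8 data drives of a 10-disk RAID 6 gives exactly the per-device size on the recovery line (shell arithmetic for convenience):

$ echo $((15628095488 * 1024))
16003169779712
$ echo $((15628095488 / 8))
1953511936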
Notice the [10/9]. I've seen it say [9/10] before when a drive mistakenly dropped out, and after re-syncing it went back to [10/10] as expected. But does this mean something different? Is there something else that needs to be done besides letting the recovery finish? Has the RAID somehow changed its shape?
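For now I'm letting the recovery run and keeping an eye on it with something like this (the grep pattern is just my guess at the relevant mdadm --detail lines):

# Refresh the resync progress every 60 seconds
watch -n 60 cat /proc/mdstat

# Or pull just the array state and rebuild progress
mdadm --detail /dev/md0 | grep -E 'State :|Rebuild Status'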