Synology has a customized version of the md driver and mdadm toolset that adds a 'DriveError' flag to the rdev->flags structure in the kernel.
Net effect: if you are unfortunate enough to get an array failure (first drive) combined with an error on a second drive, the array gets into a state where it won't let you repair/reconstruct it, even though reads from that second drive are working fine.
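As far as I can tell, the only place that flag surfaces in userland is the (E) marker in /proc/mdstat, so spotting an affected member is just a grep (nothing Synology-specific assumed beyond the marker itself):
grep '(E)' /proc/mdstat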
At this point, I'm not really worried about this question from the point of view of THIS array, since I've already pulled the content off and intend to reconstruct it; it's more that I want a resolution path for the future, since this is the second time I've been bitten by it, and I've seen others asking similar questions in forums.
Synology support has been less than helpful (and mostly non-responsive), and won't share any information AT ALL on dealing with the raidsets on the box.
Contents of /proc/mdstat:
ds1512-ent> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdb5[1] sda5[5](S) sde5[4](E) sdd5[3] sdc5[2]
      11702126592 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [_UUUE]

md1 : active raid1 sdb2[1] sdd2[3] sdc2[2] sde2[4] sda2[0]
      2097088 blocks [5/5] [UUUUU]

md0 : active raid1 sdb1[1] sdd1[3] sdc1[2] sde1[4] sda1[0]
      2490176 blocks [5/5] [UUUUU]

unused devices: <none>
Status from mdadm --detail /dev/md2:
/dev/md2:
        Version : 1.2
  Creation Time : Tue Aug  7 18:51:30 2012
     Raid Level : raid5
     Array Size : 11702126592 (11160.02 GiB 11982.98 GB)
  Used Dev Size : 2925531648 (2790.00 GiB 2995.74 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Fri Jan 17 20:48:12 2014
          State : clean, degraded
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           Name : MyStorage:2
           UUID : cbfdc4d8:3b78a6dd:49991e1a:2c2dc81f
         Events : 427234

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       21        1      active sync   /dev/sdb5
       2       8       37        2      active sync   /dev/sdc5
       3       8       53        3      active sync   /dev/sdd5
       4       8       69        4      active sync   /dev/sde5

       5       8        5        -      spare   /dev/sda5
As you can see, /dev/sda5 has been re-added to the array (it was the drive that outright failed), but even though md sees it as a spare, it won't rebuild onto it. /dev/sde5 is the problem drive with the (E) DiskError state.
I have tried stopping the md device, running forced reassembles, removing and re-adding sda5, etc. No change in behavior.
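For reference, the attempts were roughly along these lines (standard mdadm invocations, nothing Synology-specific; treat this as a sketch rather than an exact transcript):
mdadm --stop /dev/md2
mdadm --assemble --force /dev/md2 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5 /dev/sda5
mdadm --manage /dev/md2 --remove /dev/sda5
mdadm --manage /dev/md2 --add /dev/sda5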
I was able to completely recreate the array with the following command:
mdadm --stop /dev/md2
mdadm --verbose \
    --create /dev/md2 --chunk=64 --level=5 \
    --raid-devices=5 missing /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5
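A word of caution for anyone trying the same thing: the --create approach only leaves the data intact if the chunk size, metadata version, device order (and, with newer mdadm builds, the data offset) exactly match the original array, so it's worth pulling those from the existing superblocks first. Something like this, which is just standard mdadm --examine output:
mdadm --examine /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5 | egrep 'dev/|Version|Chunk Size|Data Offset|Device Role'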
which brought the array back to this state:
md2 : active raid5 sde5[4] sdd5[3] sdc5[2] sdb5[1]
      11702126592 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [_UUUU]
I then re-added /dev/sda5:
mdadm --manage /dev/md2 --add /dev/sda5
after which it started a rebuild:
md2 : active raid5 sda5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      11702126592 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [_UUUU]
      [>....................]  recovery =  0.1% (4569508/2925531648) finish=908.3min speed=53595K/sec
Note that the position of "missing" in the --create command matches the exact position of the missing slot in the original array.
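In the meantime I'm just keeping an eye on the rebuild from the shell, nothing more elaborate than re-reading /proc/mdstat (or the sysfs sync state, assuming Synology's kernel exposes the standard md sysfs entries):
watch cat /proc/mdstat
cat /sys/block/md2/md/sync_action   # shows 'recover' while rebuilding, 'idle' once it's done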
Once this finishes, I think I'll probably pull the questionable drive and have it rebuild again.
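If I do swap it, the plan would presumably be the standard mdadm drive replacement (sde5 being the (E) drive here, and assuming the replacement disk gets partitioned the same way first):
mdadm --manage /dev/md2 --fail /dev/sde5
mdadm --manage /dev/md2 --remove /dev/sde5
# swap the physical disk, then add the matching partition back:
mdadm --manage /dev/md2 --add /dev/sde5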
I am looking for any suggestions as to whether there is any "less scary" way to do this repair - or if anyone has gone through this experience with a Synology array and knows how to force it to rebuild other than taking the md device offline and recreating the array from scratch.