
My current mdstat:

$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sde[8] sdh[4] sdg[1] sdd[6] sdb[5] sdc[7]
      9766914560 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [UUUUU_U]

unused devices: <none>

Here is mdadm --detail:

$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Apr 26 21:52:21 2013
     Raid Level : raid6
     Array Size : 9766914560 (9314.46 GiB 10001.32 GB)
  Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
   Raid Devices : 7
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Tue Mar 28 15:19:34 2017
          State : clean, degraded 
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : server:0  (local to host server)
           UUID : 7dfb32ef:8454e49b:ec03ac98:cdb2e691
         Events : 34230

    Number   Major   Minor   RaidDevice State
       8       8       64        0      active sync   /dev/sde
       1       8       96        1      active sync   /dev/sdg
       4       8      112        2      active sync   /dev/sdh
       5       8       16        3      active sync   /dev/sdb
       6       8       48        4      active sync   /dev/sdd
      10       0        0       10      removed
       7       8       32        6      active sync   /dev/sdc

My questions are:

  1. How am I supposed to figure out which HDD was removed, without tricks and guesswork like subtracting the set of disks shown in the mdadm output from all HDDs available in my system (ls /dev/sd*)?
  2. Why would mdadm remove the disk? Is it OK to re-add it if I run smartctl tests and they finish successfully?

UPDATE: The correct answer is sdf. I found it by comparing the set of disks shown in the mdadm output with all the disks in the system (sda is the boot disk with the OS), but I still find this procedure too cumbersome.
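For the record, a rough sketch of that comparison (assuming every data disk shows up as /dev/sd?, which may not hold on other systems):

$ comm -23 <(ls /dev/sd? | sort) \
           <(sudo mdadm --detail /dev/md0 | grep -o '/dev/sd[a-z]*' | sort)

This prints the disks that exist but are not current members of md0, which here means the boot disk plus the removed member.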

DimanNe

2 Answers


You can run mdadm --detail /dev/md0 to get the UUID of the RAID array; in your case it's "7dfb32ef:8454e49b:ec03ac98:cdb2e691".

Then run mdadm --examine /dev/sda (and likewise for the other disks) and check which Array UUID it belongs to. If the UUID is the same and sda is missing from the mdadm --detail /dev/md0 output, then that is most likely the disk that was removed.
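For example, a quick loop over all the disks (the /dev/sd? glob is just an assumption about your device names) shows which superblocks carry that UUID:

for d in /dev/sd?; do
    printf '%s: ' "$d"
    sudo mdadm --examine "$d" 2>/dev/null | grep 'Array UUID' || echo 'no md superblock'
done

Any disk that reports the array's UUID here but is not listed by mdadm --detail /dev/md0 is the one that was removed.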

I can't say why mdadm removed the disk, other than that you should be able to find more information in dmesg and in /var/log.

If those logs look OK, and SMART says the disk is OK, then it should be safe to add it back.
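Purely as a sketch, and only after the logs and SMART look clean (sdf is taken from your update; double-check the device name before running anything):

$ sudo smartctl -H /dev/sdf                        # quick SMART health verdict
$ sudo smartctl -l error /dev/sdf                  # SMART error log
$ sudo mdadm --manage /dev/md0 --re-add /dev/sdf   # if re-add is refused, --add will trigger a full resync
$ cat /proc/mdstat                                 # watch the recovery progress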

I would recommend configuring mdadm --monitor so that it runs continuously, watches your RAID sets, and emails you if anything happens.
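A minimal sketch of that setup (the config path is the Debian/Ubuntu one; other distros use /etc/mdadm.conf, and you@example.com is a placeholder):

# in /etc/mdadm/mdadm.conf
MAILADDR you@example.com

# run the monitor, either via your distro's mdmonitor service or by hand:
$ sudo mdadm --monitor --scan --daemonise --delay=1800
# optional: send one test alert per array to verify mail delivery
$ sudo mdadm --monitor --scan --oneshot --test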

Victor Jerlin

The sequence of disks in the RAID array is important. You can see it in the output of mdadm --detail /dev/md0. In your example it is:

sde sdg sdh sdb sdd (missing) sdc

If a disk has died or been unplugged, it still belongs to the RAID array. In your example, the disk was removed from the array manually via an mdadm command. smartctl is a good program for showing disk health, but it can't help you if you don't know which disk was in the array.
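To tie slot numbers to physical disks without guessing, a loop such as this sketch (the /dev/sd? glob is an assumption about the device naming) prints the position each member's superblock records:

for d in /dev/sd?; do
    echo "== $d"
    sudo mdadm --examine "$d" 2>/dev/null | grep 'Device Role'
done

The role number that never appears among the attached disks (slot 5 in this example, per the [UUUUU_U] line in /proc/mdstat) is the slot of the disk that left the array.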

Mikhail Khirgiy