
My current mdstat:

$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sde[8] sdh[4] sdg[1] sdd[6] sdb[5] sdc[7]
      9766914560 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [UUUUU_U]

unused devices: <none>

Here is mdadm --detail:

$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Apr 26 21:52:21 2013
     Raid Level : raid6
     Array Size : 9766914560 (9314.46 GiB 10001.32 GB)
  Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
   Raid Devices : 7
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Tue Mar 28 15:19:34 2017
          State : clean, degraded 
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : server:0  (local to host server)
           UUID : 7dfb32ef:8454e49b:ec03ac98:cdb2e691
         Events : 34230

    Number   Major   Minor   RaidDevice State
       8       8       64        0      active sync   /dev/sde
       1       8       96        1      active sync   /dev/sdg
       4       8      112        2      active sync   /dev/sdh
       5       8       16        3      active sync   /dev/sdb
       6       8       48        4      active sync   /dev/sdd
      10       0        0       10      removed
       7       8       32        6      active sync   /dev/sdc

My questions are:

  1. How am I supposed to figure out which HDD was removed, without tricks and guesswork like subtracting the set of disks shown in the mdadm output from all HDDs available in my system (ls /dev/sd*)?
  2. Why would mdadm remove the disk? Is it OK to re-add it if I run smartctl tests and they finish successfully?

UPDATE: The correct answer is sdf. I found it by comparing the set of disks shown in the mdadm output with all the disks in the system (sda is the boot disk with the OS), but I still find this procedure too cumbersome.
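For the record, a rough sketch of that comparison (assuming every data disk shows up as /dev/sd?, which may not hold on other systems):

$ comm -23 <(ls /dev/sd? | sort) \
           <(sudo mdadm --detail /dev/md0 | grep -o '/dev/sd[a-z]*' | sort)

This prints the disks that exist but are not current members of md0, which here means the boot disk plus the removed member.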

DimanNe

2 Answers


You can run mdadm --detail /dev/md0 to get the UUID of the RAID array; in your case it's "7dfb32ef:8454e49b:ec03ac98:cdb2e691".

Then run mdadm --examine /dev/sda (and likewise for the other disks) and check which Array UUID it belongs to. If the UUID is the same and sda is missing from the mdadm --detail /dev/md0 output, then that is most likely the disk that was removed.
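For example, a quick loop over all the disks (the /dev/sd? glob is just an assumption about your device names) shows which superblocks carry that UUID:

for d in /dev/sd?; do
    printf '%s: ' "$d"
    sudo mdadm --examine "$d" 2>/dev/null | grep 'Array UUID' || echo 'no md superblock'
done

Any disk that reports the array's UUID here but is not listed by mdadm --detail /dev/md0 is the one that was removed.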

I can't say why mdadm removed the disk, other than that you should be able to find more information in dmesg and in /var/log.

If those logs look OK, and SMART says the disk is OK, then it should be safe to add it back.
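Purely as a sketch, and only after the logs and SMART look clean (sdf is taken from your update; double-check the device name before running anything):

$ sudo smartctl -H /dev/sdf                        # quick SMART health verdict
$ sudo smartctl -l error /dev/sdf                  # SMART error log
$ sudo mdadm --manage /dev/md0 --re-add /dev/sdf   # if re-add is refused, --add will trigger a full resync
$ cat /proc/mdstat                                 # watch the recovery progress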

I would recommend configuring mdadm --monitor so that it runs continuously, watches your RAID sets, and emails you if anything happens.
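A minimal sketch of that setup (the config path is the Debian/Ubuntu one; other distros use /etc/mdadm.conf, and you@example.com is a placeholder):

# in /etc/mdadm/mdadm.conf
MAILADDR you@example.com

# run the monitor, either via your distro's mdmonitor service or by hand:
$ sudo mdadm --monitor --scan --daemonise --delay=1800
# optional: send one test alert per array to verify mail delivery
$ sudo mdadm --monitor --scan --oneshot --test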

Victor Jerlin

The sequence of disks in the RAID array is important. You can see it in the output of mdadm --detail /dev/md0. In your example it is:

sde sdg sdh sdb sdd (missing) sdc

If a disk has died or been unplugged, it still belongs to the RAID array. In your example, the disk was removed from the array manually via an mdadm command. smartctl is a good program for showing disk health, but it can't help you if you don't know which disk was in the array.
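To tie slot numbers to physical disks without guessing, a loop such as this sketch (the /dev/sd? glob is an assumption about the device naming) prints the position each member's superblock records:

for d in /dev/sd?; do
    echo "== $d"
    sudo mdadm --examine "$d" 2>/dev/null | grep 'Device Role'
done

The role number that never appears among the attached disks (slot 5 in this example, per the [UUUUU_U] line in /proc/mdstat) is the slot of the disk that left the array.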

Mikhail Khirgiy