raid 1 wrongly detected as raid 0 when one drive is missing

3

I'm learning about RAID, so maybe this is a basic question, but it's not covered anywhere...

When I create a RAID 1 array, update /etc/mdadm/mdadm.conf as in [1], and run update-initramfs -u, I can reboot and mount it. Everything is fine. Now I remove one drive and reboot, to simulate a critical failure. The array is wrongly detected as RAID 0 (why?) and inactive (why? because we "just have half of a RAID 0"?), and as such it cannot be used. What I expected to see was an active, degraded array, not this fatal state. What's wrong? See [2] for a description of the error state.

Related question: why does mdadm.conf [1] contain devices=/dev/sdb1,/dev/sdc1 if allegedly all partitions (or the ones defined in the DEVICE line) should be scanned for the RAID UUID? So why is this part generated? What is its use, and why isn't a partition UUID used instead? Could it even be used here?
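
(For context, my understanding is that an ARRAY line only needs some identifier such as the array UUID, so a minimal entry without the devices= part would look something like the sketch below, using the UUID of my array; whether that is actually the right thing to do is part of the question.)

ARRAY /dev/md0 metadata=1.2 UUID=16624299:11ed3af5:3a8acd02:cd24d4d0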

[1] mdadm.conf

cat /etc/mdadm/mdadm.conf 
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR alfonz19gmail.com

MAILFROM vboxSystem

# definitions of existing MD arrays

# This configuration was auto-generated on Sun, 10 Feb 2019 09:57:56 +0100 by mkconf
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 name=mmucha-VirtualBox1:0 UUID=16624299:11ed3af5:3a8acd02:cd24d4d0
   devices=/dev/sdb1,/dev/sdc1

[2] erroneous state:

root@mmucha-VirtualBox1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdb1[0](S)
      5236719 blocks super 1.2

unused devices: <none>
root@mmucha-VirtualBox1:~# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 1
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 1

              Name : mmucha-VirtualBox1:0  (local to host mmucha-VirtualBox1)
              UUID : 16624299:11ed3af5:3a8acd02:cd24d4d0
            Events : 19

    Number   Major   Minor   RaidDevice

       -       8       17        -        /dev/sdb1
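
(Side note: if I understand correctly, the member's own superblock should still record the real level; that can be checked with something like:)

mdadm --examine /dev/sdb1 | grep -i 'raid level'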

UPDATE: creation steps

I wanted to share something non-interactive, but sfdisk's interface did not work for me: when I asked it to create a gpt disklabel and write it, it reported success but did nothing. So sorry, you're getting fdisk commands here.
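
(For completeness, what I was trying to do non-interactively was roughly the following, assuming sfdisk's script mode, where ',,L' means one partition starting at the default offset, taking the rest of the disk, with the generic Linux type:)

printf 'label: gpt\n,,L\n' | sfdisk /dev/sdd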

Description: I created 2 new disks for an existing Ubuntu 18.04 VM, set a gpt partition table on both, created 1 partition on each, created the RAID 1 array, created an ext4 filesystem, mounted it, created a test file, updated mdadm.conf, and ran update-initramfs -u. Reboot, verify, works. Power off, remove the sde drive, boot. Same failure.

ubuntu release:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:    18.04
Codename:   bionic

fdisk:

fdisk /dev/sdd
g          # create a new empty GPT disklabel
n          # add a new partition
1          # partition number 1
           # (Enter: accept default first sector)
           # (Enter: accept default last sector)
t          # change the partition type
29         # type number picked from fdisk's GPT type list
p          # print the partition table
w          # write the changes and exit

Prints:

Disk /dev/sdd: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E16A3CCE-1EF7-3D45-8AEF-A70B45B047CC

Device     Start      End  Sectors Size Type
/dev/sdd1   2048 10485726 10483679   5G Linux filesystem

same for /dev/sde:

Disk /dev/sde: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: AEE480EE-DFA8-C245-8405-658B52C7DC0A

Device     Start      End  Sectors Size Type
/dev/sde1   2048 10485726 10483679   5G Linux filesystem

raid creation:

mdadm --create /dev/md1 --level=mirror --raid-devices=2 /dev/sd[d-e]1

 mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Thu Feb 21 08:54:50 2019
        Raid Level : raid1
        Array Size : 5236672 (4.99 GiB 5.36 GB)
     Used Dev Size : 5236672 (4.99 GiB 5.36 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Thu Feb 21 08:55:16 2019
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : mmucha-VirtualBox1:1  (local to host mmucha-VirtualBox1)
              UUID : 1c873dd9:87220378:fc4de07a:99db62ae
            Events : 17

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1

formatting and mounting:

mkfs.ext4 /dev/md1 
mkdir /media/raid1
mount /dev/md1 /media/raid1/

mdadm --detail --scan --verbose >> /etc/mdadm/mdadm.conf

update-initramfs -u
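
(To double-check that the initramfs really picked up the new config, I believe its contents can be listed with something like:)

lsinitramfs /boot/initrd.img-$(uname -r) | grep mdadm.conf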

cat /etc/mdadm/mdadm.conf 
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR alfonz19gmail.com

MAILFROM vboxSystem

# definitions of existing MD arrays

# This configuration was auto-generated on Sun, 10 Feb 2019 09:57:56 +0100 by mkconf
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 name=mmucha-VirtualBox1:0 UUID=16624299:11ed3af5:3a8acd02:cd24d4d0
   devices=/dev/sdb1,/dev/sdc1
ARRAY /dev/md1 level=raid1 num-devices=2 metadata=1.2 name=mmucha-VirtualBox1:1 UUID=1c873dd9:87220378:fc4de07a:99db62ae
   devices=/dev/sdd1,/dev/sde1

And that's it. As mentioned above, if you remove one HDD now, you won't be able to mount the array:

sudo mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 1
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 1

              Name : mmucha-VirtualBox1:1  (local to host mmucha-VirtualBox1)
              UUID : 1c873dd9:87220378:fc4de07a:99db62ae
            Events : 23

    Number   Major   Minor   RaidDevice

       -       8       49        -        /dev/sdd1

UPDATE 2: I tested the same commands (minus update-initramfs -u) on Arch and it worked without a hiccup. I booted back into the Ubuntu VM, where I had 2 sets of 2-drive RAID 1 arrays. I again removed one drive, and it worked: clean, degraded. I had not even run that VM a single time since last time. OK, so then I removed one drive from the other set. Now I should have 2 clean, degraded arrays on md0 and md1. But I have 2 clean, degraded arrays on md0 and md127. I know about that, so I know I have to stop md127, run mdadm --assemble --scan to get it back on md1, run update-initramfs -u, and after a reboot it should be good. But surprisingly it's not: I have md0 and md1 as expected, and each set is missing 1 drive, but one is in state clean, degraded and the other is inactive with the wrong level. Stopping and reassembling fixes it again. All of this happened without a single modification of mdadm.conf.
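
For clarity, the stop-and-reassemble sequence I keep having to repeat is roughly:

mdadm --stop /dev/md127
mdadm --assemble --scan
update-initramfs -u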

It's deep magic.

Martin Mucha

Posted 2019-02-11T07:32:35.947

Reputation: 271

I'm not sure if it is an actual configuration error, as this is mentioned to happen in the Arch wiki in the section "mounting from live cd"; sadly it's not explained at all why this happens and what it means: https://wiki.archlinux.org/index.php/RAID

– Martin Mucha – 2019-02-11T12:37:41.707

The actual configuration looks completely wrong. A raid0 made of just one device doesn't make any sense. Normally I would say this is an entirely different configuration from the one seen in the config file but the matching UUID is intriguing. – Pavel Šimerda – 2019-02-17T19:52:55.113

Well, technically, if you have configured RAID 0 and one disk goes missing, you will end up in this state, which is understandable. The issue, however, is why something ignores the explicit hint that this is RAID 1 and not 0. The response should be an active, degraded array. – Martin Mucha – 2019-02-17T20:16:38.953

Answers

1

I had the same problem with an inactive RAID 1 array reported as raid0. I'm not sure why, but this fixed it for me:

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --run

EDIT: I didn't try rebooting in the degraded state. After starting the array with the above commands, I added my backup disk to the array:

mdadm --manage /dev/md0 --add /dev/sdX

md0 started re-syncing. After that, mine reboots fine.
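
(If it helps, the resync progress can be watched with something like:)

watch cat /proc/mdstat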

I imagine that in your case you would need to add something to the mdadm.conf file. I would try adding the level to the ARRAY line, e.g.

ARRAY /dev/md0 level=raid1 ...

g.kovatchev

Posted 2019-02-11T07:32:35.947

Reputation: 111

Yes, this works for me as well. But having to invoke this after every reboot is really not something you want to do, and if you had root on the RAID it wouldn't even be possible. I have no idea what this is, but it happens to me ONLY on Ubuntu. Luckily I'm not forced to use Ubuntu on my own machines, so I'm safe; I just have to work with multiple distros. – Martin Mucha – 2019-03-23T07:26:54.630

I edited my post above; perhaps try adding "level=raid1" to the array definition in /etc/mdadm/mdadm.conf.

There are other options that might be helpful, e.g. "num-devices=1" – g.kovatchev – 2019-03-24T16:38:11.773

As you can see, "level=raid1" is already present in mdadm.conf in my case. I can try num-devices, however I don't see how that could help: a) the number of devices is irrelevant to the RAID level; you can have 10 mirrored drives or 3 striped ones... b) IIUC, devices should be assembled using the UUID, so stating that there are 10 drives is pointless, since all available drives and their count can be determined anyway. And besides, my configuration was auto-generated from an existing, 100% healthy, running array using the mdadm command; a valid configuration should have been produced. – Martin Mucha – 2019-03-25T07:04:28.127

... I'd report this on the Ubuntu forums, but while I always get a fast response on the Arch, Mint and usually Fedora forums, to this day I have zero responses on any subject on multiple Canonical forums... – Martin Mucha – 2019-03-25T07:06:19.050

-1

You must have done something wrong. Try the procedure from the beginning and post all the steps you have done. It must work.

Josef Jebavy

Posted 2019-02-11T07:32:35.947

Reputation: 1

Will try; I'll be in touch shortly with the exact commands. But to be honest, where is the possibility to go wrong, if the RAID 1 was created, mounted and working (well, allegedly, according to mdadm, which misbehaves after the drive removal)? Anyway, I'll be back with the specific commands. – Martin Mucha – 2019-02-18T13:53:56.667

Sorry for the delay. I added the creation details. Does this show everything you need? – Martin Mucha – 2019-02-21T09:08:05.397

Hi, I don't know where the problem is, but if you remove the RAID you should also remove its configuration from /etc/mdadm.conf – Josef Jebavy – 2019-02-22T12:31:58.390

I'm not following. I did not remove the RAID in the virtualized operating system. I did nothing in the virtualized system, as we are simulating a critical failure; when there is a critical failure you do nothing in the system, right? So I removed a virtual drive from the virtual machine in the VirtualBox configuration, which should be equivalent to someone showing up at your PC and stealing one of your drives. That is what happened. The problem is that there was no software change; one HDD went missing. This shouldn't affect the functionality of a RAID 1 array. – Martin Mucha – 2019-02-22T13:37:28.473