How to get an inactive RAID device working again?

31

10

After booting, my RAID1 device (/dev/md_d0 *) sometimes goes into some funny state and I cannot mount it.

* Originally I created /dev/md0 but it has somehow changed itself into /dev/md_d0.

# mount /opt
mount: wrong fs type, bad option, bad superblock on /dev/md_d0,
       missing codepage or helper program, or other error
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

The RAID device appears to be inactive somehow:

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
                [raid4] [raid10] 
md_d0 : inactive sda4[0](S)
      241095104 blocks

# mdadm --detail /dev/md_d0
mdadm: md device /dev/md_d0 does not appear to be active.

Question is, how do I make the device active again (using mdadm, I presume)?

(Other times it's alright (active) after boot, and I can mount it manually without problems. But it still won't mount automatically even though I have it in /etc/fstab:

/dev/md_d0        /opt           ext4    defaults        0       0

So a bonus question: what should I do to make the RAID device automatically mount at /opt at boot time?)

This is an Ubuntu 9.10 workstation. Background info about my RAID setup in this question.

Edit: My /etc/mdadm/mdadm.conf looks like this. I've never touched this file, at least by hand.

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR <my mail address>

# definitions of existing MD arrays

# This file was auto-generated on Wed, 27 Jan 2010 17:14:36 +0200

In /proc/partitions the last entry is md_d0 at least now, after reboot, when the device happens to be active again. (I'm not sure if it would be the same when it's inactive.)

Resolution: as Jimmy Hedman suggested, I took the output of mdadm --examine --scan:

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=de8fbd92[...]

and added it in /etc/mdadm/mdadm.conf, which seems to have fixed the main problem. After changing /etc/fstab to use /dev/md0 again (instead of /dev/md_d0), the RAID device also gets automatically mounted!
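
Condensed into commands, the fix looked roughly like this (a sketch for my Ubuntu setup; the update-initramfs step is an assumption on my part, only relevant if the initramfs is what assembles the array at boot):

sudo mdadm --examine --scan | sudo tee -a /etc/mdadm/mdadm.conf   # append the detected ARRAY line(s)
sudoedit /etc/fstab          # change /dev/md_d0 to /dev/md0 on the /opt line
sudo update-initramfs -u     # possibly needed so boot-time assembly sees the new config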

Jonik

Posted 2010-03-09T09:55:18.747

Reputation: 5 352

Answers

25

For your bonus question:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf

Jimmy Hedman

Posted 2010-03-09T09:55:18.747

Reputation: 886

Ok, mdadm --examine --scan produced ARRAY /dev/md0 level=raid1 num-devices=2 UUID=... (note the md0 instead of md_d0!). I put that in the mdadm.conf file manually (because there was some problem with sudo and >>, "permission denied", and sudo is required) and also updated fstab to use md0 (not md_d0) again. Now I don't seem to run into the "inactive" problem anymore, and the RAID device mounts automatically at /opt upon booting. So thanks! – Jonik – 2010-03-10T14:19:34.940

The reason you had problems with sudo ... >> mdadm.conf is that the shell opens the redirected file before sudo runs. The command su -c '.... >> mdadm.conf' should work. – Mei – 2013-10-08T18:32:56.197
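
In other words, either of these should get around the redirection problem (a sketch assuming the Debian/Ubuntu path /etc/mdadm/mdadm.conf):

sudo sh -c 'mdadm --examine --scan >> /etc/mdadm/mdadm.conf'      # redirection happens inside the root shell
sudo mdadm --examine --scan | sudo tee -a /etc/mdadm/mdadm.conf   # or let tee do the appending as root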

11

I have found that I have to add the array manually in /etc/mdadm/mdadm.conf in order to make Linux mount it on reboot. Otherwise I get exactly what you have here: md_d1 devices that are inactive, etc.

The conf file should look like the example below, i.e. one ARRAY line for each md device. In my case the new arrays were missing from this file, but if you have them listed already, this is probably not a fix to your problem.

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=f10f5f96:106599e0:a2f56e56:f5d3ad6d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa591bbe:bbbec94d:a2f56e56:f5d3ad6d

Add one array per md-device, and add them after the comment included above, or if no such comment exists, at the end of the file. You get the UUIDs by doing sudo mdadm -E --scan:

$ sudo mdadm -E --scan
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=f10f5f96:106599e0:a2f56e56:f5d3ad6d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa591bbe:bbbec94d:a2f56e56:f5d3ad6d

As you can see, you can pretty much just copy the output from the scan result straight into the file.

I run Ubuntu desktop 10.04 LTS, and as far as I remember this behavior differs from the server version of Ubuntu; however, it was so long ago that I created my md devices on the server that I may be wrong. It may also be that I just missed some option.

Anyway, adding the array to the conf file seems to do the trick. I've run the above RAID 1 and RAID 5 for years with no problems.
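
If you want to double-check that what ended up in the file matches reality, comparing the on-disk superblocks, the currently assembled arrays, and the config should work (a sketch; note that --detail --scan only lists arrays that are currently running):

sudo mdadm --examine --scan         # ARRAY lines read from the on-disk superblocks
sudo mdadm --detail --scan          # ARRAY lines for the arrays currently assembled
grep ^ARRAY /etc/mdadm/mdadm.conf   # what the config file actually contains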

Erik

Posted 2010-03-09T09:55:18.747

Reputation: 111

So essentially you're saying the same thing as the currently accepted answer, just more verbosely? :) Still, +1, nice first post. – Jonik – 2011-08-01T08:18:17.657

7

Warning: First of all let me say that the below (due to the use of "--force") seems risky to me, and if your data is irreplaceable I'd recommend making copies of the partitions involved before you start trying any of the things below. However, this worked for me.

I had the same problem, with an array showing up as inactive, and nothing I did, including the "mdadm --examine --scan > /etc/mdadm.conf" suggested by others here, helped at all.

In my case, when the system tried to start the RAID-5 array after a drive replacement, dmesg showed that the array was dirty:

md/raid:md2: not clean -- starting background reconstruction
md/raid:md2: device sda4 operational as raid disk 0
md/raid:md2: device sdd4 operational as raid disk 3
md/raid:md2: device sdc4 operational as raid disk 2
md/raid:md2: device sde4 operational as raid disk 4
md/raid:md2: allocated 5334kB
md/raid:md2: cannot start dirty degraded array.

Causing it to show up as inactive in /proc/mdstat:

md2 : inactive sda4[0] sdd4[3] sdc4[2] sde4[5]
      3888504544 blocks super 1.2

I did find that all the devices had the same events on them, except for the drive I had replaced (/dev/sdb4):

[root@nfs1 sr]# mdadm -E /dev/sd*4 | grep Event
mdadm: No md superblock detected on /dev/sdb4.
         Events : 8448
         Events : 8448
         Events : 8448
         Events : 8448

However, the array details showed that it had 4 out of 5 devices available:

[root@nfs1 sr]# mdadm --detail /dev/md2
/dev/md2:
[...]
   Raid Devices : 5
  Total Devices : 4
[...]
 Active Devices : 4
Working Devices : 4
[...]
    Number   Major   Minor   RaidDevice State
       0       8        4        0      inactive dirty  /dev/sda4
       2       8       36        2      inactive dirty  /dev/sdc4
       3       8       52        3      inactive dirty  /dev/sdd4
       5       8       68        4      inactive dirty  /dev/sde4

(The "State" column above is from memory; I can't find that output in my scroll-back buffer.)

I was able to resolve this by stopping the array and then re-assembling it:

mdadm --stop /dev/md2
mdadm -A --force /dev/md2 /dev/sd[acde]4

At that point the array was up and running with 4 of the 5 devices, I was able to add the replacement device, and it is rebuilding. I'm able to access the filesystem without any problem.
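
For completeness, re-adding the replaced drive and watching the rebuild went roughly like this (a sketch; /dev/sdb4 is the replacement partition from the output above):

mdadm --manage /dev/md2 --add /dev/sdb4   # add the new member back into the array
watch cat /proc/mdstat                    # watch the recovery/rebuild progress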

Sean Reifschneider

Posted 2010-03-09T09:55:18.747

Reputation: 1 387

5

I was having issues with Ubuntu 10.04 where an error in /etc/fstab prevented the server from booting.

I ran this command as mentioned in the above solutions:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf

This appends the results of "mdadm --examine --scan" to "/etc/mdadm/mdadm.conf".

In my case, this was:

ARRAY /dev/md/0 metadata=1.2 UUID=2660925e:6d2c43a7:4b95519e:b6d110e7 name=localhost:0

This is a fake RAID 0. My entry in /etc/fstab for automatic mounting is:

/dev/md0 /home/shared/BigDrive ext3 defaults,nobootwait,nofail 0 0

The important thing here is that you have "nobootwait" and "nofail". "nobootwait" (an Ubuntu-specific mount option) tells the boot process not to wait for this mount, so problems with it won't block booting, and "nofail" keeps a missing device from being reported as an error. In my case this was a remote server, so it was essential.
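
If the md device name itself is unstable (md0 versus md126/md127, as mentioned elsewhere on this page), mounting by filesystem UUID instead of device name is worth considering. A sketch, where the UUID placeholder has to be filled in from blkid:

sudo blkid /dev/md0    # prints something like: /dev/md0: UUID="..." TYPE="ext3"
# then in /etc/fstab (hypothetical line, substitute the real UUID):
# UUID=<uuid-from-blkid>  /home/shared/BigDrive  ext3  defaults,nobootwait,nofail  0  0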

Hope this will help some people.

Nick Woodhams

Posted 2010-03-09T09:55:18.747

Reputation: 201

This is what did it for me. I have my RAID drives attached via a PCI express SATA card, so I'm guessing at boot time the system couldn't see those drives yet. – Michael Robinson – 2015-02-23T16:44:13.220

2

You can activate your md device with

mdadm -A /dev/md_d0

I suppose some startup script runs too soon, before one of the RAID members has been discovered, or there is some similar problem. As a quick and dirty workaround, you should be able to add this line to /etc/rc.local:

mdadm -A /dev/md_d0 && mount /dev/md_d0

Edit: apparently your /etc/mdadm/mdadm.conf still contains the old configuration name. Edit this file and replace occurrences of md0 with md_d0.

wazoox

Posted 2010-03-09T09:55:18.747

Reputation: 1 285

Ok, on those occasions when the device is active after reboot, just having mount /dev/md_d0 in /etc/rc.local works fine. mdadm -A /dev/md_d0 on the other hand fails with that error message in both cases (so I couldn't use it before that && operator). Anyway, half of the problem seems solved, so +1 for that. – Jonik – 2010-03-09T15:14:19.000

Actually mdadm.conf doesn't contain any configuration name, at least directly (it does refer to /proc/partitions though); see the edited question. I've never touched mdadm.conf - what is the tool that autogenerates it? – Jonik – 2010-03-10T12:36:31.913

For the record, removed the /etc/rc.local workaround as it seems I got everything working properly: http://superuser.com/questions/117824/how-to-get-an-inactive-raid-device-working-again/118251#118251 :)

– Jonik – 2010-03-10T14:29:14.003

2

A simple way to get the array running, assuming there is no hardware problem and you have enough drives/partitions to start it, is the following. For example, with an array stuck like this in /proc/mdstat:

md20 : inactive sdf1[2](S)
      732442488 blocks super 1.2

run:

sudo mdadm --manage /dev/md20 --run

It could be that for whatever reason the array is fine but something prevented it from starting or building. In my case this was because mdadm didn't know the original array name was md127, and all drives had been unplugged for that array. When I plugged them back in I had to assemble manually (probably a bug where mdadm thought the array was already active because of the offline old array name).
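
If --run alone doesn't do it, stopping the stale array and letting mdadm re-assemble it is a related approach (a sketch; the md127 name is just the example from my case and may differ on yours):

sudo mdadm --stop /dev/md127    # stop the stale, half-assembled array if it exists
sudo mdadm --assemble --scan    # re-assemble arrays from mdadm.conf / scanned superblocks
cat /proc/mdstat                # confirm the array is now active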

Areeb Soo Yasir

Posted 2010-03-09T09:55:18.747

Reputation: 175

2

md_d0 : inactive sda4[0](S) looks wrong for a RAID1 array. It seems to suggest that the array has no active devices and one spare device (indicated by the (S); you would see (F) there for a failed device and nothing for an OK/active device). For a RAID1 array that isn't running degraded there should be at least two OK/active devices (and for a degraded array at least one OK/active device), and you can't activate a RAID1 array that has no non-failed, non-spare devices (spares do not contain a copy of the data until they are made active when another drive fails). If I'm reading that /proc/mdstat output right, you'll not be able to activate the array in its current state.

Do you have any physical drives in the machine that have failed to spin up? Does ls /dev/sd* list all the drives and partitions that you would normally expect to see on that machine?
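
To check, something along these lines should show whether the kernel sees all the expected disks (read-only commands, nothing here touches the array):

ls /dev/sd*                # block devices and partitions the kernel has created
cat /proc/partitions       # the same list from the kernel's point of view
dmesg | grep -i 'sd[a-z]'  # kernel messages about sdX devices, including detection errors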

David Spillett

Posted 2010-03-09T09:55:18.747

Reputation: 22 424

Seems I cannot reproduce the inactive situation any more, after following the advice in Jimmy's answer (seems like that anyway after a few reboots)... Which is nice :) Thanks in any case! – Jonik – 2010-03-10T14:12:12.473

I brought the question of this state to the Linux RAID mailing list, and got this response: https://www.spinics.net/lists/raid/msg61352.html

– nh2 – 2018-11-14T15:06:29.817

As I just wrote here, echo active > /sys/block/md0/md/array_state worked for me, making my RAID show up as RAID1 with a missing disk again instead of RAID0 with only a spare.

– nh2 – 2018-11-14T15:15:05.470

2

I had a similar problem... my server would not mount md2 after I had grown the associated devices' partitions. On reading this thread I found that the md2 RAID device had a new UUID and the machine was trying to use the old one.

As suggested... using 'md2' output from

mdadm --examine --scan

I edited /etc/mdadm/mdadm.conf and replaced the old UUID line with the one output by the above command, and my problem went away.
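
A quick way to spot this kind of mismatch is to compare the UUID actually on the disks with the one the config still expects, roughly like this:

sudo mdadm --examine --scan | grep -i uuid   # UUID(s) found on the member disks now
grep -i uuid /etc/mdadm/mdadm.conf           # UUID(s) the old config still expects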

Peter Errity

Posted 2010-03-09T09:55:18.747

Reputation: 21

2

When you try to do something with /dev/md0 through /dev/md9, it sometimes comes up as /dev/md126 or /dev/md127 instead. If /dev/md0 is actually mounted via /dev/md126 or /dev/md127, you have to unmount it first:

umount /dev/md126 or umount /dev/md127

This is temporary and lets you execute commands and some applications without stopping your system.

Vanderj68

Posted 2010-03-09T09:55:18.747

Reputation: 21