
I have a dedicated server (my own hardware) hosted in a remote datacenter, running CentOS release 6.4 (Final) with GRUB (GNU GRUB 0.97). The server has six 2 TB drives carrying two software RAID arrays: md0, a RAID 1 for the system and swap, and md1, a RAID 5 for data. Here is the fdisk and mdstat output:

[root@s3 ~]# fdisk -l

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004f1ce

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         653     5242880   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2             653      243202  1948270592   fd  Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0008177f

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         653     5242880   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2             653      243202  1948270592   fd  Linux raid autodetect

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0008afb7

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1         653     5242880   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdc2             653      243202  1948270592   fd  Linux raid autodetect

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00012b28

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *           1         653     5242880   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sde2             653      243202  1948270592   fd  Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000df271

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1         653     5242880   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdd2             653      243202  1948270592   fd  Linux raid autodetect

Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000b6b9e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1   *           1         653     5242880   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdf2             653      243202  1948270592   fd  Linux raid autodetect

Disk /dev/md1: 9974.5 GB, 9974471720960 bytes
2 heads, 4 sectors/track, -1859793536 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 2621440 bytes
Disk identifier: 0x00000000


Disk /dev/md0: 5368 MB, 5368643584 bytes
2 heads, 4 sectors/track, 1310704 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000


Disk /dev/mapper/vg_s3-LogVol01: 12.6 GB, 12582912000 bytes
255 heads, 63 sectors/track, 1529 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 2621440 bytes
Disk identifier: 0x00000000


Disk /dev/mapper/vg_s3-LogVol00: 3271 MB, 3271557120 bytes
255 heads, 63 sectors/track, 397 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 2621440 bytes
Disk identifier: 0x00000000


Disk /dev/mapper/vg_s3-LogVol02: 9958.6 GB, 9958615678976 bytes
255 heads, 63 sectors/track, 1210732 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 2621440 bytes
Disk identifier: 0x00000000

[root@s3 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sdc1[2] sda1[0] sdd1[3] sdb1[1] sde1[4] sdf1[5]
      5242816 blocks super 1.0 [6/6] [UUUUUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid5 sda2[0](F) sdd2[3] sdc2[2] sdb2[1] sde2[4] sdf2[6]
      9740695040 blocks super 1.1 level 5, 512k chunk, algorithm 2 [6/5] [_UUUUU]
      bitmap: 15/15 pages [60KB], 65536KB chunk

unused devices: <none>

As you can see, the md1 array is in a degraded state: the /dev/sda drive is faulty. I have exchanged many faulty drives in other servers before, but here /dev/sda is (probably) also the boot drive. Here is my GRUB device map:

# this device map was generated by anaconda
(hd0)     /dev/sdb
(hd1)     /dev/sdc
(hd2)     /dev/sdd
(hd3)     /dev/sde
(hd4)     /dev/sdf
(hd5)     /dev/sdg

For some reason /dev/sda is not listed there, and the map includes /dev/sdg, which according to fdisk does not exist. Here is /boot/grub/grub.conf:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You do not have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /, eg.
#          root (hd0,0)
#          kernel /boot/vmlinuz-version ro root=/dev/md0
#          initrd /boot/initrd-[generic-]version.img
#boot=/dev/sdb
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-358.el6.x86_64)
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.32-358.el6.x86_64 ro root=UUID=cf9ba269-255e-4650-a095-87f2cdc5e22e rd_NO_LUKS rd_LVM_LV=vg_s3/LogVol01 LANG=en_US.UTF-8 rd_MD_UUID=596e4bea:c4494ac6:b2007529:1c8053a7 SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_MD_UUID=40df07b4:85b88119:f11dabdf:97836f34  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
    initrd /boot/initramfs-2.6.32-358.el6.x86_64.img

It lists (hd0,0) as the boot device. I think that maps to /dev/sda, which is the faulty drive, so if I turned the server off now to change it, it would not boot again. I am trying to switch the boot drive to one of the other disks; I have done a lot of searching online, but I cannot figure it out. I tried these commands:

[root@s3 ~]# grub-install /dev/sdb
/dev/sda1 does not have any corresponding BIOS drive.
[root@s3 ~]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> find /grub/stage1
find /grub/stage1

Error 15: File not found
grub> find /boot/grub/stage1
find /boot/grub/stage1
 (hd0,0)
 (hd1,0)
 (hd2,0)
 (hd3,0)
 (hd4,0)
 (hd5,0)
grub> cat (hd0,0)/grub/grub.conf
cat (hd0,0)/grub/grub.conf

Error 15: File not found
grub> quit
quit 

What changes should I make to GRUB so that the server can still boot after I swap /dev/sda? The HDD boot order will probably also have to be modified in the BIOS.

Josef

1 Answer


When using software RAID, the selected boot drive depends entirely on the BIOS (and on how it enumerates drives).

To keep the machine bootable, simply replace the failed drive and use grub-install to put the bootloader on the BIOS boot drive or, even better, on all the drives participating in the RAID 1 array.
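
For illustration, here is a minimal sketch of the whole replacement, assuming the new disk comes up as /dev/sda again and /dev/sdb is a healthy member (the device names are assumptions; adjust them to your system):

# Remove the failed member from md1; md0 still lists sda1 as healthy,
# so fail and remove it by hand before pulling the disk.
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1

# After swapping the disk, copy the MBR partition table from a healthy
# member to the new drive.
sfdisk -d /dev/sdb | sfdisk /dev/sda

# Re-add the new partitions to both arrays and let them resync.
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2

# Put the GRUB boot code on every drive, so the BIOS can boot from
# whichever disk it picks; --recheck regenerates the stale
# /boot/grub/device.map.
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
    grub-install --recheck "$d"
done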

Please see here for more information.
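
If grub-install keeps failing with "/dev/sda1 does not have any corresponding BIOS drive" (typically the symptom of a stale /boot/grub/device.map), the GRUB legacy shell can write the boot sector directly. A sketch, assuming /dev/sdb is the target; repeat it for each of the other healthy drives:

# Map the drive to a BIOS device name by hand, then install stage1/stage2.
# This works here because /boot lives on a RAID 1 with a 1.0 superblock,
# so each member partition is readable as a plain filesystem.
grub --batch <<EOF
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF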

shodanshok
  • Thanks for your help. I looked at the link, and they are using the `grub-install` command as well. When I try to use `grub-install` like `[root@s3 ~]# grub-install /dev/sdb` or `[root@s3 ~]# grub-install /dev/sdc`, it gives this error: `/dev/sda1 does not have any corresponding BIOS drive.` – Josef Nov 19 '16 at 19:13
  • Have you rebooted the machine after installing the new drive? Does the BIOS recognize it? – shodanshok Nov 19 '16 at 19:24
  • No, I have not changed the drive yet; I am trying to prepare for the change. I wanted to shut the server down for the swap, as I am not 100% sure that the disk marked as /dev/sda really is /dev/sda. If the server were off, I could check it by the disk serial number, but if I pulled the wrong drive while the server is on, the RAID 5 would break. So I wanted to make sure that the other drives are bootable before I change the drive. But now I see that I will have to risk it and change the drive while the server is running. – Josef Nov 19 '16 at 20:47
  • Rather than pulling a random drive, try to identify the right disk by correlating the information you can obtain from `dmesg`, `mdadm -E` and `smartctl` (see the sketch after this thread). – shodanshok Nov 19 '16 at 22:04
  • I know the serial number of the drive from `hdparm`. The problem is that I will be able to confirm the serial number only after I pull the disk out. I put stickers with sda, sdb, etc. labels on the drives when I first installed the server; I just hope I did it correctly back then. If the server were off, I would be sure that I changed the right drive, but there would be a risk that it would not boot again. – Josef Nov 20 '16 at 00:05
  • From your comment above, you ran `grub-install` on both `sdb` and `sdc`, so you should have the boot code on those two drives. I would pull the drive *after* a server shutdown, rather than risk a full server crash due to removing the wrong drive. Anyway, by searching the output of `dmesg` and looking inside `/sys/`, you should be able to identify which drive is connected to which SATA channel. Have a look [here](http://serverfault.com/questions/244944/linux-ata-errors-translating-to-a-device-name) for more information. – shodanshok Nov 20 '16 at 06:50
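
For completeness, a minimal sketch of that kind of correlation, using the device names from this question (the exact output labels vary with tool versions):

# Serial number of the disk the kernel currently calls sda:
smartctl -i /dev/sda | grep -i serial
hdparm -I /dev/sda | grep -i serial

# Confirm the partition really is the failed md1 member:
mdadm -E /dev/sda2

# Which ATA host/port the disk hangs off, for matching dmesg errors
# to a physical bay:
ls -l /sys/block/sda
dmesg | grep -i ata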