
I have a relatively old server machine with no SATA ports on the motherboard (Dell Precision Workstation 650). I also have a cheap FakeRAID controller (VIA VT6421 chipset) and two Seagate 3TB drives (ST3000DM001) connected to it. The controller was configured to use the drives in Stripe mode (not RAID): I basically used it as a plain SATA controller rather than a RAID controller, and set up software (mdadm) RAID on this system instead. Each of the two physical drives is partitioned as follows:

Number  Start (sector)    End (sector)  Size        Code  Name
1                2048            4095   1024.0 KiB  EF02  BIOS boot partition
2                4096         3186687   1.5 GiB     EF00  EFI System
3             3186688      5856337919   2.7 TiB     FD00  Linux RAID
4          5856337920      5860532223   2.0 GiB     8200  Linux swap

So there are:

  1. 1MiB of unallocated free space at the beginning of each drive, as required by GRUB for GPT drives on legacy BIOS
  2. Partition #1 (1MiB): BIOS boot partition for GRUB
  3. Partition #2 (1.5GiB) for /boot => md0
  4. Partition #3 (2.7TiB) for / => md1
  5. Partition #4 (2GiB) for swap => md127
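
For reference, a layout like this could be reproduced with sgdisk along these lines (a sketch only: the sector boundaries are copied from my table above, and /dev/sdX is a placeholder for each physical drive):

$ sudo sgdisk -n 1:2048:4095 -t 1:EF02 -c 1:"BIOS boot partition" /dev/sdX
$ sudo sgdisk -n 2:4096:3186687 -t 2:EF00 -c 2:"EFI System" /dev/sdX
$ sudo sgdisk -n 3:3186688:5856337919 -t 3:FD00 -c 3:"Linux RAID" /dev/sdX
$ sudo sgdisk -n 4:5856337920:5860532223 -t 4:8200 -c 4:"Linux swap" /dev/sdX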

Partitions #2, #3, and #4 are assembled into software RAIDs md0, md1 and md127 (using raid1):

$ cat /proc/mdstat 
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdb2[2] sda2[3]
    1590208 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb3[2] sda3[3]
    2926444352 blocks super 1.2 [2/2] [UU]

md127 : active raid1 sdb4[1] sda4[0]
    2097088 blocks [2/2] [UU]
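
For the record, arrays like these could have been created with mdadm roughly as follows (a sketch, not the exact commands I ran two years ago; device names are examples). Note that md127 shows no "super 1.2" above, which suggests it uses the old 0.90 metadata format:

$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.2 /dev/sda2 /dev/sdb2    # /boot
$ sudo mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=1.2 /dev/sda3 /dev/sdb3    # /
$ sudo mdadm --create /dev/md127 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda4 /dev/sdb4 # swap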

GRUB (version 1.99) is installed individually on each of the two physical drives.
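
That is, something equivalent to the following was run (device names are examples):

$ sudo grub-install /dev/sda   # boot code in the protective MBR + core image in the BIOS boot partition
$ sudo grub-install /dev/sdb   # same for the second drive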

This setup boots and has been working more or less reliably (Ubuntu Server 12.04 i686) for the last two years, except for a couple of HDD failures (mostly due to overheating, I think, as the cooling was not very good). Twice a failed drive was removed, and a new drive was successfully added to the array and synchronized.

Now I am replacing the HDD controller: the cheap FakeRAID (old_controller) is being swapped for a Supermicro SAT2-MV8 8-port SATA HBA, PCI-X (new_controller), which is a pure SATA controller, not FakeRAID. The reason is that the old controller's connectors are really flimsy and seem to cause occasional interface errors.

The PROBLEM is that I now cannot boot the system without the old_controller. In particular, if neither drive is connected to the old_controller, booting always ends in a "no such disk" error and the grub rescue command line. In rescue mode, the "ls" command does not list (md/0), although the physical drive partitions are displayed. If, however, one of the drives is connected to the old_controller, the system boots normally, and if GRUB is interrupted in this case, the "ls" command shows (md/0) along with the other drive partitions. I tried several configurations.

Configurations that boot (at least reach the GRUB Linux image selection menu):

  1. disk1 + disk2 on old_controller
  2. disk1 on old_controller + disk2 on new_controller
  3. disk2 on old_controller + disk1 on new_controller
  4. disk1 on old_controller
  5. disk2 on old_controller

GRUB's "ls"-command in these cases always lists (md/0)

Configurations that FAIL to boot ("no such disk" error and the grub rescue> command line):

  1. disk1 and disk2 on new_controller
  2. disk1 on new_controller
  3. disk2 on new_controller
  4. disk1 and disk2 on new_controller with old_controller completely removed from the motherboard
  5. disk1 on new_controller with old_controller completely removed from the motherboard
  6. disk2 on new_controller with old_controller completely removed from the motherboard

GRUB's "ls"-command in these cases DOES NOT lists (md/0)

Why can't GRUB assemble the md array when neither of the drives is connected to the old_controller? And overall, could you please help me make the system work without the old_controller?

Also, could you recommend a good place to read in depth about how software RAID gets assembled at boot, especially with GRUB? How are the drives for the md array identified by GRUB? How can a replaced SATA controller affect this process?

Thank you!


1 Answer


I will answer my own question.

In short: upgrading from GRUB 1.99 to GRUB 2.02 beta2 SOLVED the problem.

Somehow the mdraid1x module from GRUB 1.99 could not correctly detect the RAID when the drives were connected to the new controller. I am not sure what the exact difference between the SATA controllers is. One thing, however, is that even though neither of them could detect the correct 3TB size of the drives, the values they reported were different: 800MB on the old_controller and 2TB on the new_controller. Maybe that somehow affected mdraid1x's ability to detect the RAID partitions.
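
(Those sizes were printed by the controller BIOSes at POST. As a side note, what a running kernel sees through each controller can be compared with blockdev; the device name is an example:

$ sudo blockdev --getsize64 /dev/sda   # drive size in bytes as seen through the controller
)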

Nevertheless, this issue seems to have been fixed in GRUB 2.02 beta2.
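
For anyone hitting the same issue: Ubuntu 12.04 ships GRUB 1.99, so getting 2.02 beta2 means building it from source (or using a backported package). Roughly, assuming the tarball name as published on ftp.gnu.org, the usual build dependencies, and example device names:

$ wget https://ftp.gnu.org/gnu/grub/grub-2.02~beta2.tar.xz
$ tar xf grub-2.02~beta2.tar.xz && cd grub-2.02~beta2
$ ./configure && make && sudo make install        # installs under /usr/local by default
$ sudo /usr/local/sbin/grub-install /dev/sda      # reinstall the boot code on both drives
$ sudo /usr/local/sbin/grub-install /dev/sdb
$ sudo /usr/local/sbin/grub-mkconfig -o /boot/grub/grub.cfg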
