Intel Matrix Storage Manager vs Linux Software RAID

The chipset I'm using supports Intel RSTe technology. This means that I have two options for RAID setup:

  • Ordinary Linux software RAID, using the mdadm command.
  • RSTe (configured either in the BIOS or, again, with the mdadm command and its -e imsm switch).

Given that mdadm can be used for both, I can't understand the difference between the two.
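
For reference, the two ways of creating, say, a mirror look roughly like this as far as I understand (the device names below are just placeholders):

    # Plain Linux software RAID-1 with native md metadata
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # RSTe/IMSM: first an imsm container over the whole disks, then a volume inside it
    mdadm --create /dev/md/imsm0 -e imsm --raid-devices=2 /dev/sda /dev/sdb
    mdadm --create /dev/md/Volume0 --level=1 --raid-devices=2 /dev/md/imsm0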

  • What does RSTe give to me as compared to regular Linux software RAID?
  • When in RSTe mode, is the actual RAID I/O path (i.e. mirroring and striping) handled by the Linux md or by the BIOS?
  • In particular, when I use "matrix RAID" (i.e. the RAID covers specific partitions rather than the whole disks), do I need to manually install GRUB on both MBRs?

Leonid99

Posted 2012-08-14T12:42:42.213

Reputation: 383

Answers

Overview

There are three common types of RAID available:

  • Software RAID: This means that your BIOS and other operating systems think that you really have two separate disks, but purely at the software level, your operating system uses some OS-specific on-disk format for RAID (mirroring, striping, parity bits, whatever). All processing is done by the CPU with no hardware support.

  • BIOS RAID: Also known as "Fake RAID" or "Host RAID", this means that your motherboard firmware (more specifically, your SATA/SAS controller) has explicit support for recognizing RAID devices. At the logical device (LUN) level, your multiple hard drives appear as a single drive to the operating system. This is basically the SATA/SAS controller saying "I really only have one hard drive. Well, it's actually two, but shhhh, it's just one, trust me". In other words, the operating system can tell that it's a RAID setup, but it is not responsible for defining the on-disk format of the RAID parity/striping/etc. However, even in this mode, the CPU does all the calculations for parity bits and striping. The motherboard, BIOS, and SATA controller have just enough logic to physically "combine" the devices and define an on-disk format for the RAID, but they lack a dedicated processor to do the calculations and depend on software within the operating system to tell the CPU to do them, which is why you still have to tell Linux about your BIOS RAID. (Intel Matrix / RST is a type of BIOS RAID.)

  • Hardware RAID: This is where you have a dedicated chip whose sole purpose is to process the data required for RAID. The chip can be quite powerful; some hardware RAID controllers actually have a dual-core, CPU-like chip on board, specifically optimized to run an embedded operating system that is VERY fast at doing RAID calculations, such as parity bits for RAID-5 or striping for RAID-0. The hard disks are physically cabled into the RAID card, which provides a SATA/SAS controller, usually a read and write cache in DRAM or flash, native command queuing, and an on-board processor that does the heavier mathematical calculations. These controllers run from about $150 at the entry level up to many thousands of dollars for industrial datacenter RAID backplanes. (A rough sketch for telling these three types apart from a running Linux system follows this list.)
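
As a rough sketch (exact output varies by distribution and hardware), you can usually tell which of the three you are dealing with from a running Linux system:

    # Does the platform advertise Intel IMSM/RST(e) support (BIOS RAID)?
    mdadm --detail-platform

    # What arrays has the kernel's md layer assembled (software or BIOS RAID)?
    cat /proc/mdstat

    # A dedicated hardware RAID controller usually shows up as its own PCI device
    lspci | grep -i raid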

Compatibility

In general, each type of RAID is "tied" to some particular component; when that component changes, you run into compatibility problems.

  • Software RAID is tied to the operating system that defined the RAID format. Sometimes, between two different versions of the same operating system, the RAID format changes, leading to incompatibility. Although it is conceptually possible for any software RAID format to be supported by any other operating system, since it's just software, in practice most operating systems use incompatible RAID formats that only they can recognize. The most widely supported format is the one used natively by the Linux kernel (md, as you are discussing in the OP); Linux can also recognize the software RAID of Windows, called Dynamic Disks.

  • BIOS RAID is tied to the motherboard you own. It may be possible to move drives formatted with a particular BIOS RAID format to another motherboard with a similar BIOS RAID solution; for example, from one system with Intel RST to another system with RST. But you will need to research this carefully before you make the move, to be certain the arrays really will be compatible (a sketch for inspecting the on-disk metadata follows this list).

  • Hardware RAID is tied to that specific hardware controller, or a series of hardware controllers that are explicitly stated by the manufacturer to be compatible. Some vendors maintain a very consistent hardware RAID disk format that is supported by many generations of controllers; others change up the format more frequently. Again, you will have to research it on a case-by-case basis.
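
To check which on-disk metadata format a member disk actually carries before moving it, a minimal sketch (device names are placeholders):

    # Native Linux software RAID members report an md metadata version such as 1.2
    mdadm --examine /dev/sda1

    # Members of an Intel BIOS RAID report the vendor "imsm" container format instead
    mdadm --examine /dev/sdb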

Performance

Performance largely depends on how you configure the basic parameters of the RAID array, and less on the specific solution. In general, Hardware RAID controllers have the highest "ceiling" for maximum performance; they also don't tax your CPU nearly as much as the other solutions. But if you choose the wrong RAID type for your workload, or the wrong stripe size, or the wrong caching approach, a Hardware RAID controller can also be extremely slow, slower than a single drive running in non-RAID mode. The same goes for the other solutions, which can also be extremely slow if misconfigured. (A short sketch of the relevant md tuning knobs follows the list below.)

  • Software RAID is most suitable for a RAID-1 configuration, since mirroring is a simple copy of the same data to two drives and there are no parity bits to calculate. RAID-5 on software RAID performs far worse, because every write requires parity calculations on the CPU.

  • BIOS RAID performance is generally comparable to Software RAID, but certain BIOS RAID controllers and disk formats have been known to be buggy or poor performers. In general, if you have to choose between Software RAID and BIOS RAID, the former is a bit more promising for performance, especially if you are running a recent Linux distribution.

  • Hardware RAID performance can be insanely fast due to the optimized processing power of the RAID controller's processor, which as I said is designed for high throughput and can actually be a multi-core chip -- this is some serious iron. The main downsides are flexibility -- you can't just slot the drives into another computer that lacks a compatible hardware RAID controller -- and expense. Hardware RAID is the best level at which to use RAID-5 or RAID-6, especially if you have a lot of disks (4 or more).
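
The md tuning knobs mentioned above, as a sketch (device names and values are placeholders, not recommendations):

    # Create a RAID-5 with an explicit 512 KiB chunk (stripe unit) size
    mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[b-e]1

    # RAID-5/6 write performance is also sensitive to the md stripe cache size
    cat /sys/block/md0/md/stripe_cache_size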

Overall

Although BIOS RAID is supported by Linux, I can't recommend that you use it.

Now to directly answer your questions, after I've given you the long-winded answer:

What does RSTe give to me as compared to regular Linux software RAID?

See the comparisons above between software RAID and BIOS RAID. "RSTe" is an instance of BIOS RAID; Linux md RAID without the -e imsm is an instance of software RAID.
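
If you do create an RSTe/IMSM array with mdadm, what you end up with still runs on the kernel's md driver; a rough sketch of what to expect (the md126/md127 numbering follows the common convention, as in the comments below):

    # Typically shows an imsm container (e.g. md127) plus the actual volume (e.g. md126)
    cat /proc/mdstat

    # External (imsm) metadata is maintained by the userspace mdmon helper
    ps -C mdmon -o pid,args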

When in RSTe mode, is the actual RAID I/O path (i.e. mirroring and striping) handled by the Linux md or by the BIOS?

If you mean the data path, it is always handled by the CPU (and thus, the operating system) unless you have a dedicated hardware RAID card. I don't think these come on any motherboards, although some high-end server chipsets out there might surprise me...

In particular, when I use "matrix RAID" (i.e. the RAID covers specific partitions rather than the whole disks), do I need to manually install GRUB on both MBRs?

No. In fact, you never need to install GRUB on both MBRs. Let's take it case by case:

  • Software RAID: Just pick one disk arbitrarily to install GRUB on, and set it first in the BIOS boot order. Remember, you can mirror individual partitions if you want, so the disks don't have to be bit-for-bit identical in software RAID. One can have an MBR with a bootloader and the other can have nothing in the MBR. (If you do want both disks to be independently bootable, a sketch follows this list.)

  • BIOS RAID: The BIOS will tell you that it's one "disk" (it'll actually call it what it is, a RAID array), so you don't get to choose where to install GRUB. When you install Linux to this, the MBR (including the bootloader) and every other sector will be mirrored between the two disks. So unlike software RAID, BIOS RAID does enforce that both disks are block-for-block identical, because you can't separate them out as two logical devices; the disk controller says they are ONE logical device, not two. So you can't just say "I want to write some data to drive 0 but not to drive 1". Not possible. But it's entirely possible with software RAID.

  • Hardware RAID: The BIOS will tell you that it's one "disk", and as far as the BIOS is concerned, it isn't even particularly aware that you're dealing with multiple disks. The RAID controller completely abstracts away all details of the RAID from the operating system and BIOS, except to the extent that you can configure some hardware RAID controllers using some kind of custom protocol within the operating system. But the devices are completely inseparable from the software layer, similar to BIOS RAID.
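
For the software RAID case above, if you do want either disk to be bootable on its own, installing the legacy BIOS bootloader on each MBR is straightforward; a sketch with placeholder device names (on some distributions the command is grub2-install):

    # Install GRUB to the MBR of each member disk
    grub-install /dev/sda
    grub-install /dev/sdb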

Edit: answering the follow-up questions from the comments

I still fail to understand a couple of things. Firstly, about a BIOS RAID: I can build it using mdadm, so Linux doesn't actually hide the underlying disks from me.

It's weird and hard to explain. Basically the disks appear as one at certain layers, and as two at other layers. But I'm betting that with BIOS RAID each disk won't have its own separate device node, e.g. /dev/sda and /dev/sdb. If it does, well, your BIOS RAID is different from what I've seen.
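
Either way, it is easy to check what your particular setup actually exposes (as a comment below reports, mdadm-managed IMSM setups typically do keep per-disk nodes such as /dev/sda and /dev/sdb alongside the assembled /dev/md12x device); a quick sketch:

    # Show block devices and how they nest; RAID members appear under their md array
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT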

about grub and MBRs: if a RAID covers partitions rather than disks, then I can still see the underlying disks. The MBR is not under RAID and thus you need to install the bootloader twice to be able to boot in case of a disk failure. Is this correct?

It doesn't hurt to install another copy, but in case of a disk failure, booting is going to be the least of your concerns. In brief, go ahead and do it if you want to, but it's hardly the most important thing. Installing GRUB onto a hard disk from a live CD is easy.

Disks in RAID (especially if they are the same make and model, produced at the same factory, and in operation right next to each other at the same temperature) are likely to fail in quick succession, one right after the other. So if a disk did fail, it's probably not OK to just shrug and put in a new disk and start the rebuild: there's a fair chance that, during the rebuild, the last disk containing a consistent copy of the data will itself fail. It's at this point when you get down to the last remaining disk that I would recommend having an expert (or do it yourself if you're good at hardware) remove the platters from the original disk, buy a new disk of identical make/model, put the platters in there and read off the data using the new disk. This is expensive and time-consuming, but is the most fail-proof way of retaining your data.
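
For reference, once you have a verified backup and decide to replace the disk and rebuild anyway, the routine mdadm flow looks roughly like this (a sketch; device names are placeholders):

    # Mark the dying member as failed and remove it from the array
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1

    # Partition the replacement disk the same way, then add it and watch the rebuild
    mdadm --manage /dev/md0 --add /dev/sdc1
    cat /proc/mdstat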

So that's five questions I've answered for you; if you found any value in this information, please mark the answer appropriately. Thanks.

allquixotic

Posted 2012-08-14T12:42:42.213

Reputation: 32 256

1I have a 6-series PCH, and mdadm --detail-platform indicates that Intel Matrix Storage Manager is version 11.0.0.1339. I can in fact see my individual drives as /dev/sdc and /dev/sdd. My fake RAID1 array is then /dev/md126. – Jonathon Reinhart – 2015-01-04T01:11:56.670

1When the Linux kernel is up and running, firmware/BIOS code is no longer executing. So what all is the kernel responsible for doing? Everything? If that's the case, then what does IMSM actually do? It seems that it's not much more than Linux does with software RAID by itself? I guess I'm asking, what exactly does the BIOS / Chipset do? – Jonathon Reinhart – 2015-01-04T01:14:24.720

Well it appears to be a hybrid: When the computer is first booting (when the BIOS is still active), BIOS provides the soft RAID logic. When the OS boots, its drivers are then responsible for maintaining the RAID array. 1, 2

– Jonathon Reinhart – 2015-01-04T01:23:33.940

1thanks for your detailed and insightful answer. I still fail to understand a couple of things. Firstly, about a BIOS RAID: I can build it using mdadm, so Linux doesn't actually hide the underlying disks from me. Secondly, about grub and MBRs: if a RAID covers partitions rather than disks, then I can still see the underlying disks. The MBR is not under RAID and thus you need to install the bootloader twice to be able to boot in case of a disk failure. Is this correct? – Leonid99 – 2012-08-14T14:54:11.397

Allquixotic’s answer is too long:

  1. What does RSTe give to me as compared to regular Linux software RAID?

Boot support and a slightly different feature set. At its heart it is just an on-disk data format; you could even use it without Intel's RST Option ROM (you would then have no special boot support). What the format implies is described in the mdadm man page.

  2. When in RSTe mode, is the actual RAID I/O path (i.e. mirroring and striping) handled by the Linux md or by the BIOS?

By Linux md (i.e. the kernel entirely).

This leaves one question open: why is Intel's RST limited to certain chipsets only? The chipsets don't participate in the RAID at all; at best they store a bit which tells the Option ROM to refuse to run on unsupported chipsets.

Robert Siemer

Posted 2012-08-14T12:42:42.213

Reputation: 358

1This is much better than the accepted answer, which is long, rambling, largely incorrect, and does not answer the question. (The key point is that RSTe uses a format understood by the BIOS, allowing you to boot from a degraded RAID.) – Nemo – 2019-01-22T00:01:41.113

Hmmm, one answer too long, the other too short.

RST "raid" is mainly for use if you are dual-booting a workstation since Intel produces windows and Linux drivers and you can configure the raid in BIOS. You configure RAID, partition the virtual disk and can dual boot with both OS'es understanding the multiple partitions.

mdadm is for when the server is dedicated to Linux. It is "better" since, if you are rebuilding an array, you are doing it from the OS, not from the BIOS, so the rebuild speed is much faster. With large disks a BIOS RAID rebuild can take several DAYS.
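
If you are rebuilding under Linux md, the rebuild speed can also be watched and, within limits, tuned; a sketch (the value below is only an example):

    # Per-device md resync/rebuild speed limits, in KiB/s
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max

    # Temporarily raise the ceiling so an otherwise idle server rebuilds faster
    sysctl -w dev.raid.speed_limit_max=500000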

But the reality is that you are choosing between a green Piece of C and a blue Piece of C: software RAID is basically "last-ditch CYA" RAID.

If you lose a disk in a software RAID array, essentially all it buys you is the chance to immediately stop the server, make a complete backup, replace the failed disk (and maybe the rest of the disks), and then either recreate the array or attempt a rebuild. Quite often it is faster to replace the disks, erase everything on them, recreate the array, and then boot from a restore disk and restore from backup.

With a hardware RAID controller, all the disks can go into hot-swap trays; when one fails, a red light turns on next to the failed disk, you eject it, replace it with a new disk, and the hardware RAID card automatically rebuilds the array while the server is still running.

While in theory it is possible to do this if you have hot-swap trays and a Linux mdadm software array, in practice you are risking a kernel panic, and the server can easily fail to boot from the remaining disk.

The other issue concerns the TYPE of disks used. Regular workstation disks, as they age, start to develop bad sectors, which are internally remapped by the disk into spare sectors. The problem is that this remap only happens on a write; the disk will delay remapping if a read hits a bad sector, and some disk models will repeatedly re-read the bad or failing sector, comparing the result each time, until they decide they have the best data they can get before remapping it. This process can take a minute or more, and during that time you have one disk in the array ignoring commands, so the software RAID layer will give up on that disk and mark the array as degraded. On reboot you now have two disks where the same sector might hold different data, so the software RAID manager does not know which copy is "good": the disk that didn't have an error, or the disk that remapped a sector with its best approximation of the data. Western Digital makes "Red" drives intended for use in RAID arrays that do not do this (the behavior is marketed as TLER, time-limited error recovery): they fail a sector read quickly when they detect a bad sector and remap it, so the array manager can take the data from that sector on the good drive and write it back to the drive with the failed sector. Needless to say, they charge extra for these disks.
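
Whether a given drive behaves this way can often be checked, and on some drives changed, through its SCT Error Recovery Control setting (what the paragraph above calls TLER); a hedged sketch using smartctl:

    # Read the drive's error-recovery timeouts; many desktop drives report "Disabled"
    smartctl -l scterc /dev/sda

    # Ask the drive to give up after 7 seconds (70 deciseconds), if it supports it
    smartctl -l scterc,70,70 /dev/sda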

In summary, do not use software RAID for a server that cannot tolerate some downtime if a disk fails. It is mainly intended for workstations where people don't take regular backups, and for small SOHO servers that are backed up and can tolerate a day or so of downtime if a disk crashes.

Ted Mittelstaedt

Posted 2012-08-14T12:42:42.213

Reputation: 19