In the case of multiple layers (physical drives -> md -> dm -> lvm), how do the schedulers, readahead settings, and other disk settings interact?

Imagine you have several disks (/dev/sda - /dev/sdd) that are all part of a software RAID device (/dev/md0) created with mdadm. Each device (including the physical disks and /dev/md0) has its own settings for the I/O scheduler (changed via /sys/block/<device>/queue/scheduler) and readahead (changed using blockdev --setra). When you throw in things like dm-crypt and LVM, you add even more layers, each with its own settings.
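
For illustration (device names are the hypothetical ones above), the per-device knobs look like this:

echo deadline > /sys/block/sda/queue/scheduler   # select the deadline elevator for sda
blockdev --getra /dev/sda                        # show readahead of the physical disk (in 512-byte sectors)
blockdev --setra 64 /dev/md0                     # set readahead of the RAID device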

For example, if the physical device has a readahead of 128 blocks and the RAID has a readahead of 64 blocks, which one is honored when I do a read from /dev/md0? Does the md driver attempt a 64-block read, which the physical device driver then turns into a read of 128 blocks? Or does the RAID readahead "pass through" to the underlying device, resulting in a 64-block read?

The same kind of question holds for schedulers: do I have to worry about multiple layers of I/O schedulers and how they interact, or does /dev/md0 effectively override the schedulers of the underlying devices?

In my attempts to answer this question, I've dug up some interesting data on schedulers and tools that might help figure this out.

andrew311

1 Answer

If you do a read from md0, then the readahead setting for md0 is used. If you do the read from sda, which is a component of md0, then the sda setting is used. The device-mapper/md layer just splits an I/O into multiple reads and writes to implement the RAID, but that all happens below the block cache layer where readahead takes place. The storage stack looks like:

filesystem - bypasses the cache when you open with O_DIRECT

block cache - readahead, write cache, scheduler

device-mapper - dm, LVM, software RAID, snapshots, etc.

sd - disk driver

SCSI - error handling, device routing

hardware driver - SCSI card, FC card, Ethernet
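
As an aside, on a reasonably modern system you can see how these layers stack for your own devices with lsblk (the output below is hypothetical):

lsblk
# NAME         MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
# sda            8:0    0  1T  0 disk
# └─md0          9:0    0  2T  0 raid5
#   └─vg0-data 253:0    0  2T  0 lvm   /data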

Note that when you do

dd if=/dev/sda of=foo

you are reading sda as a file, so you are going through the block cache. To go directly to the disk, do

dd if=/dev/sda of=foo iflag=direct
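
To see the effect, compare a cached read with a direct read of the same device (device name and sizes here are just illustrative):

dd if=/dev/md0 of=/dev/null bs=1M count=256              # goes through the block cache, so md0's readahead applies
dd if=/dev/md0 of=/dev/null bs=1M count=256 iflag=direct # bypasses the cache, so no readahead at all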

As for I/O elevator schedulers, those exist only for the disk driver (sd). There is no queue directory under /sys/block for md or dm devices, so you only go through the disk's elevator sort once.
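
A quick way to verify this on your own machine (device names assumed):

cat /sys/block/sda/queue/scheduler   # e.g. "noop deadline [cfq]" - the elevator in brackets is active
cat /sys/block/md0/queue/scheduler   # no such file (or "none" on newer kernels) - no elevator at this layer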

stark