I've been trying to find a straight answer on this one, and it has proved elusive. This question and its answer are close, but they don't really give me the specifics I'm after. Let's start with what I think I know.
If you have a standard block device and you run sudo blockdev --report
you will get something like this:
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0    500107862016   /dev/sda
rw   256   512  4096       2048    399999238144   /dev/sda1
rw   256   512  1024  781252606            1024   /dev/sda2
Now, say you decide to change that default 256 to 128 using --setra
on any one of the partitions; the change applies to the whole block device, like so:
sudo blockdev --setra 128 /dev/sda1
sudo blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   128   512  4096          0    500107862016   /dev/sda
rw   128   512  4096       2048    399999238144   /dev/sda1
rw   128   512  1024  781252606            1024   /dev/sda2
This makes perfect sense to me - the setting lives on the block-level device, not the partition, so it changes across the board. The default relationship between the RA setting and the resulting readahead size also makes sense to me; it is generally:
readahead (bytes) = RA * sector size (default = 512 bytes)
Hence, with the default sector size, the change I made above will drop readahead from 128k to 64k. All well and good so far.
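For my own sanity I have been checking that arithmetic directly from the shell - nothing clever, just the RA-to-bytes conversion plus the sysfs view, on the assumption that the kernel counts RA in 512-byte sectors:
echo $(( 256 * 512 ))                    # 131072 bytes = 128k at the default RA of 256
echo $(( 128 * 512 ))                    # 65536 bytes = 64k after --setra 128
cat /sys/block/sda/queue/read_ahead_kb   # the same figure expressed in KiB, straight from sysfs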
However, what happens when we add software RAID, or LVM and device-mapper, into the mix? Imagine your report looks like this instead:
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0     10737418240   /dev/xvda1
rw   256   512  4096          0    901875499008   /dev/xvdb
rw   256   512  4096          0    108447924224   /dev/xvdj
rw   256   512  4096          0    108447924224   /dev/xvdi
rw   256   512  4096          0    108447924224   /dev/xvdh
rw   256   512  4096          0    108447924224   /dev/xvdg
rw  4096   512  4096          0    433787502592   /dev/md0
rw  4096   512   512          0    429496729600   /dev/dm-0
In this case we have an LVM device-mapper device, dm-0, sitting on top of md0 created by mdadm, which is in fact a RAID0 stripe across the four devices xvdg through xvdj.
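For reference, this is roughly how I have been inspecting that stacking - the device names mirror the report above, and the exact lsblk columns available will depend on your version:
lsblk -o NAME,TYPE,SIZE /dev/xvdg /dev/xvdh /dev/xvdi /dev/xvdj   # shows md0 and the LVM volume stacked on each member
sudo mdadm --detail /dev/md0                                      # confirms the RAID0 level and its chunk size
sudo dmsetup table                                                # shows what dm-0 actually maps onto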
Both md0 and dm-0 have RA settings of 4096, far higher than the underlying block devices. So, some questions here:
- How does the RA setting get passed down the virtual block device chain?
- Does dm-0 trump all because that is the top level block device you are actually accessing?
- Would lvchange -r have an impact on the dm-0 device and not show up here? (See the sketch just after this list.)
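On that last point, this is the kind of check I have in mind - the VG/LV names here are made up, and I am assuming the lv_read_ahead reporting field, so treat it as a sketch:
sudo lvchange -r 4096 myvg/mylv                # set readahead (in sectors) on the hypothetical LV
sudo lvs -o +lv_read_ahead myvg/mylv           # what LVM itself thinks the readahead is
sudo blockdev --getra /dev/mapper/myvg-mylv    # what the kernel reports for the backing dm device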
If it is as simple as the RA setting from the virtual block device you are using being passed on, does that mean that a read from dm-0 (or md0) will translate into 4 x 4096-sector RA reads (one on each underlying block device)? If so, that would mean these settings explode the size of the readahead in the scenario above.
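Just to put numbers on that worry (straight arithmetic, on the assumption that RA is counted in 512-byte sectors at every layer):
echo $(( 4096 * 512 / 1024 ))       # 2048 KiB of readahead per device, if RA=4096 really applies per disk
echo $(( 4 * 4096 * 512 / 1024 ))   # 8192 KiB in total, if a single dm-0 read fans out to all four members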
Then in terms of figuring out what the readahead setting is actually doing:
What do you use, equivalent to the sector size above, to determine the actual readahead value for a virtual device (see the sketch after this list):
- The stripe size of the RAID (for md0)?
- Some other sector size equivalent?
- Is it configurable, and how?
- Does the FS play a part (I am primarily interested in ext4 and XFS)?
- Or, if it is just passed on, is it simply the RA setting from the top level device multiplied by the sector size of the real block devices?
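As a sketch of the cross-check I have been running on md0 (the sysfs path should be present on any reasonably modern kernel):
sudo mdadm --detail /dev/md0 | grep -i chunk   # the RAID chunk (stripe unit) size
sudo blockdev --getra /dev/md0                 # RA in 512-byte sectors -> 4096 here
cat /sys/block/md0/queue/read_ahead_kb         # the same value in KiB -> 2048 here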
Finally, would there be any preferred relationship between stripe size and the RA setting (for example)? Here I am thinking that if the stripe is the smallest element that is going to be pulled off the RAID device, you would ideally not want two disk accesses to be needed to service that minimum unit of data, and would want to make the RA large enough to fulfill the request with a single access.
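For the sake of argument, here is the back-of-the-envelope calculation I am tempted to do, assuming a hypothetical 512 KiB chunk on the four-disk RAID0 (that chunk size is made up, not taken from the setup above):
echo $(( 512 * 4 ))                # full stripe width in KiB = 2048
echo $(( 512 * 4 * 1024 / 512 ))   # the same stripe expressed as an RA value in 512-byte sectors = 4096
In other words, an RA of 4096 on md0 would line right up with one full stripe, which is partly why I wonder whether that is where the 4096 above comes from.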