
I'm not able to get anything close to usable performance for a guest whose image file is located on an mdadm RAID5 array. I believe I've optimized all the parameters of the array and filesystem for best RAID5 performance:

  • set bitmap=none
  • set stripe-cache=32768 (tried from 256-32768)
  • EXT4 stride=128 / stripe-width=384 (chunk 512K, FS block 4K, 3 data disks); the commands are sketched below
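
For reference, this is roughly how I applied those settings (the md device name, /dev/md0, stands in for my array):

# Remove the write-intent bitmap
mdadm --grow /dev/md0 --bitmap=none

# RAID5 stripe cache size (tried 256 up to 32768)
echo 32768 > /sys/block/md0/md/stripe_cache_size

# ext4 aligned to the array: 512K chunk / 4K block -> stride=128,
# 3 data disks -> stripe-width=384
mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0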

The array performs very well on the host (105 MB/s using no cache, 470 MB/s with cache). It's made of 4 x HDDs, which are relatively slow.

  • It doesn't make any difference whether the image file is raw or qcow2
  • Tried both Virtio SCSI and Virtio SATA
  • Tried all cache combinations, also in the guest itself (Windows 10 and Linux)

Does KVM/QEMU just not work very well with RAID5 mdadm arrays?

It seems to be a latency problem (similar to what I've seen on ESXi with local drives).

[Benchmark screenshot: latency of almost 17 seconds and an average write performance of 1-10 MB/s]

An example from the virt XML:

<disk type='file' device='disk'>
   <driver name='qemu' type='raw' cache='writeback'/>
   <source file='/mnt/R5_DATA/data2.raw'/>
   <target dev='sdd' bus='scsi'/>
   <address type='drive' controller='0' bus='0' target='0' unit='3'/>
</disk>

<controller type='scsi' index='0' model='virtio-scsi'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</controller>

Host: Debian 9 Stretch (kernel 4.9.0-8-amd64)

MrCalvin
  • What exactly do your optimizations look like? Tuning for throughput often does not help if your workload is IO bound. – Thomas Mar 10 '19 at 18:20
  • Updated my question with the XML and the specific array and FS parameters. Benchmark on the guest: DiskMark, just sequential writes. – MrCalvin Mar 10 '19 at 18:30
  • The things I find strange about this configuration are: the use of RAID 5 to begin with (probably not the immediate cause of the problem, but still a problem in itself); using an image file instead of a block device, which adds a possibly needless layer of abstraction; the use of ext4 instead of XFS or a more modern filesystem (ext4 is aging at this point and many distros have switched to XFS or other filesystems as their defaults); and the use of Debian as a hypervisor, since other distros like RHEL/CentOS have better-tuned kernels and more options (e.g. the tuned daemon) for further performance tuning. – Michael Hampton Mar 10 '19 at 18:50
  • Yea, it's difficult to find any recommendation for R5 on the net :-P For storing data files (e.g. media files, or just as a file share) I find R5 very attractive; it's the most space-efficient. Of course a disk can die, but you have plenty of time to do an incremental backup, then replace the disk and rebuild. No problem. I chose ext4 to keep it as simple as possible as a starting point; when that works I can always switch to a more modern FS. I was thinking about trying another kernel, my first choice was Arch. But mdadm is so mature/"old" that I'm surprised if it doesn't work with Debian 9. – MrCalvin Mar 10 '19 at 19:08

2 Answers


You are missing a few tricks: to achieve optimal performance, the entire storage stack has to be aligned, starting from the top-level I/O size and working down. For example, create your guest's NTFS file system with 64KB clusters, use LVM block devices (watch out for LVM alignment anomalies), and optimise your software RAID for 64KB chunks. If you must use files for backing the VM block devices, make sure the host file system is aligned the same way, paying particular attention to your block group size (the -g option on mkfs.ext*) so that your block groups don't all start on the same disk. If your top-level application workload uses smaller blocks, align for smaller blocks; the principle is the same.
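
A rough sketch of what that can look like end to end, assuming the 64KB example I/O size above; the device and volume names (/dev/md0, /dev/sd[bcde], vg_guests, win10_data) are placeholders, and you would pick one of the two options:

# 64K chunk RAID5 across 4 member disks (3 data + 1 parity)
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 /dev/sd[bcde]

# Option A: LVM block devices handed straight to the guests,
# with LVM data aligned to the 64K chunk
pvcreate --dataalignment 64k /dev/md0
vgcreate vg_guests /dev/md0
lvcreate -L 100G -n win10_data vg_guests

# Option B: if you must back guests with image files, align the host ext4
# the same way: stride = 64K/4K = 16, stripe-width = 16 * 3 data disks = 48.
# -g (blocks per group) is left at its default here; pick it per the
# block-group advice above so groups don't all start on the same disk.
mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/md0

Inside the guest, the NTFS volume would then be formatted with matching 64KB clusters.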

I wrote an article on how to ensure optimal file system alignment for a particular workload, which you may find helpful.

Gordan Bobić
  • Thanks for your comment, I'll investigate :-) It's a never-ending task to optimize storage, especially on virtualization systems. But storage is usually also the biggest bottleneck, so it really needs some attention. – MrCalvin Aug 20 '20 at 05:57
  • You're welcome. :-) Indeed, storage is a very overlooked and opaque bottleneck in virtualized systems. CPU overheads can be quite dramatic, too. – Gordan Bobić Aug 20 '20 at 07:10

After doing a lot of testing, it certainly is possible to get very good performance. Several factors make a huge difference:

  • The versions of md (i.e. the kernel), KVM/QEMU and virsh
  • The cache configuration for your guest (the cache setting in your virsh XML file)
  • The md configuration, especially the stripe_cache and the chunk size (which should be set as low as 64K)
  • Optimizing the ext4 filesystem, e.g. using noatime and, depending on UPS backup etc., disabling journaling and barriers (sketched below)
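
A minimal sketch of those md and ext4 tweaks, assuming /dev/md0 and my /mnt/R5_DATA mount point; the stripe_cache value is only a starting point to benchmark, and dropping the journal and barriers trades crash safety for speed:

# RAID5 stripe cache size (benchmark different values for your disks)
echo 4096 > /sys/block/md0/md/stripe_cache_size

# ext4: no atime updates and, only if you accept the risk on power loss,
# no journal and no write barriers
tune2fs -O ^has_journal /dev/md0          # filesystem must be unmounted
mount -o noatime,barrier=0 /dev/md0 /mnt/R5_DATA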

I was running Debian stable (9, Stretch), which means you run very old versions of the kernel, QEMU and virsh :-(

I have now moved to Ubuntu stable for that reason, which gives you a huge jump to newer versions of those important components. In my view Debian stable is really a bad choice here. When 10/Buster is released I guess it will be okay for a period, but sooner or later you'll have an outdated system.

Secondly, it was very obvious that you need to use the page cache of the host system. There is some advanced caching going on when using md RAID5. Even when the cache is full, the performance is very good: in my case over 200 MB/s writes on slow disks that do about 100 MB/s in a JBOD configuration.

Using cache=none in the XML and relying only on the page cache of the Windows guest will give you very slow performance. SSDs might be another story.
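
A quick way to see on the host itself how much the page cache contributes (the test file name is just an example):

# Direct I/O, bypassing the host page cache (roughly how cache=none behaves)
dd if=/dev/zero of=/mnt/R5_DATA/ddtest.bin bs=1M count=4096 oflag=direct

# Buffered I/O through the host page cache, flushed to disk at the end
dd if=/dev/zero of=/mnt/R5_DATA/ddtest.bin bs=1M count=4096 conv=fsync
rm /mnt/R5_DATA/ddtest.bin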

Remember also to play with the cache settings in the Windows guest for the disk in question to get the best results.

MrCalvin