
I am looking to configure my server in a slightly odd way to create a private cloud setup.

The plan is to take the HBA that my storage disks are attached to and pass it through to a VM (probably Xen) to create a vSAN-type setup. I want to do this so I can implement a SAN-style setup within a single server.

When I was asking around, people told me the IOPS would not be as good as bare metal. I had kind of factored that into my setup, but it got me wondering: just how much would it hurt the system?

Someone also mentioned that a VM running on the bare metal, but with a storage controller passed in, creates an odd dependency loop, since the VM still relies on the host for memory. Other than being a pain to troubleshoot, does it really cause any significant degradation in performance?

Oh, and by the way, the filesystem in question here would mainly be ZFS, probably running on FreeBSD (including things like FreeNAS), NexentaStor, or OpenIndiana.

Thanks!

ianc1215
    Your big issue is not memory, but the storage controller itself. The hypervisor cannot use it directly, and therefore the VM must have its own local filesystem located elsewhere. – Michael Hampton Feb 04 '16 at 12:11
  • What do you mean by its "own" storage? Are you referring to the VM itself, as in the .img or .vmdk? I was still going to have a small set of disks (2-4) for the main system / hypervisor / SAN image. It was either that, or I was going to boot the SAN VM off a disk hanging off the storage controller and bypass the use of an image file altogether. – ianc1215 Feb 04 '16 at 12:15
  • Right, the image from which the VM boots cannot be on storage connected to the storage controller, because the hypervisor won't have access to it once it is passed through to that VM. Otherwise, the hypervisor _can_ use it (provided by the VM) as a datastore for other VMs, if it is prepared to wait for it to become available while the VM boots. – Michael Hampton Feb 04 '16 at 12:17
  • I was under the impression that a VM is able to boot off of a piece of hardware if supported by the hypervisor. Well, this is getting off topic, though. So let's say I add a way for the SAN VM to keep its image on another drive. What would be the penalty of VT-d'ing my storage controller? – ianc1215 Feb 04 '16 at 12:19

3 Answers

1

I don't recommend this type of passthrough setup anymore. It's a lose-lose on most counts, especially with reliability and performance.

It can certainly be done, but the safest solution (especially with a hypervisor like ESXi) is to just use a supported RAID controller and local storage. If you want ZFS storage, build a standalone ZFS storage system.

ewwhite
  • Building a standalone box is not an option; this server is going to reside in a 2U colo. So you are saying I would be better off running ZFS on the hardware directly? For a light "production" server, would it be okay to pair it with the hypervisor? Production is in quotes since it would be my own use and maybe a few strangers, but nothing mission-critical. I was not going to run ESXi; I was going to use Xen most likely. Does that make any difference to my question above? – ianc1215 Feb 04 '16 at 13:10
  • I'm not sure Xen is the best hypervisor solution these days, but can you describe why you're insisting on ZFS? – ewwhite Feb 04 '16 at 13:14
  • Well, if not Xen then I was going to use KVM / LXC: KVM for the things containers won't work for, like Windows. I am looking to use ZFS because I want the benefits of things like snapshots and compression. Originally I also wanted dedup, but then I realized it's not needed for what I am doing and would be a waste of memory. Originally I was going to run BTRFS, but from what I read the RAID 6 support seems to be sketchy, plus performance with KVM was awful. – ianc1215 Feb 04 '16 at 13:25
  • Oh, I see. ZFS is great, but this type of setup isn't the best use of it outside of experimentation. – ewwhite Feb 04 '16 at 15:12
1

A real answer, though.

Yes, there's a potential for latency or a performance hit from running your storage in a VT-d passthrough.

But think about the practical aspects. Your system isn't going to be IOPS-bound in the first place. There are several levels of abstraction in this storage, and the fact that you're using VMs at all indicates that you're okay with the tradeoffs compared to bare metal.

The real concern you should have is whether the solution will work at all! VT-d can be temperamental and doesn't work with every adapter.

So test for yourself with your workloads!

ewwhite
  • Yeah, I think that is going to be the best course of action; I just need to test it against my workloads. In your other response you did not seem too crazy about Xen. In terms of performance per hypervisor, would I see any difference with KVM over Xen, or is it really going to be close? – ianc1215 Feb 05 '16 at 14:33
  • I use VMware. But if not using VMware, KVM or Hyper-V are alternatives, depending on what you're doing. Typically, Xen is not the first choice unless you specifically need to use it. – ewwhite Feb 05 '16 at 14:37
  • I see, well maybe I need to do a little planning before I jump into this. Thanks for the input! – ianc1215 Feb 05 '16 at 14:38
1

Disclaimer: I have no experience with Xen, so I write with regards to ESXi, but the concepts are similar.


As for your initial question "How to test the performance difference between baremetal and VM with passthrough?":

Your setup would be as follows:

  • Server-grade mainboard with Intel Xeon CPU supporting VT-d (there are alternatives, but I would keep it simple)
  • A single SATA SSD of about 20 to 30 GB, depending on your OS (can also be an HDD, but it will be a bit slower)
  • Your HBA with an LSI chip that supports your disks
  • Two Intel NICs, either onboard or on PCIe or both, but they should be the same model. Most boards already have them, but only some have 10 Gbit ones. With 1 Gbit you will usually max out sequential read and write performance with about 4 disks or a single SSD (1 Gbit/s is only around 110-120 MB/s after protocol overhead), so I would recommend 10 Gbit (the internal virtual network is in most cases 10 Gbit anyway, which makes the comparison more interesting).
  • A USB stick for the hypervisor

First, install your hypervisor onto the USB stick and configure it. Partition your SSD into two slices. Reboot and install your bare-metal OS onto the first slice as you normally would. Use your HBA with your other disks and configure one or more pools for your performance tests. Run those tests and write down the results. You should at least test local performance from the command line and network performance with your desired protocols (iSCSI, NFS, SMB). If possible, also test the SSD (not strictly necessary). When finished, export your pool(s).
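
For the local part of those tests, fio or bonnie++ will give the most detailed numbers; if you just want a quick, repeatable baseline to compare bare metal against passthrough later, a minimal sketch like the following also works (the mountpoint /tank and the 4 GiB test size are assumptions, not part of the original setup):

```python
#!/usr/bin/env python3
"""Minimal sequential write/read throughput sketch for a ZFS pool.

Assumptions (not from the original post): the pool is mounted at /tank and
has a few GiB free. For serious numbers use fio or bonnie++; this only gives
a rough baseline to compare bare metal vs. passthrough runs.
"""
import os
import time

MOUNTPOINT = "/tank"              # assumed pool mountpoint
TEST_FILE = os.path.join(MOUNTPOINT, "throughput_test.bin")
CHUNK = os.urandom(1 << 20)       # 1 MiB of random data (defeats compression)
TOTAL_MIB = 4096                  # 4 GiB total; ideally larger than RAM/ARC


def sequential_write() -> float:
    start = time.monotonic()
    with open(TEST_FILE, "wb") as f:
        for _ in range(TOTAL_MIB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())      # make sure the data actually hit the pool
    return TOTAL_MIB / (time.monotonic() - start)


def sequential_read() -> float:
    # Note: the ZFS ARC will inflate this unless the file is larger than RAM
    # or the pool was exported/imported (or the box rebooted) in between.
    start = time.monotonic()
    with open(TEST_FILE, "rb") as f:
        while f.read(1 << 20):
            pass
    return TOTAL_MIB / (time.monotonic() - start)


if __name__ == "__main__":
    print(f"sequential write: {sequential_write():.1f} MiB/s")
    print(f"sequential read:  {sequential_read():.1f} MiB/s")
    os.remove(TEST_FILE)
```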

Then reboot (I assume you have a remote console, so you can do this remotely) and boot from your USB stick instead of the local SSD. Now use the second slice of the SSD to create a virtual file system that spans all of slice 2. Onto this VFS, install your system from the same ISO as before, but now as a virtual machine. Set up passthrough in the hypervisor and assign one physical NIC and your HBA to this new VM. Also assign at least one virtual NIC to the VM (or more, if you want to test different types). Assign as much RAM as possible to the VM to make the conditions similar.
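
The exact passthrough steps depend on the hypervisor; ESXi exposes it as a per-device toggle. If you end up on KVM instead (as discussed under the first answer), it is worth checking which other devices share an IOMMU group with your HBA before assigning it, since the whole group has to move to the VM together. A rough sketch, assuming a Linux host with VT-d/IOMMU enabled and lspci installed:

```python
#!/usr/bin/env python3
"""List PCI devices grouped by IOMMU group (Linux/KVM host, VT-d enabled).

Only relevant if the hypervisor ends up being KVM rather than ESXi or Xen's
own tooling; all devices in the same IOMMU group as the HBA must be passed
through to the VM together.
"""
import glob
import os
import subprocess

groups = sorted(glob.glob("/sys/kernel/iommu_groups/*"),
                key=lambda p: int(os.path.basename(p)))
if not groups:
    print("No IOMMU groups found - is VT-d enabled in BIOS and the kernel?")

for group in groups:
    print(f"IOMMU group {os.path.basename(group)}")
    for dev in sorted(os.listdir(os.path.join(group, "devices"))):
        # lspci turns the raw PCI address into a human-readable description
        desc = subprocess.run(["lspci", "-s", dev],
                              capture_output=True, text=True).stdout.strip()
        print(f"  {desc or dev}")
```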

Then boot the VM, configure it, and from inside the VM (via VNC or SSH) import your pool(s) again. You can now run the same tests as before (local and remote) for both the physical and the virtual adapters and note any differences. Additionally, you can create a second VM, this time located on a shared NFS or iSCSI volume served from your pool. Tests on that VM tell you a lot about your later use case as a VM host.
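
For the network half of the comparison (passed-through physical NIC vs. virtual NIC), iperf3 between the storage VM and another host is the usual approach; if you want something self-contained, a rough TCP throughput sketch along these lines also does the job (port and addresses are placeholders):

```python
#!/usr/bin/env python3
"""Rough TCP throughput test for comparing the physical and virtual NIC paths
into the storage VM. iperf3 is the more rigorous choice; this is just a
self-contained fallback. Run "server" inside the storage VM and
"client <vm-address>" from another host or VM (addresses are placeholders)."""
import socket
import sys
import time

PORT = 5201                     # same default port iperf3 uses
CHUNK = b"\0" * (1 << 20)       # 1 MiB send buffer
TOTAL_MIB = 2048                # transfer 2 GiB per run


def server() -> None:
    with socket.create_server(("", PORT)) as srv:
        conn, peer = srv.accept()
        with conn:
            received = 0
            start = time.monotonic()
            while data := conn.recv(1 << 20):
                received += len(data)
            secs = time.monotonic() - start
            print(f"{peer[0]}: {received / (1 << 20) / secs:.1f} MiB/s")


def client(host: str) -> None:
    with socket.create_connection((host, PORT)) as conn:
        start = time.monotonic()
        for _ in range(TOTAL_MIB):
            conn.sendall(CHUNK)
    print(f"sent {TOTAL_MIB} MiB in {time.monotonic() - start:.1f} s")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: tcp_test.py server | client <address>")
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2] if len(sys.argv) > 2 else sys.argv[1])
```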


Some thoughts beyond performance metrics. I kind of like this setup for several reasons:

  1. It is very similar to a native setup: If your server dies, you can remove all disks and plug them into any other host - if that host has a hypervisor, you can just continue to use them after booting a small storage VM; and even if it doesn't, you already have all your data and network shares instantly ready, because everything is on the pool itself. With a traditional setup you would have to worry about having the same type of RAID controller and matching filesystem support, and you would need a hypervisor (or copy the data off the virtual disks).
  2. It is surprisingly stable if the components play well together: if you buy good hardware, you will have far fewer problems; and even if problems arise, your data will be safe (at least if you don't disable sync'ed writes, which you should never do with VMs!). And even if you were to lose data for some reason, you would know it sooner than with other file systems.
  3. It is cheaper/more efficient: you have essentially virtualized your complete SAN, but you do not need two cases, two rack spaces, two sets of redundant power supplies, two CPUs and so on. On the other hand, the same budget gets you much further - instead of two normal servers, you can get a beefy one with all the nice features (redundant power, HBA multipath, etc.), and your resources (memory, power, etc.) can be used as you need them.
  4. It is flexible: You can use the best operating system for each task. For example, use Oracle Solaris or an illumos distribution like OpenIndiana, OmniOS or Nexenta for your storage VM to get all the newest features and the stability they have offered for years. Then add Windows Server for Active Directory, FreeBSD or OpenBSD for internal or external routing and network tasks, various Linux distributions for application software or databases, and so on (if you do not want the overhead but still want the flexibility, you can try SmartOS on bare metal, although then your hypervisor choice is limited to KVM).

There are, of course, downsides. ewwhite mentioned one already: the hardware has to play well with passthrough. Additionally:

  1. The performance will most likely never be as good as with a traditional setup. You can throw a good amount of money at the problem (more RAM, ZIL on SSD or NVMe, more disks, SSD-only pools), but if performance is your first concern, VMs are not the best choice, ZFS is not the best choice, and both together are also not the best choice. This is the tradeoff you cannot escape: you get safety and flexibility, but not maximum performance.
  2. You need to do some minor preparation for boot, shutdown and power loss. Rule of thumb: your storage VM must be the first to fully boot up and the last to fully shut down. Test and time your boots so you know how long you must wait before starting the other machines (a small wait-for-storage sketch follows this list). Shutdown is not as critical; a premature shutdown roughly equals a power loss for the individual VMs (which is safe, as long as your application software, operating system, hypervisor and storage layer all agree to only use sync writes for anything important).
  3. Updates can be a bit more time-consuming. Remember, if you change your storage VM, all other VMs have to be shut down (or will be shut down forcefully), so plan for some extra downtime (this is of course the same as with a physical setup where your SAN would be down for updates). You should also test any update to the core functionality (hypervisor, network drivers, virtual network drivers, storage drivers, storage VM) thoroughly, because you don't want a bug that results in flaky NFS behaviour at random times.
  4. You are still doing backups, right? ;)
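
Regarding point 2, one way to enforce the boot ordering is to have the hypervisor host run a small script that waits until the storage VM's NFS or iSCSI port answers before starting the remaining guests. A minimal sketch, with placeholder address, port and start command:

```python
#!/usr/bin/env python3
"""Wait until the storage VM is serving before starting the other guests.

The address, port and start command below are placeholders; adapt them to
your storage VM's NFS/iSCSI export and your hypervisor's CLI (virsh, xl,
vim-cmd, ...).
"""
import socket
import subprocess
import sys
import time

STORAGE_VM = "192.0.2.10"       # placeholder address of the storage VM
PORT = 2049                     # 2049 = NFS; use 3260 for iSCSI instead
TIMEOUT_S = 600                 # give up after 10 minutes

deadline = time.monotonic() + TIMEOUT_S
while time.monotonic() < deadline:
    try:
        with socket.create_connection((STORAGE_VM, PORT), timeout=5):
            break               # storage VM is answering, safe to continue
    except OSError:
        time.sleep(10)          # not up yet, try again shortly
else:
    sys.exit("storage VM never came up; not starting the other guests")

# placeholder: start the remaining guests with whatever your hypervisor uses
for vm in ["webserver", "windows-ad"]:
    subprocess.run(["virsh", "start", vm], check=False)
```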
user121391