Why do VirtualBox guest kernels run in ring 1 instead of ring 3?

When VirtualBox runs on an x86 platform, according to the documentation:

When hardware virtualization (i.e. VT-x or AMD-V) is enabled, the hypervisor (i.e. VirtualBox itself) runs in VMX root mode (aka ring -1), and virtual machines run in VMX non-root mode (aka ring 0). This is also how other hypervisors work.

On the other hand, when hardware virtualization is unavailable, software virtualization is used instead and guest kernels run in ring 1. From section 10.6 of the link above:

  • Guest ring 3 code is run unmodified, at full speed, as much as possible...

  • For guest code in ring 0, Oracle VM VirtualBox employs a clever trick. It actually reconfigures the guest so that its ring-0 code is run in ring 1 instead (which is normally not used in x86 operating systems). As a result, when guest ring-0 code, actually running in ring 1, such as a guest device driver, attempts to write to an I/O register or execute a privileged instruction, the Oracle VM VirtualBox hypervisor in the "real" ring 0 can take over.

...

  • Running ring 0 code in ring 1 causes a lot of additional instruction faults, as ring 1 is not allowed to execute any privileged instructions, of which guest's ring-0 contains plenty. With each of these faults, the VMM must step in and emulate the code to achieve the desired behavior. While this works, emulating thousands of these faults is very expensive and severely hurts the performance of the virtualized guest.

This is interesting as it is the only application of ring 1 that I have come across.

Per the sections quoted above, even though the guest kernels run in ring 1, the VirtualBox hypervisor (ring 0) still needs to take over whenever a guest device driver attempts to write to an I/O register or execute a privileged instruction. So it appears that the performance penalty of software virtualization would be the same whether the guest kernels run in ring 1 or ring 3.
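
To make that reasoning concrete, here is a toy Python model, not VirtualBox code. It assumes only that a privileged instruction raises a general protection fault at any privilege level other than 0, and it counts how often a hypothetical VMM would have to step in. The names `PRIVILEGED`, `execute`, and `count_vmm_traps` are made up for illustration, and the model ignores sensitive-but-unprivileged instructions and everything related to memory access.

```python
# A small sample of x86 instructions that are only legal at CPL 0.
PRIVILEGED = {"hlt", "lgdt", "lidt", "wrmsr", "invlpg"}


class GeneralProtectionFault(Exception):
    """Stands in for the #GP exception raised by the CPU."""


def execute(instr, cpl):
    """Pretend to execute one instruction at the given privilege level."""
    if instr in PRIVILEGED and cpl != 0:
        raise GeneralProtectionFault(f"{instr} at CPL {cpl}")


def count_vmm_traps(instructions, cpl):
    """Count how many times a (hypothetical) VMM must trap and emulate."""
    traps = 0
    for instr in instructions:
        try:
            execute(instr, cpl)
        except GeneralProtectionFault:
            traps += 1  # the VMM in real ring 0 would emulate the instruction here
    return traps


guest_kernel_code = ["mov", "hlt", "add", "lgdt", "wrmsr"]
print(count_vmm_traps(guest_kernel_code, cpl=1))
print(count_vmm_traps(guest_kernel_code, cpl=3))
```

In this simplified model the trap count is identical for ring 1 and ring 3, which is what prompts the questions below.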

I did come across this SO post that says:

Rings 1 and 2 are in a way, "mostly" privileged. They can access supervisor pages, but if they attempt to use a privileged instruction, they still GPF like ring 3 would. So it is not a bad place for drivers as Intel planned...

Questions

  1. How does running guest kernels in ring 1 instead of ring 3 improve performance?

  2. What are the security implications of running guest kernels in ring 1 (and therefore giving guest kernels "access to supervisor pages")?

catanman

Posted 2019-02-06T06:35:45.053

Reputation: 141

Answers

I got some very helpful answers from the folks at #vbox-dev on freenode as well as other online resources.

  1. It doesn't improve performance. As mentioned in the VirtualBox documentation, guest user space runs in ring 3 and guest kernel space runs in ring 1. This allows the guest kernel space to be protected from the guest user space through paging (see slide 19). The following explains how paging is used to achieve this protection.

    https://manybutfinite.com/post/cpu-rings-privilege-and-protection/

    Each memory page is a block of bytes described by a page table entry containing two fields related to protection: a supervisor flag and a read/write flag. The supervisor flag is the primary x86 memory protection mechanism used by kernels. When it is on, the page cannot be accessed from ring 3. While the read/write flag isn't as important for enforcing privilege, it's still useful.

  2. The good news is that guests cannot execute privileged instructions, since only ring 0 can do so. The bad news is that on a 64-bit system, ring 1 potentially has access to the host's memory pages. This is because in 64-bit mode, segment limits no longer apply, since segmentation has been mostly replaced with paging. Unfortunately, paging does not distinguish between privilege levels 0-2 when it comes to memory isolation. This issue is known as ring compression (see slide 19).

    https://cseweb.ucsd.edu/~jfisherogden/hardwareVirt.pdf

    Ring Compression

    To provide isolation among virtual machines, the VMM runs in ring 0 and the virtual machines run either in ring 1 (the 0/1/3 model) or ring 3 (the 0/3/3 model). While the 0/1/3 model is simpler, it can not be used when running in 64 bit mode on a CPU that supports the 64 bit extensions to the x86 architecture (AMD64 and EM64T).

    To protect the VMM from guest OSes, either paging or segment limits can be used. However, segment limits are not supported in 64 bit mode and paging on the x86 does not distinguish between rings 0, 1, and 2. This results in ring compression, where a guest OS must run in ring 3, unprotected from user applications.

    The above paragraph suggests that on 64-bit systems, because segment limits were dropped, both the guest kernel and guest userspace must run in ring 3 (the 0/3/3 model) in order to protect the host from the guest. However, slide 37 suggests that it could be possible to maintain the 0/1/3 model and prevent ring 1 from accessing the host through very complex Binary Translation (BT). Perhaps this is the strategy that VirtualBox implements?
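
The ring compression point can be sketched in a few lines of Python. This is a simplified model of the classic x86 page-level check, not anything taken from VirtualBox: it only looks at the U/S and R/W bits of a page table entry and the CR0.WP setting, and it ignores newer features such as NX, SMEP, and SMAP. The function name `page_access_allowed` and its parameters are hypothetical.

```python
PTE_RW = 1 << 1  # R/W flag: set means the page is writable
PTE_US = 1 << 2  # U/S flag: set means user-mode (ring 3) access is allowed


def page_access_allowed(cpl, pte, is_write, cr0_wp=True):
    """Return True if the paging unit would allow the access.

    Paging only knows two modes: supervisor (CPL 0, 1, or 2) and user (CPL 3).
    """
    supervisor = cpl < 3
    writable = bool(pte & PTE_RW)
    if supervisor:
        # Supervisor accesses ignore the U/S flag entirely. Writes to
        # read-only pages fault only when CR0.WP is set.
        return not (is_write and not writable and cr0_wp)
    if not (pte & PTE_US):
        return False  # supervisor-only page, user-mode access faults
    return writable or not is_write


# A writable page with the U/S flag clear, e.g. one belonging to the VMM.
vmm_page = PTE_RW
for ring in range(4):
    print(ring, page_access_allowed(ring, vmm_page, is_write=True))
```

Rings 0, 1, and 2 all pass the check for the supervisor-only page and only ring 3 is refused, so without segment limits paging alone cannot keep a ring 1 guest kernel away from ring 0 pages.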

It's important to remember that this whole discussion only pertains to full software virtualization and is therefore very much outdated, since very few CPUs lack hardware virtualization support. As someone from #vbox-dev pointed out:

software virtualization is a dying species, though. so few CPUs left without hardware virtualization support. At some point we'll have to make a tough decision - keeping code alive costs time and money.

catanman

Posted 2019-02-06T06:35:45.053

Reputation: 141