
After reading this answer which explains that modern CPUs have a ring -1 that is running a hypervisor on the CPU and handles vmenter & stuff, I am wondering:

The main operating system, like the one installed directly on the hard disk (and not one that would be inside a program like VMWare Player), must run on top of this hypervisor too. Since ring -1 has complete control over it and is actually hidden from it. Does that mean that the main OS kernel is "virtualized" too from that perspective? So does the e.g. Windows code run inside a vmenter on this ring -1 hypervisor? Or how does it work exactly?

Ela782
  • In that answer, it says "This piece of code is referred to as "ring -1". There is no such actual privilege level, but since it can host multiple kernels all of which believe they have ring 0 access to the system, it makes sense." So this level doesn't actually exist but is sort of a shim for virtualization. – multithr3at3d Dec 19 '17 at 18:49
  • @korockinout13 I guess that includes the main OS. Then there's still two questions, 1) does the main OS run inside `vmenter` as well, i.e. exactly the same as a virtual machine in e.g. VMWare Player would, and 2) is there then zero difference between what the main OS and an OS inside a VM can do with the CPU and is it exactly the same instruction flow for both of them? – Ela782 Dec 20 '17 at 00:07

1 Answer


Hardware virtualization

If an operating system is not a host virtualizing a guest, it is not running as "ring -1". If a hypervisor is not active, ring -1 effectively does not exist and does not matter. As such, unless you are purposefully running a virtual machine, you don't have to even think about hypervisors. The short answer is that ring -1 is not a real protection ring. It is a term made up to illustrate the relative privilege difference between a virtualized guest and its host.

Current Privilege Level

In CPU parlance, ring 0 is called CPL0 (Current Privilege Level 0) whereas ring 3 is CPL3. The purpose of this is simply to allow instructions to check the amount of privilege before executing. The CPU tracks the current privilege level in the low two bits of the CS segment selector (the TSS only holds the stack pointers used when switching to a more privileged level), and instructions can include checks that allow only a certain level to continue. For example, the RDTSC instruction is defined as:

if(CR4.TSD == 0 || CPL == 0 || CR0.PE == 0) EDX:EAX = TimeStampCounter;
else Exception(GP(0));

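This includes a check to see if the current privilege level is 0. If it is, or if CR4.TSD is clear (the timestamp counter is not restricted to ring 0), or if CR0.PE is clear (the CPU is in real mode), the value of the timestamp counter is stored in the EDX and EAX registers. Otherwise the instruction raises a general protection fault due to insufficient privilege.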
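The check can be mirrored in a few lines of Python to make the three conditions easier to experiment with. This is only a sketch of the pseudocode above, not of real hardware: the function name `rdtsc`, its parameters and the `GeneralProtectionFault` exception are made up for illustration.

```python
class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for the #GP(0) exception."""


def rdtsc(cpl: int, cr4_tsd: int, cr0_pe: int, tsc: int) -> tuple[int, int]:
    """Model of the RDTSC permission check.

    Returns (EDX, EAX), the high and low 32 bits of the 64-bit counter,
    or raises GeneralProtectionFault when the check fails.
    """
    if cr4_tsd == 0 or cpl == 0 or cr0_pe == 0:
        edx = (tsc >> 32) & 0xFFFFFFFF  # high half of the counter
        eax = tsc & 0xFFFFFFFF          # low half of the counter
        return edx, eax
    raise GeneralProtectionFault(f"RDTSC not permitted at CPL {cpl}")


if __name__ == "__main__":
    # Ring 3 with CR4.TSD clear: allowed.
    print(rdtsc(cpl=3, cr4_tsd=0, cr0_pe=1, tsc=0x100000002))
    # Ring 3 with CR4.TSD set in protected mode: faults.
    try:
        rdtsc(cpl=3, cr4_tsd=1, cr0_pe=1, tsc=0x100000002)
    except GeneralProtectionFault as exc:
        print("fault:", exc)
```

Note that nothing in this check can express a level below 0, which is the point made in the next section.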

Privileged instructions and VM exits

Hypervisor context is not a real protection ring, and CPL < 0 will never evaluate true. When a guest attempts to execute a privileged instruction, the instruction is automatically trapped and the guest exits in a process called a VM exit (vmexit). The hypervisor then decides whether the instruction should be allowed, and can safely emulate it if it so wishes. After it has made its decision, it gives control back to the guest with a VM entry (informally "vmenter"; the actual instructions are VMLAUNCH and VMRESUME on Intel VT-x and VMRUN on AMD-V), and the guest continues as if nothing happened until it runs into another privileged instruction that forces it to give control back to the hypervisor.
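The exit/entry cycle can be illustrated with a toy simulation. This is a sketch under loud assumptions: the `PRIVILEGED` set, the `Hypervisor` class, its policy table and `run_guest` are all hypothetical names, and real hardware decides what traps from control structures set up by the hypervisor, not from a Python set.

```python
from dataclasses import dataclass, field

# Illustrative only: not the real list of instructions that cause VM exits.
PRIVILEGED = {"HLT", "OUT", "WRMSR"}


@dataclass
class Hypervisor:
    # Hypothetical policy: what to do with each trapped instruction.
    policy: dict = field(default_factory=lambda: {"OUT": "emulate", "WRMSR": "deny"})
    exits: list = field(default_factory=list)

    def handle_vmexit(self, instr: str) -> str:
        """Record the exit and return the decision for this instruction."""
        self.exits.append(instr)
        return self.policy.get(instr, "allow")


def run_guest(program: list, hypervisor: Hypervisor) -> list:
    """Run guest instructions; privileged ones trap to the hypervisor."""
    trace = []
    for instr in program:
        if instr in PRIVILEGED:
            verdict = hypervisor.handle_vmexit(instr)  # VM exit
            trace.append(f"{instr}: vmexit -> {verdict} -> vmentry")  # VM entry
        else:
            trace.append(f"{instr}: ran directly")
    return trace


if __name__ == "__main__":
    hv = Hypervisor()
    for line in run_guest(["ADD", "OUT", "MOV", "WRMSR"], hv):
        print(line)
```

Without a hypervisor there is no `handle_vmexit` step at all: every instruction on a bare-metal OS takes the "ran directly" path, subject only to the ordinary CPL checks.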

Certain non-privileged instructions can also be trapped by a hypervisor, for example CPUID, which gives information about the processor. While this is not privileged (normally, at least), it is quite useful for hypervisors to trap it so they can modify the values it returns. A guest running under KVM or Xen that executes this instruction with EAX set to 0x40000000 (the leaf range reserved for hypervisors) gets back the signature KVMKVMKVM or XenVMMXenVMM, respectively, in three other registers. A hypervisor can rewrite other leaves in the same way, including leaf 0, whose bare metal result is a vendor ID string along the lines of GenuineIntel or AuthenticAMD. It is not necessary to trap this instruction for security reasons like it is for instructions that poke hardware ports, but it is quite useful in the context of managing the guest's view of what it is running on. It is possible to disable trapping for some of these instructions. Section 2.7 of the technical documentation for Intel VT-x explains the different events that can be made to conditionally trigger a vmexit. Instructions that are sensitive are trapped unconditionally.
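To see how those strings come out of "three other registers", here is a small decoding sketch. It does not execute CPUID; it only turns register values into text. The function names are hypothetical, and the register values in the usage lines are the well-known encodings of the two strings. The register order differs between the two leaves: the vendor ID at leaf 0 is laid out as EBX, EDX, ECX, while the hypervisor signature at leaf 0x40000000 is laid out as EBX, ECX, EDX.

```python
import struct


def regs_to_text(*regs: int) -> str:
    """Pack 32-bit register values as little-endian bytes and decode as ASCII."""
    raw = struct.pack(f"<{len(regs)}I", *regs)
    return raw.rstrip(b"\x00").decode("ascii")  # drop trailing NUL padding


def vendor_id(ebx: int, ecx: int, edx: int) -> str:
    """Leaf 0: the vendor string is stored in EBX, EDX, ECX order."""
    return regs_to_text(ebx, edx, ecx)


def hypervisor_signature(ebx: int, ecx: int, edx: int) -> str:
    """Leaf 0x40000000: the signature is stored in EBX, ECX, EDX order."""
    return regs_to_text(ebx, ecx, edx)


if __name__ == "__main__":
    print(vendor_id(ebx=0x756E6547, ecx=0x6C65746E, edx=0x49656E69))
    print(hypervisor_signature(ebx=0x4B4D564B, ecx=0x564B4D56, edx=0x0000004D))
```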

For securely managing a guest, rather than simply checking privilege level, the system checks whether the instruction is running in virtualization context. The effect is the same (a more privileged context can overrule the decision of a less privileged one), but it is implemented using hardware virtualization technology rather than the x86 protection rings. This is why the hypervisor is said to be more privileged than ring 0: even in ring 0, a guest can have its decision overruled by the hypervisor. Since lower rings mean more privilege, and ring 0 is the most privileged level, naturally this has led to a context with even more privilege being nicknamed ring -1. The only purpose of that naming is to help people remember that ring 0 is not necessarily the most privileged context a task can run as.

Ring -2 and -3

There is also ring -2, for System Management Mode, or SMM (a special, highly privileged context which the CPU enters when a type of interrupt called an SMI occurs), and ring -3, for coprocessors that have a high level of control over the system (such as the Intel ME or AMD PSP). None of these are actually implemented as protection rings, and a ring -3 task isn't even running on the main CPU.

forest
  • Well, @forest, thank you for this explanation, but in my opinion the main question is still pertinent. It's "Inception"-like: given the fact that "ring 0 is not necessarily the most privileged context a task can run as", does this mean that the main OS ring 0 kernel "is "virtualized" too from that perspective"? – Soufiane Tahiri Dec 21 '17 at 17:01
  • Thank you very much for this explanation! That makes many things much more clear! So: `When a privileged instruction is to be executed in a guest, it is automatically trapped and the guest exits in a process called vmexit, and the hypervisor is allowed to determine whether or not the instruction should be allowed, and can safely emulate it if it so wishes`: That does not hold true for the main (host) OS? I.e. the hypervisor is something that has to be running on the host OS (like VMWare or KVM), and the CPU doesn't have a "hypervisor" in Ring -1 or something like that - right? – Ela782 Dec 21 '17 at 17:40
  • @Ela782 Correct. The main (host) OS does not have any hypervisor behind it, so a privileged instruction can run unimpeded. In order for a hypervisor to exist, the host has to set one up. – forest Dec 22 '17 at 00:58
  • @SoufianeTahiri Virtualization doesn't just mean that there are more privileged levels out there. It is a specific technique to allow multiple systems to share resources. The host is not virtualized unless it is running in a VM. There is no "ring -1" for the host. – forest Dec 22 '17 at 01:03
  • @forest Okay cool, thanks for the clarification! Before your explanation, it sounded to me like the host OS is "virtualized" too, by a sort-of "hypervisor" in the CPU (ring -1). – Ela782 Dec 22 '17 at 10:32
  • Whether or not SMM (ring -2) is a security ring is determined by the chipset. If the memory used by SMM and the events that trigger the SMI are locked down then there is no way to interfere with SMM code. – Alex Cannon Mar 31 '18 at 14:29
  • @AlexCannon Well it's never technically a protection ring because `CPL` can never go below 0. – forest Apr 01 '18 at 01:05