I'll bite...
Firstly, let's handle Meltdown (Google: P3). As you probably know, your system has multiple privilege levels for code, known on Intel/x86 as rings: ring 0 (supervisor code) and ring 3 (user mode) are the ones used in practice. The current privilege level (CPL) is held in the low bits of the code segment selector. Code runs under one segment selector or the other (the selectors cover the whole address space), and this determines what privilege the code executes with.
In parallel, there is virtual memory, controlled by page tables whose root is pointed to by the CR3 register. Paging allows for a virtual address space of a certain size regardless of the actual amount of backing storage (physical memory). It also lets operating system developers do things like write pages to swap (Windows: pagefile), evicting them when a task has not accessed that memory for some time and physical RAM is in demand. Page table entries also have a user bit, which controls whether a page can be accessed by code executing at CPL 3. If the bit is set, user mode code can access the page; otherwise it cannot.
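As a rough illustration of the user bit, here is a minimal C sketch of the permission check described above. The bit positions (present = bit 0, writable = bit 1, user = bit 2) are the standard x86 ones; the helper `pte_allows_access` is a hypothetical name of mine, and it models only the present and user bits, ignoring everything else the MMU checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86 page-table entry. */
#define PTE_PRESENT  (1ULL << 0)  /* page is mapped */
#define PTE_WRITABLE (1ULL << 1)  /* writes allowed */
#define PTE_USER     (1ULL << 2)  /* accessible from CPL 3 */

/* Hypothetical helper: would an access at the given privilege level be
 * permitted by this entry? Only the present and user bits are modelled. */
static bool pte_allows_access(uint64_t pte, int cpl)
{
    assert(cpl >= 0 && cpl <= 3);
    if (!(pte & PTE_PRESENT))
        return false;             /* not mapped: page fault */
    if (cpl == 3 && !(pte & PTE_USER))
        return false;             /* supervisor-only page, user mode access */
    return true;
}
```

A kernel page would typically have the present bit set and the user bit clear, so the check passes at CPL 0 and fails at CPL 3.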
Now there are two key points to understand. Firstly, on x86 (and many other architectures), hardware-level tasking is not used by modern operating systems. A process (Linux task, Windows process) is therefore entirely a concept in software. This works by allocating each process its own virtual address space; when processes switch, the relevant page tables are switched, "plugging in" the new process into the virtual address space and temporarily removing the old one (or unmapping it, even if it is not evicted from physical RAM).
The second thing to understand is that switching address spaces is a relatively expensive process, so operating system designers came up with a "higher half kernel": in each process's address space, the kernel is mapped completely into the higher half of memory. This brings us an important advantage: if the kernel needs to handle an interrupt, or perform some IO, we can jump into kernel space via a system call or interrupt handler without switching the address space of the process. If the task doesn't require us to switch address spaces for any other reason, this is nice and efficient: the kernel does its work and returns to user mode as fast as possible. (This is slightly different to a context switch, which is the save of register state, and which the higher half design still lets us avoid in some cases.)
Unfortunately it turns out that speculative execution occurs across privilege levels on Intel CPUs; specifically, code running at CPL 3 can speculatively load from pages with a cleared user bit (so, supervisor/ring 0-2 only). Meltdown exploits this by referencing memory from the higher half, page by page, through instructions that are only executed speculatively. The results of those instructions are discarded, but by then the processor has already pulled the referenced data into the cache, resulting in a cache-based timing side channel.
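To show the shape of that side channel without any real timing, here is a toy C model. It is a simulation under my own assumptions, not exploit code: the "cache" is just an array of flags, and every name in it (`toy_cache`, `toy_flush`, `toy_transient_access`, `toy_recover`) is hypothetical. In a real attack the attacker cannot read such flags and instead infers them by timing accesses to each probe line.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_LINES 256   /* one probe line per possible byte value */

/* Toy cache model: cached[i] is true if probe line i is "in cache".
 * A real attacker infers this from access latency instead. */
struct toy_cache {
    bool cached[NUM_LINES];
};

/* Attacker step 1: flush every probe line. */
static void toy_flush(struct toy_cache *c)
{
    memset(c->cached, 0, sizeof c->cached);
}

/* Transient step: a speculative access indexed by the secret byte pulls
 * exactly one probe line into the cache, even though the architectural
 * result of that access is later thrown away. */
static void toy_transient_access(struct toy_cache *c, int secret)
{
    assert(secret >= 0 && secret < NUM_LINES);
    c->cached[secret] = true;
}

/* Attacker step 2: probe every line; the one that is cached ("fast")
 * reveals the secret byte. Returns -1 if no line is cached. */
static int toy_recover(const struct toy_cache *c)
{
    for (int i = 0; i < NUM_LINES; i++)
        if (c->cached[i])
            return i;
    return -1;
}
```

The point of the model is only that the discarded speculative access still leaves one bit of state behind per probe line, and that state is indexed by the secret.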
KPTI and equivalent patches in macOS and Windows protect against this by not using a higher half design. Only the bare minimum needed to enter the kernel address space is mapped into every process; the rest of the kernel lives in its own virtual address space like a separate process. This has two main effects:
- Any entry to the kernel implies a switch of address spaces. This is more expensive than the higher half design, hence the quotes of a 5%-30% slowdown. The "it depends on your workload" explanations are accurate: the cost depends on how many times you need to enter the kernel.
- However, there is now only a small amount of kernel space left in the same virtual address space as the process, and most of it is an uninteresting interrupt handler. You can leak it, but it's a smaller target and its location can be randomized as well. There are simply no valid references any more to things like disk encryption keys, as they exist in a separate virtual address space, which requires a switch to the kernel's address space first.
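The "it depends on your workload" point is just arithmetic, which the following back-of-the-envelope C sketch tries to capture. The function names are hypothetical and so is the model: it assumes a fixed extra cost per kernel entry, which is a simplification, and any numbers you feed it are your own estimates rather than measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: extra cycles spent per second because every kernel
 * entry now also switches address spaces. */
static uint64_t kpti_extra_cycles(uint64_t kernel_entries_per_sec,
                                  uint64_t extra_cycles_per_entry)
{
    return kernel_entries_per_sec * extra_cycles_per_entry;
}

/* Overhead as a whole-number percentage of the CPU's cycles per second
 * (integer division, so fractions are truncated). */
static uint64_t kpti_overhead_percent(uint64_t kernel_entries_per_sec,
                                      uint64_t extra_cycles_per_entry,
                                      uint64_t cpu_hz)
{
    assert(cpu_hz > 0);
    return 100 * kpti_extra_cycles(kernel_entries_per_sec,
                                   extra_cycles_per_entry) / cpu_hz;
}
```

With purely illustrative inputs of a 3 GHz core and 500 extra cycles per entry, 600,000 kernel entries per second comes out at 10%, while 6,000 entries per second rounds down to 0%. Syscall-heavy workloads pay, compute-bound ones barely notice.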
This is obviously bad news. The same issue can occur for some kinds of hypervisor designs. Containers are entirely a kernel construct and thus are affected, just like processes.
Spectre (Google: P1, P2) works similarly, except that the attacker induces a target to perform some kind of speculative execution in its own address space and leaks the result via a side channel. A critical difference, quoted from the Spectre paper, is:
Spectre attacks only assume that speculatively executed instructions can read from memory that the victim process could access normally, e.g., without triggering a page fault or exception. For example, if a processor prevents speculative execution of instructions in user processes from accessing kernel memory, the attack will still work. [12]. As a result, Spectre is orthogonal to Meltdown [27]
In other words, you cannot use Spectre to read kernel memory from an unprivileged process directly - or any memory in a higher privilege level.
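To make the "memory that the victim process could access normally" point concrete, here is a C sketch modelled on the well-known bounds check example from the Spectre paper (Google's P1). The array sizes and the `STRIDE` constant are placeholders I picked, and the code only shows the victim-side pattern; it does nothing speculative on its own and is not an exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIDE 256  /* spacing so each byte value lands on a distinct cache line */

static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static size_t  array1_size = 16;
static uint8_t array2[256 * STRIDE];

/* Victim function with a bounds check. Architecturally, an out-of-range x
 * never touches array1 or array2. If the branch predictor has been trained
 * to expect "in range", the CPU may still run the body speculatively, and
 * which line of array2 ends up cached then depends on the byte read from
 * beyond the end of array1. */
static uint8_t victim_function(size_t x)
{
    assert(array1_size <= sizeof array1);  /* sanity check on the sketch itself */
    if (x < array1_size)
        return array2[array1[x] * STRIDE];
    return 0;
}
```

Note that everything the speculative path reads here is memory the victim itself may legitimately access; no privilege boundary is crossed, which is exactly the distinction the quote draws.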
In order to fix these issues there are a number of possible resolutions:
- Xen XSA-254's FAQ suggests that Intel, AMD and others are preparing microcode updates that will flush branch prediction state on entry to the hypervisor, to ensure any attempt at poisoning across this boundary is removed.
- It also mentions (and this applies elsewhere) compiling indirect jumps in such a way that speculative execution through them does not occur. This defence effectively defeats the CPU's branch prediction for those jumps and requires that the software in question is recompiled. It is this defence that is being added to GCC and Clang, as mentioned in other answers.
Now, to answer the actual questions you asked:
For example, if I am on a Windows OS, running Firefox and Chrome, where the OS is patched, but not any of the browsers, what does that buy me? Is speculative execution to read kernel memory via javascript still possible in that case?
If you have the KPTI patch series or equivalent, the answer is no, you cannot. Or more precisely, you can, but you can only find uninteresting things like kernel code for switching address spaces.
As a bonus, I've read that Mozilla and Microsoft have temporarily mitigated the attack via browser, by purposely making some of the javascript timing mechanisms inaccurate/less-precise. Isn't it theoretically possible to bundle your own time measuring libraries to bypass that?
Theoretically, if a method of measurement exists that is accurate to the required degree in browsers already, yes. However, bundling your own measurement essentially requires extending the browser. It isn't as simple as loading some javascript as, underneath, javascript in the browser is using entirely browser-provided APIs. So theoretically, if there is some code path that allows measurement that has not been closed off, yes, otherwise, no.
I would go with no for all practical purposes, as a result.
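For intuition on why coarser timers help, here is a small C sketch of timestamp quantization. It is my own illustration, not how any particular browser implements it; the function names and the granularity you pass in are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round a timestamp down to a multiple of the granularity (same units). */
static uint64_t coarsen_timestamp(uint64_t t, uint64_t granularity)
{
    assert(granularity > 0);
    return t - (t % granularity);
}

/* Elapsed time as seen through the coarsened clock. */
static uint64_t coarse_elapsed(uint64_t start, uint64_t end,
                               uint64_t granularity)
{
    return coarsen_timestamp(end, granularity)
         - coarsen_timestamp(start, granularity);
}
```

If a cached access takes, say, 80 ns and an uncached one 300 ns (made-up figures), then through a clock with a 20,000 ns granularity both measurements starting at the same instant usually read as 0, so a single measurement no longer tells them apart.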
Summary:
- KPTI series patches explicitly protect against Meltdown, i.e. full kernel address space reads, on Intel CPUs.
- Spectre is harder to protect against, as it allows for the same kind of speculative execution leak but targets a different victim process across address spaces. KPTI does not fix it; individual software packages must deploy mitigations. On the other hand, Spectre cannot directly cross the ring 3 to ring 0 privilege boundary (but I believe it can cross from ring 0 to ring 0, or to "ring -1", the hypervisor, hence the Xen comments).