I'm trying to wrap my head around "meltdown", but to first understand it, I've been trying to understand memory accesses.
From what I understand, the CPU first looks up the virtual address in the translation lookaside buffer (TLB), which caches recent virtual-to-physical address translations. If the translation is there, we get the physical address immediately. If not, we then have to look up the address in the page table.
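To check my own understanding, here is a toy Python sketch of that lookup order, treating the TLB as a small cache of translations that is separate from the data cache. Everything in it is made up for illustration: `ToyMMU`, the flat `page_table` dict standing in for the real multi-level walk, and the hit/miss counters. It is a mental model, not how any real CPU is built.

```python
PAGE_SIZE = 4096  # 4 KiB pages


class ToyMMU:
    """Hypothetical model: a TLB in front of a (flattened) page table."""

    def __init__(self, page_table):
        # Maps virtual page number -> physical frame number.
        self.page_table = page_table
        self.tlb = {}
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            # Fast path: the translation is already cached.
            self.tlb_hits += 1
            pfn = self.tlb[vpn]
        else:
            # Slow path: this dict lookup stands in for the page table walk.
            # A missing key (KeyError) plays the role of a page fault.
            self.tlb_misses += 1
            pfn = self.page_table[vpn]
            self.tlb[vpn] = pfn
        return pfn * PAGE_SIZE + offset


mmu = ToyMMU({5: 42})
first = mmu.translate(5 * PAGE_SIZE + 7)   # TLB miss, then cached
second = mmu.translate(5 * PAGE_SIZE + 9)  # TLB hit on the same page
```

The second access to the same page skips the walk entirely, which is the whole point of the TLB as I understand it.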
Now from my reading, every process has its own page table. However, every process also has the kernel mapped into its page table.
Presumably the page table also has access bits, since we obviously can't allow reads directly into kernel space when the CPU is in user mode (I think this is called "ring 3").
From what I understand of page tables, these access bits are stored in the low bits of each page table entry, alongside the physical address. As pages are 4 KiB aligned, the low 12 bits of the address are always zero, so there are plenty of bits left over to store access bits.
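To make this concrete for myself, here is a minimal Python sketch of how I picture the flag bits in an x86-64 page table entry for a 4 KiB page. The bit positions (Present = bit 0, Writable = bit 1, User = bit 2, No-Execute = bit 63, frame address in bits 12 to 51) are the x86-64 layout as I understand it. The helper names `decode_pte` and `user_may_read` and the two example entries are hypothetical.

```python
# Flag bits in an x86-64 page table entry (4 KiB pages).
PTE_PRESENT = 1 << 0      # page is mapped
PTE_WRITABLE = 1 << 1     # writes are allowed
PTE_USER = 1 << 2         # accessible from user mode (ring 3)
PTE_NO_EXECUTE = 1 << 63  # instruction fetches are forbidden

# Bits 12..51 hold the physical frame address. The low 12 bits are free
# for flags because frames are 4 KiB aligned.
PTE_FRAME_MASK = 0x000F_FFFF_FFFF_F000


def decode_pte(entry: int) -> dict:
    """Split a raw 64-bit entry into its frame address and flags."""
    return {
        "frame": entry & PTE_FRAME_MASK,
        "present": bool(entry & PTE_PRESENT),
        "writable": bool(entry & PTE_WRITABLE),
        "user": bool(entry & PTE_USER),
        "no_execute": bool(entry & PTE_NO_EXECUTE),
    }


def user_may_read(entry: int) -> bool:
    """The check I would expect to happen as soon as the entry is loaded."""
    return bool(entry & PTE_PRESENT) and bool(entry & PTE_USER)


# Made-up example entries: one kernel-only page, one user page.
kernel_pte = 0x1234_5000 | PTE_PRESENT | PTE_WRITABLE
user_pte = 0x0ABC_D000 | PTE_PRESENT | PTE_WRITABLE | PTE_USER
```

In this picture the permission check is a single AND against a value that has already been loaded, which is why it looks so cheap to me.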
From what I've read about the exploit, the issue is that the access check is only enforced after the data has been retrieved. The reason for this is efficiency: we want to get the data to the CPU quickly, and we can just catch the permission error before we make any permanent changes. But unfortunately we've already affected the CPU cache by doing an indirect memory fetch that depends on that data, which is detectable using timing attacks.
This scheme might make sense if the page lookup were cheap and the access check expensive. But from my understanding that doesn't seem to be the case.
I've read the page table on a 64-bit machine has at least three layers, which means at least three memory lookups. Hopefully these are in the cache, but if they aren't, that means recursively searching the page table for its own pages.
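As a sketch of why the walk costs several lookups, here is how I understand the common x86-64 case with 4 KiB pages: four levels, each indexed by 9 bits of the virtual address, plus a 12-bit page offset. The nested dicts standing in for the tables, and the function names `split_virtual_address` and `walk`, are hypothetical. Real tables live in physical memory and their entries carry flag bits as well.

```python
LEVELS = ("pml4", "pdpt", "pd", "pt")  # top level first


def split_virtual_address(vaddr: int) -> dict:
    """Break a 48-bit virtual address into four 9-bit indices and an offset."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,
        "pdpt": (vaddr >> 30) & 0x1FF,
        "pd": (vaddr >> 21) & 0x1FF,
        "pt": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


def walk(root: dict, vaddr: int):
    """Return (physical address or None, number of table lookups made)."""
    parts = split_virtual_address(vaddr)
    table = root
    lookups = 0
    for level in LEVELS:
        lookups += 1  # each level is one more memory access
        table = table.get(parts[level])
        if table is None:
            return None, lookups  # not mapped: this would be a page fault
    # After the last level, `table` is the physical frame base address.
    return table + parts["offset"], lookups


# Toy mapping: indices 1 -> 2 -> 3 -> 4 lead to the frame at 0x7000.
root = {1: {2: {3: {4: 0x7000}}}}
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
```

Here a successful translation takes four dependent lookups, each of which can miss the cache, before the final entry is even in hand.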
After we've done all this work and finally found the page table entry, when we load the physical address from the page table, we also load the access bits. Why not just check it there? It seems far more trivial to check the access bit we've already loaded than muck around with circuitry to deal with it later on.
I'm obviously missing something about how the CPU is working, but I can't work out what. We have to do the page table lookup to even work out what to fetch, and once we've gone to that trouble why not just check the access bit?