What are the security implications of "Row Hammer" attack?

Question

I just found out about the Row Hammer attack. Based on the description, this sounds extremely dangerous, but I wonder what percentage of devices are actually affected by it. Are there any ways to protect oneself other than replacing the RAM with modules that support ECC?

Would ASLR work? I haven't looked at this attack very thoroughly, but it sounds like it might... — KnightOfNi, Mar 09 '15 at 21:42
It might not if in the end the exploit is able to access the flat memory. — d33tah, Mar 09 '15 at 21:43
"A system could ensure that, within a given refresh period, it does not activate any given row too many times without also ensuring that neighbouring rows are refreshed?" From your own link... For us mortals this just means change RAM on critical servers/network — Aki, Mar 10 '15 at 06:03
Detecting such reads and then refreshing adjacent memory rows might work. But I'm not sure which piece of hardware would be responsible for that. — CodesInChaos, Mar 10 '15 at 14:49

bain · Answer 1 · 2015-03-10T17:46:29.120

The basic security implication is that an unprivileged user can elevate their access to root/kernel level.

Google's Project Zero tested 29 different laptops manufactured between 2010 and 2014 inclusive, and found that 15 were vulnerable and 14 were not. They do warn that this sample is too small to be considered representative, but it still suggests that this is unlikely to be an isolated problem.

The original "Flipping Bits in Memory Without Accessing Them" paper suggests that the problem is much wider:

Among 129 DRAM modules we analyzed (comprising 972 DRAM chips), we discovered disturbance errors in 110 modules (836 chips). In particular, all modules manufactured in the past two years (2012 and 2013) were vulnerable

The only real way to mitigate the attack is to not run arbitrary binaries on your system, and to not allow anyone else to access your system if they might have hostile intentions. This is obviously a problem for shared computing resources e.g. universities, hosting facilities etc. Cloud providers could potentially be vulnerable (successful memory corruption from inside VMs has been reported, but it remains to be seen how exploitable this is).

Using ECC, whilst not a guarantee of security, might lower the probability of successful exploitation. The full mitigation is to update to a system that employs some defence - either a memory controller or DRAM that actively detects and avoids the exploit. Availability of such controllers or DRAM right now is unknown, but they are known to exist. Memory manufacturers have been aware of this problem for a while, and the specification for LPDDR4 includes mitigations, so future laptops will be protected.

score 5 · Answer 2 · answered Mar 10 '15 at 17:15

ECC RAM is not necessarily immune; ECC memory reliably fixes one-bit flips and detects most two-bit flips, which makes the attack harder, but not conceptually infeasible.
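
To make "harder, but not conceptually infeasible" concrete, here is a minimal sketch of a single-error-correct, double-error-detect code. It is a scaled-down stand-in (an extended Hamming code protecting 4 data bits with 4 check bits), not the wider code that real ECC modules use, and the function names are made up for this illustration. It shows one flip being repaired, two flips being flagged, and three flips slipping through as a silent miscorrection.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 form Hamming(7,4),
    with check bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble) for a possibly corrupted codeword."""
    word = list(code)
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position
    parity_broken = sum(word) % 2 == 1
    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        # Odd number of flips: the decoder assumes exactly one and repairs it.
        # A syndrome of 0 here points at the overall parity bit itself.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not fixable.
        status = "uncorrectable"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for position in positions:
        damaged[position] ^= 1
    return damaged


stored = encode(0b1011)
one_flip = decode(flip(stored, 5))          # repaired, data intact
two_flips = decode(flip(stored, 3, 6))      # flagged as uncorrectable
three_flips = decode(flip(stored, 3, 5, 6)) # reported "corrected", data wrong
```

In this toy code the three-flip case lands on another valid codeword pattern, so the decoder reports success while returning different data. An attacker would need several flips in the same protected word for this to happen, which is why ECC raises the bar rather than removing the problem.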

Non-ECC RAM is not necessarily weak; in fact, as per the definition of how RAM should behave, no single bit flip should ever happen. What we are talking about here is RAM with a defect: the RAM does not work like it should. The depressing fact is that such defects are a lot more common than usually assumed, since, under normal conditions, such defects are not triggered often (or at all).

The solution is to get non-defective RAM. The underlying issue then becomes: how do we detect that RAM is defective? The well-known MemTest86 tool includes a "row hammer" test (since at least v6, available in the free version).

For a software-only solution, one could imagine a "manual refresh" done by the kernel. Some kernel thread would regularly do the following, for all pages in physical RAM:

1. Lock the page (i.e. mark it non-accessible for userland code).
2. Flip all the bits in the page, ensuring a cache flush with the relevant opcodes.
3. Flip all the bits again, again with a cache flush. This restores the original data in the page.
4. Unlock the page.

If userland accesses the page while it is being refreshed, the trap handler simply waits for the refresh to complete, then jumps back to the process so that it may try again.

The double-flip is meant to ensure that even smart hardware that tries to follow actual modifications will flush the data and thus rewrite the page (which will refill the potential wells in the DRAM chips).
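
The procedure above can be modelled in user space as a rough sketch. This is only a toy: `Page`, `refresh_page` and `read_byte` are hypothetical names, a `threading.Lock` stands in for marking the page inaccessible, and Python cannot issue the cache flush that a real kernel implementation would need, so that step appears only as a comment.

```python
import threading

PAGE_SIZE = 4096  # assumed page size for the illustration


class Page:
    """Hypothetical stand-in for one physical page and its access lock."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.lock = threading.Lock()


def flip_all_bits(buffer):
    """Invert every bit of the buffer in place."""
    for index in range(len(buffer)):
        buffer[index] ^= 0xFF
    # A real kernel would flush the affected cache lines here so that the
    # write actually reaches the DRAM row. Not possible from Python.


def refresh_page(page):
    """Lock, flip twice (restoring the original contents), unlock."""
    with page.lock:
        flip_all_bits(page.data)
        flip_all_bits(page.data)


def read_byte(page, offset):
    """Userland access: blocks while a refresh holds the lock, then proceeds."""
    with page.lock:
        return page.data[offset]


def refresh_all(pages):
    """One sweep of the refresh thread over every page."""
    for page in pages:
        refresh_page(page)


pages = [Page(bytes(range(256)) * (PAGE_SIZE // 256)) for _ in range(4)]
before = [bytes(page.data) for page in pages]
refresh_all(pages)
after = [bytes(page.data) for page in pages]
```

The only property the model demonstrates is that the double flip leaves the data unchanged and that a concurrent reader simply waits for the lock, which mirrors the trap-handler behaviour described above.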

To a large extent, this process would mimic what hardware already does for memory refresh. Then it would be a matter of deciding how often this should be done; refreshing more often means more CPU / RAM bandwidth spent on the refresh, so there is a trade-off. Whether an acceptable trade-off can be achieved depends on how much RAM there is, how fast the RAM is, and how defective the RAM is.
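
A back-of-the-envelope sketch of that trade-off follows. The figures are assumptions chosen for illustration: 8 GiB of RAM, and 12.8 GB/s as the theoretical peak of a single DDR3-1600 channel. The model counts four transfers per byte (each of the two flips reads the byte and writes it back) and ignores every other cost.

```python
GIB = 2**30
MIB = 2**20
PEAK_BANDWIDTH = 12.8e9  # bytes/s, assumed single-channel DDR3-1600 peak


def refresh_bandwidth_fraction(region_bytes, period_s,
                               peak_bandwidth=PEAK_BANDWIDTH):
    """Fraction of peak memory bandwidth used by a double-flip refresh.

    Each byte is read and written back twice per refresh period.
    A result above 1.0 means the refresh alone exceeds the bandwidth.
    """
    transfers_per_byte = 4
    traffic_per_second = region_bytes * transfers_per_byte / period_s
    return traffic_per_second / peak_bandwidth


whole_ram_every_second = refresh_bandwidth_fraction(8 * GIB, 1.0)
whole_ram_every_64_ms = refresh_bandwidth_fraction(8 * GIB, 0.064)
one_mib_every_10_ms = refresh_bandwidth_fraction(1 * MIB, 0.010)
```

Under these assumptions, sweeping all 8 GiB even once per second already needs more than the channel can deliver, while sweeping a single mebibyte every 10 ms costs only a few percent. The approach therefore looks plausible only for a small set of known sensitive pages.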

Implementation in any given operating system is left as an exercise to the reader. I presume this may have non-trivial impact on the paging/swap heuristics (when paging or swapping, the kernel tries to evict pages that have not been accessed recently, and this accounting is done by the MMU itself; the "refresh thread" will play havoc with this information gathering).

That solution doesn't make any sense. If you don't schedule the exploit process (which you can't, because its memory is locked), then your "manual refresh" is irrelevant since the exploit process isn't even being run at that time. — bain, Mar 10 '15 at 17:56
The point of the "manual refresh" is that it runs _all the time_, not just when an attack is expected. The exploit process flips a bit by _repeatedly_ flushing a cache line until it somehow "spills over" adjacent rows. This takes a bit of time (some dozens of seconds -- the linked page indicates an average of 5 minutes on one laptop, 40 minutes with a BIOS update with stricter refresh cycles). If the "manual refresh" processes the target pages more often than that, then the attack should be thwarted. In effect, this really mimics the DRAM refresh normally done by the hardware. — Thomas Pornin, Mar 10 '15 at 18:51
That is not how the exploit works. A DDR3 DRAM rank is refreshed every 64ms - the exploit happens within this time period. The problem is that the DRAM is not holding the data stable for the 64ms between refreshes, as it should. It does not matter if you "manual refresh" once every second - there will still be plenty of 64ms periods when the code can repeatedly attempt to exploit the memory. Every 64ms period is another opportunity to try the exploit. It is not a _"once after 5 minutes"_ distribution, it is a _"many tries averaging 1 success every 5 minutes"_ distribution. — bain, Mar 10 '15 at 22:11
Mmm... then a manual refresh may be of any good only if it happens more often than once every 64 ms, which looks hard for a machine with gigabytes of RAM (one could refresh a given megabyte or so every 10ms while still leaving most of the RAM bandwidth available, but I doubt that all potential targets in a modern machine fit within a single megabyte). — Thomas Pornin, Mar 11 '15 at 11:24
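
The "many tries averaging 1 success every 5 minutes" reading from the comments can be written down as a small model. This is a sketch under a simplifying assumption that is not measured anywhere in this thread: every 64 ms refresh window is treated as an independent attempt with the same success probability. The 5-minute and 40-minute averages are the figures quoted in the comment above.

```python
REFRESH_WINDOW_S = 0.064  # DDR3 refresh interval cited in the comments


def per_window_success_probability(mean_time_to_flip_s,
                                   window_s=REFRESH_WINDOW_S):
    """Success chance of one window, given the observed average time to a flip."""
    return window_s / mean_time_to_flip_s


def probability_of_flip_within(duration_s, mean_time_to_flip_s,
                               window_s=REFRESH_WINDOW_S):
    """Chance of at least one successful window during duration_s of hammering."""
    p = per_window_success_probability(mean_time_to_flip_s, window_s)
    windows = int(duration_s / window_s)
    return 1.0 - (1.0 - p) ** windows


p_default = per_window_success_probability(5 * 60)        # 5-minute average
p_patched = per_window_success_probability(40 * 60)       # 40-minute average
within_five_minutes = probability_of_flip_within(5 * 60, 5 * 60)
within_one_hour = probability_of_flip_within(60 * 60, 5 * 60)
```

In this model a software refresh that runs less often than once per window leaves the per-window probability untouched, so it does not change the outcome. Only a refresh that lands inside the window, before enough activations accumulate, would lower it.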

score 1 · Answer 3 · answered Nov 02 '16 at 19:49

I worked on DRAM for many years; here are some of my observations. As far as I know, a lot of research is trying to solve this problem, and some of it claims to have a solution. From what I have seen, though, most of these approaches do not offer full protection. So this could be a time bomb for the computing world if some attacker manages to trigger it.

It is not only DDR3 modules manufactured in 2012 and 2013 that are vulnerable. It can actually happen on all DDR3 and even DDR4. The problem is caused by the higher density of today's DRAM chips: doing a lot of reads on the same cell causes the nearby cells to change their contents.

Increasing the refresh rate is not going to solve the problem; it just reduces the chance of being hit. It also increases system heat and reduces processing performance, and when that happens the memory becomes more vulnerable to Row Hammer. So increasing the refresh rate is not a good solution. ECC is not going to solve the problem either; it just makes the attack more difficult, because ECC can only correct errors of 1 or 2 bits.

Sorry, I haven't seen any good solution yet, but I will post here if I learn of one.
