
I have read about the Heartbleed OpenSSL vulnerability and understand the concept. However, what I don't understand is the part where we pass 64 KB as the length and the server returns 64 KB of random data, because it does not check whether we really sent a 64 KB echo message or just 1 byte.
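To check my understanding, here is my rough mental model of the flaw as a simplified C sketch. The names (`heartbeat_msg`, `handle_heartbeat`, `claimed_len`) are made up by me and this is not the real OpenSSL source; the point is just that the server copies as many bytes as the client *claims* to have sent, without checking how many actually arrived:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified message layout: the client fills in the
 * length field, but may send far less payload than it claims. */
struct heartbeat_msg {
    unsigned short claimed_len;   /* client says "~64 KB of payload"... */
    unsigned char  payload[1];    /* ...but may have sent only 1 byte   */
};

void handle_heartbeat(const struct heartbeat_msg *msg,
                      void (*send_reply)(const unsigned char *, size_t))
{
    unsigned char reply[65536];

    /* The missing check: claimed_len is never compared against the
     * number of payload bytes that actually arrived in the record. */
    memcpy(reply, msg->payload, msg->claimed_len);  /* reads past the payload */
    send_reply(reply, msg->claimed_len);            /* echoes the leak back   */
}
```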

But how is it even possible for a process on a server to return 64 KB of random data from RAM?

Isn't the operating system supposed to prevent access to the real RAM and only allow access to virtual memory where one process cannot access the memory contents of other processes?

Does OpenSSL run in kernel mode and thus have access to all the RAM?

I would expect a segmentation fault if a process tried to access any memory that it didn't explicitly allocate. I can understand getting 64 KB of random data from the process that is running OpenSSL itself, but I don't see how it could even see the complete RAM of the server to be able to send it back to the client.

UPDATE: Regarding @paj28's comment: yes, it was precisely this false information that led me to wonder about this. As you said, even the official heartbleed.com advisory phrases it in a misleading way (although I would say they did so because it is intended for a much wider audience than just us technical folks and they wanted to keep it simple).

For reference, here is how heartbleed.com states it (emphasis mine):

The Heartbleed bug allows anyone on the Internet to read *the memory of the systems* protected by the vulnerable versions of the OpenSSL software.

For any technical person, that would imply the complete RAM of the virtual/physical machine.

Talha Sayed
  • OpenSSL is a shared library, so it runs in the same memory space as the process using it (e.g. Apache). The OS stops it reading memory from other processes, but it can read memory from the same process, which will sometimes contain sensitive data. – paj28 Oct 10 '16 at 20:06
  • @paj28 If that's the case, the question is answered pretty briefly. I didn't have much time for reading news and informing myself about stuff when Heartbleed went public, so I wasn't able to scratch more than the very surface, and unfortunately I haven't done so since. However, I do know that mass media portrayed the Heartbleed bug as enabling an attacker to copy an image of the entire memory of a server over to them, provided the attacker gets enough time. This is probably where this question originates from. (But if that's false information, the question might be based on false information.) – UTF-8 Oct 10 '16 at 20:11
  • It's all memory from the same process which is running openssl - what makes you think it's memory from a different process? – user93353 Oct 10 '16 at 21:59
  • @UTF-8: If by "a server" you mean "a physical or virtual hardware machine", then yeah, Heartbleed couldn't copy the full memory from it. However, "server" is also used to refer to server *processes*, like Apache (`httpd`), Tomcat, IIS, sendmail, openvpn's server, etc. In practice, it'd be very hard to dump the *full* memory of a server (process) even if you exclude the executable pages, but you can probably get everything you care about if you work at it for long enough. – CBHacking Oct 10 '16 at 22:19
  • @UTF-8 - I was about to say this is the media dumbing things down. But in fairness the [original advisory](http://heartbleed.com/) is similarly vague. I think this is their attempt to "sex up" the advisory. – paj28 Oct 11 '16 at 07:14
  • Look up "buffer overflow attack"; it covers the basic idea. Heartbleed is the same thing but applied in reverse. – Agent_L Oct 11 '16 at 10:11
  • @paj28, meh, it's clear enough from the context, just not spelled out to the end for every layman who doesn't know what the words "client" and "server" mean in a software context. I don't think I've heard any physical devices being called just "clients" without a specifier like "thin client", so in connection with that, "server" also means something else than a full device. Also note that the next mention of "server" is in the sentence _"The most notable software using OpenSSL are the open source web servers like Apache and nginx."_, which clearly mentions software. – ilkkachu Oct 11 '16 at 15:47
  • It would, of course, be possible to implement a TLS library in a separate process, with separate virtual memory. Somewhat like the services in a microkernel. Though of course microkernel OS's are somewhat rare in general purpose computing due to their performance and separating the TLS library to a different process would add another set of context switches to the communication path. – ilkkachu Oct 11 '16 at 15:51
  • Moving the asymmetric key operations to a separate process would be easy enough and the overhead would be far less. – Ben Oct 12 '16 at 08:37
  • One more thing that is easy to miss: the 64 KB doesn't actually come from a truly random location. It comes from the 64 KB - 1 bytes following the buffer that holds the 1 byte. That one byte must be in valid memory. We also know that the OS doesn't allocate memory by the byte, but in pages. The custom allocator explained by @CBHacking may not even be necessary, because even with OS memory allocation, at least the first several KB would never cause a segfault. – Kevin Keane Oct 12 '16 at 09:44
  • @ilkkachu: But moving OpenSSL to another process would not have mitigated Heartbleed, because the disclosure is of OpenSSL's own data (keys) and data it is acting on (plaintext) which would both be present in the OpenSSL process even in a multi-process model. To get any protections, you'd have to spawn a new process per transaction, and even that might not protect against Heartbleed since even processing of echo messages will involve the keys. Probably what would help would be handing only the temporal symmetric key to the isolated process, never the master. – Ben Voigt Oct 12 '16 at 21:28
  • @BenVoigt, yah, I didn't mean to say it would have prevented the problem in this case. Just that such a design would be possible in principle, and might even help protect any secrets held by the main program. – ilkkachu Oct 12 '16 at 21:56
  • @KevinKeane: Good point, although you don't know *where* in a page the 1-byte allocation was made. Allocators can combine multiple small allocations into a single page (rather than wasting the majority of each page, which is potentially a lot of wasted memory if you use a ton of small allocations and also make swap needlessly expensive). It's possible that the 1-byte allocation would be near the end of a page, and an over-read would segfault anyhow. In fact, some memory debuggers do that (allocate at the end of pages so overflows crash). Also, 64k is 16 pages; still a lot of contiguous RAM. – CBHacking Oct 12 '16 at 23:28

2 Answers


@paj28's comment covers the main point. OpenSSL is a shared library, so it executes in the same user-mode address space as the process using it. It can't see other processes' memory at all; anything that suggested otherwise was wrong.

However, the memory being used by OpenSSL - the stuff probably near the buffer that Heartbleed over-reads from - is full of sensitive data. Specifically, it's likely to contain both the ciphertext and the plaintext of any recent or forthcoming transmissions. If you attack a server, this means you'll see messages sent to the server by others, and the server's responses to those messages. That's a good way to steal session tokens and private information, and you'll probably catch somebody's login credentials too. Other data stored by OpenSSL includes symmetric encryption keys (used for bulk data encryption and integrity via TLS) and private keys (used to prove the identity of the server). An attacker who steals those can eavesdrop on (and even modify) the compromised TLS communication in real time, or successfully impersonate the server, respectively (assuming a man-in-the-middle position on the network).

Now, there is one weird thing about Heartbleed that makes it worse than you might expect. Normally, there'd be a pretty good chance that if you tried to read 64 KB of data starting from an arbitrary heap address within a process, you'd run into an unallocated memory address (virtual memory not backed by anything and therefore unusable) pretty quickly. These holes in a process's address space are pretty common, because when a process frees memory that it no longer needs, the OS reclaims that memory so other processes can use it. Unless your program is leaking memory like a sieve, there usually isn't that much data in memory other than what is currently being used. Attempting to read unallocated memory (for example, memory that has been freed and returned to the OS) causes a read access violation (on Windows) / segmentation fault (on *nix), which makes the program crash before it can do anything like send data back. That's still exploitable (as a denial-of-service attack), but it's not nearly as bad as letting the attacker get all that data.
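For contrast, here is a minimal, Linux-specific illustration (my own toy example, nothing to do with OpenSSL itself) of that crash case: once a page has genuinely been handed back to the OS with `munmap()`, touching it faults immediately.

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    p[0] = 'x';
    munmap(p, len);        /* the page is no longer mapped into the process */

    printf("%c\n", p[0]);  /* SIGSEGV here: the kernel refuses the access   */
    return 0;
}
```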

With Heartbleed, the process almost never crashed. It turns out that OpenSSL, apparently deciding that the platform memory management libraries were too slow (or something; I'm not going to try to justify this decision), pre-allocates a large amount of memory and then uses its own memory management functions within that. This means a few things (a simplified sketch of such an allocator follows this list):

  • When OpenSSL "frees" memory, it doesn't actually get freed as far as the OS is concerned, so that memory remains usable by the process. OpenSSL's internal memory manager might think the memory is not allocated, but as far as the OS is concerned, the OpenSSL-using process still owns that memory.
  • When OpenSSL "frees" memory, unless it explicitly wipes the data out before calling its free function, that memory retains whatever values it had before being "freed". This means a lot of data that isn't actually still in use can be read.
  • The memory heap used by OpenSSL is contiguous; there are no gaps within it as far as the OS is concerned. It's therefore very unlikely that the buffer over-read will run into a non-allocated page, so it's not likely to crash.
  • OpenSSL's memory use has very high locality - that is, it's concentrated within a relatively small range of addresses (the pre-allocated block) - rather than being spread across the address space at the whim of the OS memory allocator. As such, reading 64KB of memory (which isn't very much, even next to a 32-bit process' typical 2GB range, much less the enormous range of a 64-bit process) is likely to get a lot of data that is currently (or was recently) in use, even though that data resides in the result of a bunch of supposedly-separate allocations.
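To make those points concrete, here is a deliberately oversimplified pool/free-list allocator. It is my own sketch with made-up names (`my_malloc`, `my_free`), not OpenSSL's actual code, and it ignores alignment and thread safety. Everything lives inside one contiguous region that the OS considers allocated for the life of the process, and "freeing" a block neither unmaps it nor wipes it, so an over-read that starts inside the pool can wander through plenty of stale-but-sensitive data without ever hitting an unmapped page:

```c
#include <stddef.h>

#define POOL_SIZE (1024 * 1024)

/* One big contiguous pool: the OS sees this as allocated the whole time. */
static unsigned char pool[POOL_SIZE];
static size_t bump = 0;

struct free_block {
    struct free_block *next;
    size_t size;
};
static struct free_block *free_list = NULL;

void *my_malloc(size_t n)
{
    if (n < sizeof(struct free_block))
        n = sizeof(struct free_block);

    /* Reuse a "freed" block if one is big enough; its old bytes are
     * still sitting there, un-wiped, ready to be over-read. */
    for (struct free_block **p = &free_list; *p != NULL; p = &(*p)->next) {
        if ((*p)->size >= n) {
            void *mem = *p;
            *p = (*p)->next;
            return mem;
        }
    }

    /* Otherwise carve a fresh chunk out of the contiguous pool. */
    if (bump + n > POOL_SIZE)
        return NULL;
    void *mem = &pool[bump];
    bump += n;
    return mem;
}

void my_free(void *mem, size_t n)
{
    if (n < sizeof(struct free_block))
        n = sizeof(struct free_block);

    /* Nothing goes back to the OS and nothing is zeroed: the block is
     * just pushed onto an internal list with its contents intact. */
    struct free_block *b = mem;
    b->size = n;
    b->next = free_list;
    free_list = b;
}
```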
CBHacking
  • Note that implementing your own meta memory manager for performance reasons is far from unheard of; it saves a lot of overhead. While actually freed memory is more likely to cause a segmentation fault, the real problem was that it was possible to have a buffer overread in the first place. – user2752467 Oct 10 '16 at 23:58
  • Another effect of the custom allocator is that "torture testing" using valgrind was not effective. – paj28 Oct 11 '16 at 13:44
  • ISTR that the problem with OpenSSL's own allocator was mostly that it made tools for checking memory accesses useless, since from the system library viewpoint (which is the place where such tools would work) everything _was_ continuously in use, reserved by the OpenSSL allocator. In most cases freed memory isn't immediately returned to the OS either, the default for glibc on Linux seems to be to make a separate `mmap()` only if the allocated block is 128 kB or larger. So for anything smaller than that, even a 64 kB buffer, you're not likely to get a memory hole. – ilkkachu Oct 11 '16 at 15:26
  • Perhaps more importantly the custom allocator means you are likely to get data from openssl rather than random crap from the application. I imagine this greatly increases the probability of getting the all-important private key (which lets you MITM the server and lets you decrypt all sessions using non-dh ciphersuites). – Peter Green Oct 11 '16 at 20:47
  • Most of the things you say about the OpenSSL memory allocator would be equally true about the allocator provided by libc. – kasperd Oct 11 '16 at 21:02
  • I suspect that the manual memory allocation is in order for it to be able to zero all memory before freeing it, prevent timing or memory-usage attacks from obtaining data, and other security concerns: that is, rather than being bad design, it's overall a benefit for security, but backfired in this instance. – Dewi Morgan Oct 11 '16 at 21:30
  • Memory returned to the OS is not necessarily salvaged right away, either; I don't think Linux does this often (at all?). – jpaugh Oct 11 '16 at 21:37
  • @jpaugh: Calling `munmap` to return memory to the OS does modify the page tables during that system call. Reading the now-unmapped virtual address will cause a segfault! This is what returning memory to the OS is all about. It doesn't matter what the kernel does with the physical page after it's unmapped. It won't ever expose the data to user-space (other than through `/dev/mem`, or a kernel-memory-read exploit...). The page goes in the kernel's pool of free pages, and might be zeroed ahead of time if the kernel is low on pre-zeroed pages (needed to satisfy other user-space allocations). – Peter Cordes Oct 11 '16 at 22:11
  • High locality is a performance advantage: reusing the same still-mapped few pages of memory means it might still be hot in L2 cache. This is very good if you're doing a microbenchmark that makes back-to-back calls to an OpenSSL function that allocates/uses/frees a buffer. It's still good, but less beneficial, if your program does lots of real work in between reusing a buffer (it's likely not still hot in cache, but at least you don't have to mmap/munmap it. Although as @ilkkachu points out, good malloc implementations do that anyway for small enough buffers). – Peter Cordes Oct 11 '16 at 22:16
  • Oh yeah, not claiming that there are *no* advantages to OpenSSL's custom `malloc`, just that from a *security* perspective - rather important for such a security-focused piece of code - it's dangerous, and it made the (already-terrible) vuln of Heartbleed significantly worse. This seemed relevant to the question; even though the asker seemed somewhat confused, it's true that Heartbleed was surprisingly effective; a lot of software would have been harder to get such juicy results out of. – CBHacking Oct 11 '16 at 22:28
  • @kasperd: of course, but usually for the standard library allocator there are better debugging/sanitizing tools - valgrind does understand regular `malloc` and `free`, but not those from custom allocators; and almost every C standard library does have some debug mode for the allocator which helps to spot common bugs. – Matteo Italia Oct 12 '16 at 09:26
  • The converse of this, of course, is that servers that used a process-per-connection model, even though they still used the vulnerable library, did not expose other sessions' SSL data; because those data would be in the memory spaces and OpenSSL heaps of other processes. The only SSL data that they could expose were their own session's. – JdeBP Oct 13 '16 at 07:50
  • @PeterGreen Yes, I should have made that explicitly clear with the locality point... it's not just that you'll get a lot of currently- or recently-used data, it's that it's data that is/was used *by OpenSSL*, and therefore includes things like plaintext messages, symmetric keys, and that all-important private key. – CBHacking Oct 14 '16 at 04:36

I would expect a segmentation fault if a process tried to access any memory that it didn't explicitly allocate

This is where the misconception lies.

Any invalid memory access could result in a segmentation fault, but if the requested memory address lies within the current process's address space (say, a variable you just freed), a fault is actually highly unlikely.

That's why you should not rely on segmentation faults for finding memory access bugs!
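As a contrived illustration (my own example, not OpenSSL code): the over-read below will usually run to completion and print whatever bytes happen to sit after the one-byte allocation rather than segfaulting, but building it with a tool like AddressSanitizer (`gcc -fsanitize=address`) reports the heap-buffer-overflow immediately, which is exactly why such tools, not segfaults, are what you should rely on.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *payload = malloc(1);              /* one real byte...               */
    if (payload == NULL)
        return 1;
    payload[0] = 'A';

    char reply[64];
    memcpy(reply, payload, sizeof reply);   /* ...but 64 bytes are read (UB)  */

    fwrite(reply, 1, sizeof reply, stdout); /* "leaks" neighbouring heap data */
    free(payload);
    return 0;
}
```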

Lightness Races in Orbit
  • Modern languages, like Java, range-check all array accesses, so if the source language for OpenSSL were Java, this exploit would not exist. Almost all memory exploits are associated with old-school languages such as C and C++, which depend on the programmer to check their own array accesses. – ddyer Oct 13 '16 at 17:44
  • @ddyer: Thank goodness for those "modern" languages with bounds-checking, like Algol 60, Ada 83 and even Lisp. :-) – Oddthinking Oct 14 '16 at 03:59
  • @ddyer, you do know that the [JVM itself is written in C++](http://stackoverflow.com/a/10028233/5419599), right? I guess we should stop teaching new programmers old languages, and just use existing "old school" tools forever, and hope they never break. Right? – Wildcard Oct 14 '16 at 04:43
  • @ddyer: Whether a language has bounds checking has almost nothing to do with its age. It has instead everything to do with its intended use. Sure, you could write OpenSSL in some other language, but for something so critical, being closer to the machine is good. Language-mandated bounds checking for every single array access in an SSL implementation = sloooooooow. – Lightness Races in Orbit Oct 14 '16 at 08:40