
Most Linux distributions now use ASLR for many programs, to randomize the layout of memory.

How often is the randomness used for this changed? If I re-run the same program multiple times, will it receive the same layout each time, or will it differ every time? Is fresh randomness generated each time the program is run? each time the machine is rebooted? When are the random values refreshed/reset to new values? Does this depend upon the Linux distribution? Does it depend on which region of memory we are talking about (e.g., the executable, dynamic libraries, the stack, the heap, etc.), or is the answer the same for all of them?

D.W.

2 Answers


The Wikipedia page shows that the answer is "it depends". There are multiple implementations and patches around.

The important points to consider are the following:

  • A Linux executable consists of a main binary and DLL (shared objects) loaded dynamically. In traditional Linux, the main binary is at a fixed address chosen at link time, while DLL are position-independent code.

  • DLL in Linux can be "moved around" inexpensively. Indeed, when a DLL is mapped into an address space, all references to elements of that DLL (from the DLL itself, and from other DLL and the main binary as well) must be adjusted to point to that DLL. This means that there MUST be some modification of some bytes in the main binary and some DLL, depending on how the address space is organized for that specific executable instance. The catch, though, is that any modified page (4 or 8 kB, depending on architecture), is no longer shareable (in physical RAM) with other instances for other executables concurrently executed. This tends to void the RAM usage advantage of DLL (historically, that was the reason for using DLL, but nowadays the ease of software upgrade is arguably a stronger reason).

    In Linux (and other OS, in particular all those using ELF), the way out of this issue is to compile DLL "specially" as Position-Independent Code: in practice, the binary code groups all (or most) references which must be adjusted into the same pages, using an indirection table (the "GOT", aka "Global Offset Table") which is dynamically located by its position relative to the invoking code. This means that most DLL pages will remain unaltered when the DLL is loaded, and thus will be shared with any other instance of that same DLL in the address space of another executable.

    This contrasts with the way things are done in Windows, where the OS tries to load the same DLL at the same address for all processes which use it, so that the references may be modified "in place" and still be shared, because all the processes will see the same DLL at the same address. (More than 20 years ago, at the time of "libc4", DLL were not "moveable" in Linux and were always loaded at a fixed location, for all processes.)

  • The main binary is supposed to be linked at a fixed address; however, a mechanism similar to PIC DLL has been also implemented for main binaries: it is called PIE. I explained it at length there. Since PIE can break some non-C applications, Linux distributions which use it tend to reserve it to some applications which are deemed "vulnerable" (i.e. applications which have some network activity).

  • Linux also features VDSO, which can be thought of as a DLL provided by the kernel, instead of being mapped from a file. The kernel's support for ASLR will randomize the VDSO address on a per-process basis.

  • Any randomization requires space, and thus may imply address space fragmentation issues (allocation of big blocks may fail because the free space is split into several smaller holes). This is especially true for 32-bit Linux variants; on 64-bit architectures, the address space is still sufficiently big (compared to physical RAM size) to avoid problems. Therefore, ASLR may have to be disabled for some applications, e.g. photo/video editing. This can be done on a per-process basis with setarch.
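As a rough illustration of the `setarch` route mentioned in the last point, here is a minimal sketch in Python. The helper name `no_aslr_command` is made up for this example, and it assumes the util-linux `setarch` tool is installed and accepts an architecture name followed by `-R` (its option for disabling address randomization). In some sandboxes or containers the underlying personality change is refused, so the sketch checks the exit status rather than assuming success.

```python
import platform
import shutil
import subprocess


def no_aslr_command(argv):
    """Prefix argv so the command runs with address randomization disabled.

    Hypothetical helper: `setarch <arch> -R` asks the kernel to set the
    ADDR_NO_RANDOMIZE personality flag for the command only.
    """
    return ["setarch", platform.machine(), "-R", *argv]


if __name__ == "__main__" and shutil.which("setarch"):
    cmd = no_aslr_command(["cat", "/proc/self/maps"])
    runs = [subprocess.run(cmd, capture_output=True, text=True) for _ in range(2)]
    if all(r.returncode == 0 for r in runs):
        # With randomization off, both runs are expected to report the same map.
        print("identical layouts:", runs[0].stdout == runs[1].stdout)
    else:
        print("setarch -R was refused here:", runs[0].stderr.strip())
```

Only the wrapped command is affected; other processes on the machine keep whatever randomization the kernel is configured for.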

From these, we get the following:

  • When ASLR is applied in Linux, it usually is applied on a per-process basis. If you start the same process twice, then the DLL will be loaded at distinct addresses (and the main binary too, if it was compiled for PIE). This is easily witnessed by running the executable within gdb, and, indeed, when you are tracking bugs, the first thing you do is to disable ASLR.

  • Prelink, if applied, prevents ASLR (at least for DLL and binary; heap and stack positions may still be randomized). Prelink may counterbalance this by doing randomization itself, but, by construction, this will be fixed for each executable: a pre-linked executable will find its binary and DLL at the same places, always; however, other pre-linked executables on the same machine will see the same DLL at different addresses, and this is machine-specific, so other machines will obtain address spaces laid out differently. It is also recommended to do a new prelink regularly (e.g. on a weekly basis), which rerandomizes things (and you will get it for every DLL upgrade, too).

  • Stack and heap are also randomized. The initial stack position, as well as the starting point for the heap (for brk() system calls), are chosen by the kernel upon process start, and each execution puts them at a new place (if ASLR is enabled). Also, the memory allocator may rely on mmap() (it will typically do so for large blocks), which is randomized, and this extends to thread stacks, which are dynamically allocated. By default, glibc will use mmap() calls for any block of more than 128 kB, and stacks for new threads default to 1 MB.

  • When a process is forked, the child necessarily obtains the same layout as its parent. However, DLL loaded after the fork may end up at distinct addresses.
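The per-process behaviour described in these points can be observed directly from `/proc/self/maps`. The following is a small sketch in Python, assuming a Linux system with `/proc` mounted; the function names `parse_maps` and `fresh_process_layout` are invented for this example. It starts two fresh interpreters and compares where the stack, the `brk()` heap and the vDSO ended up in each.

```python
import subprocess
import sys


def parse_maps(text):
    """Return {region name: start address} for the first mapping of each name."""
    starts = {}
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional name.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no name to key on
        start = int(fields[0].split("-")[0], 16)
        starts.setdefault(fields[5].strip(), start)
    return starts


CHILD = "print(open('/proc/self/maps').read())"


def fresh_process_layout():
    """Start a new interpreter and return its parsed memory map."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, check=True)
    return parse_maps(result.stdout)


if __name__ == "__main__" and sys.platform.startswith("linux"):
    a, b = fresh_process_layout(), fresh_process_layout()
    for name in ("[stack]", "[heap]", "[vdso]"):
        if name in a and name in b:
            print(f"{name:8} {a[name]:#x} {b[name]:#x} same={a[name] == b[name]}")
```

With ASLR enabled, the two columns would normally differ from one run to the next. Shared libraries show up in the same dictionary under their file paths, so they can be compared the same way.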

Since all of these depend on the kernel version and patches, and configurable options, the answers to your question are thus necessarily distribution-dependent. For instance, you may see how things go in Ubuntu there (basically: ASLR everywhere, PIE for only a specific list of packages, no prelink). The only common point is that ASLR on Linux is never boot-dependent: when randomization is not applied for each process instance, then this is due to prelinking, which is permanent across reboots (but should be re-applied on a regular schedule).
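One of the configurable options in question is the `kernel.randomize_va_space` sysctl, which is not named above but is the usual kernel-wide switch. A minimal sketch for reading it, assuming the standard `/proc/sys` path (the helper names are hypothetical, and the descriptions paraphrase the kernel's sysctl documentation):

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/kernel/randomize_va_space"

# Paraphrased meanings of the documented values.
MODES = {
    0: "ASLR disabled",
    1: "stack, mmap base, vDSO and shared libraries randomized",
    2: "as 1, plus the brk()-based heap",
}


def describe_aslr(raw):
    """Turn the text content of randomize_va_space into a description."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown value {value}")


def current_aslr_mode(path=SYSCTL_PATH):
    """Read the sysctl file and describe the current setting."""
    return describe_aslr(Path(path).read_text())


if __name__ == "__main__" and Path(SYSCTL_PATH).exists():
    print(current_aslr_mode())
```

Distribution patches may add further knobs of their own; this only reports the mainline setting.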

Thomas Pornin
  • DLL (Dynamically Linked Library) is a Windows library. Linux uses `.so` (Shared Object) for libraries instead. Also vDSO is not really provided by the kernel, but by the libc. The VVAR (used by the vDSO) is provided by the kernel, though. – forest Dec 16 '17 at 03:43

The ASLR randomness for aligning the stack and mmap allocations is generated by the kernel's internal get_random_int function for each new process.

get_random_int uses the RDRAND instruction to generate random values, if supported by the CPU. On other CPUs, it uses a PRNG that is initialized once, at boot, from the kernel's non-blocking (/dev/urandom) pool. (This PRNG is optimized to be fast, not to be cryptographically secure.)

The dynamic linker/loader (ld.so, part of glibc) uses mmap to load the executable and any shared libraries.

So, bottom line: the locations of the stack and mmap allocations (including the heap and any executables) are random and different for each new process.

CL.
  • Thank you. What's the bottom line for shared libraries? Also random and different for each new process, like the stack and `mmap` allocations? Presumably that depends upon more than just the kernel behavior, but also on the behavior of the dynamic linker/loader (and what it does with the values given it by the kernel). – D.W. May 24 '14 at 18:29
  • It's random and different for each process, yes. A shared library will only exist in _physical_ memory in one location at a time, but each process can still see it at a different location in _virtual_ memory. An attacker who manages to find the offset for one library for a given process will not know its offset for another process, even if the physical memory address is the same. – forest Dec 19 '17 at 08:14
  • And FWIW, Linux has changed the implementation of `get_random_int()` since this answer was posted. It now uses a far better PRNG, taking data from `get_random_bytes()`. – forest Dec 19 '17 at 08:15