32

In short: Instead of another question asking about when to use /dev/random instead of /dev/urandom, I present the following scenario, which I face in an application I'm building:

  • A VM or container environment (ie, a fresh install, probably only seconds old when the application is run for the first time)
  • A need for cryptographically secure random bytes to use as keying material for the rest of the life of the installation (months or more)
  • A user story and interface in which blocking (even for minutes, if necessary) is acceptable

I'm wondering: is this the rare but proper use case for a blocking random source (ie, using getrandom with the blocking mode flag)?

Longer form:

Obviously /dev/urandom vs /dev/random is a topic that has led to contentious discussion. For my part, I'm of the mind that /dev/urandom is preferable in nearly all typical use cases - in fact, I have literally never used a blocking random source before.

In this popular and wonderful answer, Thomas Pornin makes the case that the urandom man page is somewhat misleading (agreed) and that, once properly seeded, the urandom pool will not "run out" of entropy in any practical scenario - and this comports with my understanding as well.

However, I think that he slightly oversells urandom by saying that "the only instant where /dev/urandom might imply a security issue due to low entropy is during the first moments of a fresh, automated OS install."

My understanding is that the "boot-time entropy hole" for a typical Ubuntu server boot is over a minute long! This is based on research at the University of Michigan by J. Alex Halderman.

Halderman also seems to say that the entropy pool fills on each boot, and not, as Pornin says in his answer, at the very first OS install. Although it's not terribly important for my application, I'm wondering: which is it?

I have read the "Myths about Urandom" post by Thomas Hühn, but I find it unconvincing for several reasons; the most pertinent for my application is that the post essentially boils down to "people don't like to be stopped in their ways. They will devise workarounds, concoct bizarre machinations to just get it running." While this is undoubtedly true (and is the reason I've always used /dev/urandom everywhere else, especially for web stuff), there are some applications in which users will tolerate having to wait, especially if they are installing it for the first time.

I am building an application meant to be run locally in a terminal setting and I already have reason to create an expectation that the initial installation process will be a bit involved. I have no qualms about asking the user to wait a bit if it can add even a small amount of robustness against a repeated keypair.

In fact, Halderman says he was able to compute private keys for 105,728 SSH hosts - over 1% of those he scanned - because of weak entropy pools being used to generate the keypair. In this case, it was largely embedded devices which presumably have abysmal sources of entropy and thus a hard time filling their pool.

But - and this is perhaps the heart of my question - in an age when apps are shipped in wholly naive containers, meant to be run as if on a shiny, fresh OS installation only seconds old, aren't we reasonably concerned about this same phenomenon? Don't we need a practical blocking interface? And is that what getrandom is intended to become?

Of course it's possible in many situations to share entropy from the host to the guest. But for the purposes of this question, let's assume that the author has decided not to do that, either because she won't have sufficient control over the particulars of the deployment or because there is no entropy pool available on the host.

Thinking just a bit further down the road: what are the best practices for environments which are as fresh and naive as I've described above, but which run on devices of fairly abysmal prospects for initial entropy generation? I'm thinking small embedded devices with few or no HIDs and which are perhaps air-gapped for the initial installation process.

edit: Update: So it appears that, as of PEP 524, Python (in which the app in question is written) uses getrandom when os.urandom is called, and blocks in the event that the entropy pool hasn't gathered at least 128 bits. So as a practical matter, I think I have my answer - just use os.urandom and it will behave like /dev/random only when necessary. I am, however, interested in the overarching question here (ie, does the era of containerization mean a re-thinking of the "always just use urandom" orthodoxy?).
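
For concreteness, the "just use os.urandom" conclusion reduces to something like this sketch (assuming Python 3.6+ on Linux; the function name is mine):

import os

def generate_key_material():
    # On Python 3.6+ (PEP 524), os.urandom() is backed by getrandom(2)
    # and blocks until the kernel pool has been initialized once; after
    # that it never blocks again for the life of the system.
    return os.urandom(32)  # 256 bits of long-lived keying material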

jMyles
  • What about exporting `/dev/urandom` from host to the container? – ThoriumBR May 18 '18 at 20:27
  • Yeah, in general, I think this is imperative. In my particular case, I don't think I'll have control of the variety of ways the app is containerized, so I very much prefer a solution that lives inside it. – jMyles May 18 '18 at 20:29
  • Wrt the entropy pool filling on each boot: true. And filling on OS install: also true. The kernel has a fresh entropy pool to fill when it boots. Distributions set things up to fill that pool at boot time with entropy gathered from the previous boot, if rewritable storage is available. – Gilles 'SO- stop being evil' May 18 '18 at 22:03
  • That research is 5 years old, a security lifetime. It would be strange if the OS didn't have an entropy file it uses at boot, like Fortuna demands and Windows uses. That's the diff between first-install and each-boot. – dandavis May 20 '18 at 20:48
  • Sure, and it looks like Ubuntu server indeed does that now - but what about first boot? So many more apps are now designed to run as the first order of business on first boot (ie, in a VM, in Docker, on AWS, etc), and it's pretty reasonable to want guest-contained logic for adequate entropy collection, no? – jMyles May 20 '18 at 20:53
  • Is there any extra information you need that hasn't gotten attention, since you opened the bounty? – forest May 24 '18 at 02:17
  • Good question - I really like your answer and I don't feel that it's *per se* lacking anything, but I'm also interested in hearing more about best practices for the environment I've described, especially if the scenario moves toward devices with even less chance to generate entropy (think initially air-gapped devices with few or no HIDs, maybe embedded, distributed tech). I'll comment more on your question with a follow-up. – jMyles May 24 '18 at 04:09
  • @jMyles I've edited my answer to account for that. The gist is that you really need to use a hardware RNG (many embedded microprocessors have one), and everything else is just "better than nothing". – forest May 25 '18 at 00:48
  • Are you sure a container isn't sharing (seeded) urandom with the host system? – allo May 28 '18 at 12:50
  • Sure, that's a very typical scenario. But it's not convenient in every case; I'm asking about cases where, for whatever reason, the guest won't be sharing entropy with the host. Or, where there is no host at all (like an IoT situation). – jMyles May 28 '18 at 16:31
  • Also: I've made a small update to the question to respond to this critique. – jMyles May 28 '18 at 18:36

2 Answers

28

I wrote an answer which describes in detail how getrandom() blocks waiting for initial entropy.

However, I think that he slightly oversells urandom by saying that "the only instant where /dev/urandom might imply a security issue due to low entropy is during the first moments of a fresh, automated OS install."

Your worries are well-founded. I have an open question about that very thing and its implications. The issue is that the persistent random seed takes quite some time to move from the input pool to the output pool (the blocking pool and the CRNG). This issue means that /dev/urandom will output potentially predictable values for a few minutes after boot. The solution is, as you say, to use either the blocking /dev/random, or to use getrandom() set to block.

In fact, it is not uncommon to see lines like this in the kernel's log at early boot:

random: sn: uninitialized urandom read (4 bytes read, 7 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 15 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 16 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 16 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 20 bits of entropy available)

All of these are instances in which the non-blocking pool was accessed before enough entropy had been collected. The problem is that the amount of entropy is just too low to be sufficiently cryptographically secure at that point. There should be 2^32 possible 4-byte values; however, with only 7 bits of entropy available, there are only 2^7, or 128, different possibilities.

Halderman also seems to say that the entropy pool fills on each boot, and not, as Pornin says in his answer, at the very first OS install. Although it's not terribly important for my application, I'm wondering: which is it?

It's actually a matter of semantics. The actual entropy pool (the page of memory kept in the kernel that contains random values) is filled on each boot by the persistent entropy seed and by environmental noise. However, the entropy seed itself is a file that is created at install time and is updated with new random values each time the system shuts down. I imagine Pornin is considering the random seed to be a part of the entropy pool (as in, a part of the general entropy-distributing and collecting system), whereas Halderman considers it to be separate (because the entropy pool is technically a page of memory, nothing more). The truth is that the entropy seed is fed into the entropy pool at each boot, but it can take a few minutes to actually affect the pool.

A summary of the three sources of randomness:

  1. /dev/random - The blocking character device decrements an "entropy count" each time it is read (despite entropy not actually being depleted). However, it also blocks until sufficient entropy has been collected at boot, making it safe to use early on. Note that modern kernels have re-designed this character device: now it blocks only until sufficient entropy has been collected once, then remains non-blocking, identical to /dev/urandom.

  2. /dev/urandom - The non-blocking character device will output random data whenever anyone reads from it. Once sufficient entropy has been collected, it will output a virtually unlimited stream indistinguishable from random data. Unfortunately, for compatibility reasons, it is readable even early on in boot before enough one-time entropy has been collected.

  3. getrandom() - A syscall that will output random data as long as the entropy pool has been properly initialized with the minimum amount of entropy required. It defaults to reading from the non-blocking pool. If given the GRND_NONBLOCK flag, it will return an error if there is not enough entropy. If given the GRND_RANDOM flag, it will behave identically to /dev/random, simply blocking until there is entropy available.

I suggest you use the third option, the getrandom() syscall. This will allow a process to read cryptographically-secure random data at high speeds, and will only block early on in boot when not enough entropy has been gathered. If Python's os.urandom() function acts as a wrapper to this syscall as you say, then it should be fine to use. It looks like there was actually much discussion on whether or not that should be the case, ending up with it blocking until enough entropy is available.
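
To make that concrete in the asker's language: since Python 3.6, the os module exposes the syscall as os.getrandom(), and on Linux os.urandom() goes through the same code path. A minimal sketch (Linux-only; the flag constant lives in the os module):

import os

# Default behavior: reads from the urandom source, but blocks until the
# pool has been initialized once, then returns immediately forever after.
key = os.getrandom(32)

# Equivalent for this purpose on Python 3.6+/Linux, per PEP 524:
key = os.urandom(32)

# GRND_RANDOM would instead reproduce the /dev/random behavior from
# point 1 above; there is rarely a reason to want it:
# key = os.getrandom(32, os.GRND_RANDOM)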

Thinking just a bit further down the road: what are the best practices for environments which are as fresh and naive as I've described above, but which run on devices of fairly abysmal prospects for initial entropy generation?

This is a common situation, and there are a few ways to deal with it:

  • Ensure you block at early boot, for example by using /dev/random or getrandom().

  • Keep a persistent random seed, if possible (i.e. if you can write to storage at each boot); a rough sketch of this follows the list.

  • Most importantly, use a hardware RNG. This is the #1 most effective measure.
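
For the second point, here is a rough sketch of the save/restore dance that distribution init scripts perform (the seed path is made up; note that writing to /dev/urandom mixes the bytes into the pool but deliberately credits no entropy, so a restored seed alone will not unblock getrandom()):

import os

SEED_FILE = "/var/lib/myapp/random-seed"  # hypothetical location
SEED_SIZE = 512

def restore_seed():
    # At boot: mix the previous boot's seed back into the kernel pool.
    try:
        with open(SEED_FILE, "rb") as f, open("/dev/urandom", "wb") as u:
            u.write(f.read(SEED_SIZE))
    except FileNotFoundError:
        pass  # first boot: no seed saved yet

def save_seed():
    # At shutdown (and right after restoring): stash fresh bytes for the
    # next boot, and keep the file private.
    with open("/dev/urandom", "rb") as u, open(SEED_FILE, "wb") as f:
        f.write(u.read(SEED_SIZE))
    os.chmod(SEED_FILE, 0o600)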

Using a hardware random number generator is very important. The Linux kernel will initialize its entropy pool with any supported HWRNG interface if one exists, completely eliminating the boot entropy hole. Many embedded devices have their own randomness generators.

This is especially important for many embedded devices, since they may not have a high-resolution timer that is required for the kernel to securely generate entropy from environmental noise. Some versions of MIPS processors, for example, have no cycle counter.
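
If you want to check whether the kernel actually picked up a hardware RNG, the hw_random framework exposes that through sysfs (the path below is the standard kernel interface; it simply does not exist when no HWRNG driver is bound). A small sketch:

def current_hwrng():
    # Returns a driver name such as "tpm-rng" or "virtio_rng.0", or
    # None if the kernel has no hardware RNG available.
    try:
        with open("/sys/class/misc/hw_random/rng_current") as f:
            name = f.read().strip()
        return None if name == "none" else name
    except FileNotFoundError:
        return None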

How and why do you suggest using urandom to seed a (I guess userland?) CSPRNG? How does this beat getrandom?

The non-blocking randomness device is not designed for high performance. Until recently, the device was obscenely slow because it used SHA-1 rather than the stream cipher it uses now. Using a kernel interface for randomness can be less efficient than a local, userspace CSPRNG because each call to the kernel requires an expensive context switch. The kernel has been designed to account for applications that want to draw heavily from it, but the comments in the source code make it clear that the developers do not see this as the right thing to do:

/*
 * Hack to deal with crazy userspace progams when they are all trying
 * to access /dev/urandom in parallel.  The programs are almost
 * certainly doing something terribly wrong, but we'll work around
 * their brain damage.
 */

Popular crypto libraries such as OpenSSL support generating random data. They can be seeded once or reseeded occasionally, and are able to benefit more from parallelization. Using a library additionally makes it possible to write portable code that does not rely on the behavior of any particular operating system or version of an operating system.

If you do not need huge amounts of randomness, it is completely fine to use the kernel's interface. If you are developing a crypto application that will need a lot of randomness throughout its lifetime, you may want to use a library like OpenSSL to deal with that for you.
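
(For what it's worth, Python exposes OpenSSL's generator directly as ssl.RAND_bytes().) If you do go the userspace route, the usual construction is a stream cipher keyed from the kernel; below is a heavily hedged sketch using the pyca/cryptography package, which the next answer rightly files under "experts only":

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

class UserspaceCSPRNG:
    """Sketch of a ChaCha20-based generator seeded once from the
    kernel. Do not deploy this; use a vetted library instead."""

    def __init__(self):
        # os.urandom() blocks only until the kernel pool has been
        # initialized once, so this is a safe seeding point.
        key = os.urandom(32)
        nonce = os.urandom(16)
        self._stream = Cipher(algorithms.ChaCha20(key, nonce),
                              mode=None).encryptor()

    def read(self, n):
        # A stream cipher's keystream is its encryption of zeros.
        return self._stream.update(b"\x00" * n)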

forest
11

There are three states the system can be in:

  1. Hasn't collected enough entropy to safely initialize a CPRNG.
  2. Has collected enough entropy to safely initialize a CPRNG, and:

    2a. Has given out more entropy than it's collected.

    2b. Has given out less entropy than it's collected.

Historically, people thought the distinction between (2a) and (2b) was important. This caused two problems. First, it's wrong – the distinction is meaningless for a properly designed CPRNG. And second, the emphasis on the (2a)-vs-(2b) distinction caused people to miss the distinction between (1) and (2), which actually is really important. People just sort of collapsed (1) into being a special case of (2a).

What you really want is something that blocks in state (1), and doesn't block in states (2a) or (2b).

Unfortunately, in the old days, the confusion between (1) and (2a) meant that this wasn't an option. Your only two options were /dev/random, which blocked in cases (1) and (2a), and /dev/urandom, which never blocked. But since state (1) almost never happens – and doesn't happen at all in well-configured systems, see below – /dev/urandom is better for almost all systems, almost all the time. That's where all those blog posts about "always use urandom" came from – they were trying to convince people to stop making a meaningless and harmful distinction between the (2a) and (2b) states.

But, yeah, neither of these is what you actually want. Thus, the newer getrandom syscall, which by default blocks in state (1), and doesn't block in states (2a) or (2b). So on modern Linux, the orthodoxy should be updated to: always use getrandom with default settings.
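
In the asker's Python, that updated orthodoxy might look like the following sketch (os.getrandom is Linux-only, Python 3.6+; the wrapper and the message are mine):

import os
import sys

def get_key_material(n=32):
    # GRND_NONBLOCK turns state (1) into an immediate BlockingIOError
    # rather than a silent hang, so we can tell the user what is going
    # on before falling back to the default blocking call.
    try:
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        sys.stderr.write("Gathering entropy; this can take a while on "
                         "a fresh system...\n")
        return os.getrandom(n)  # blocks only in state (1)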

Extra wrinkles:

  • getrandom also supports a non-default mode where it acts like /dev/random, which can be requested via the GRND_RANDOM flag. AFAIK this flag is never actually useful, for all the same reasons those old blog posts described. Don't use it.

  • getrandom also has some extra bonus benefits over /dev/urandom: it works regardless of your filesystem layout, and doesn't require opening a file descriptor, both of which are problematic for generic libraries that want to make minimal assumptions about the environment they'll be used in. This doesn't affect cryptographic security, but it's nice operationally.

  • A well-configured system will always have entropy available, even in early boot (i.e., you should really never get into state (1), ever). There are a lot of ways to manage this: save some entropy from the previous boot to use on the next one. Install a hardware RNG. Docker containers use the host's kernel, and thus get access to its entropy pool. High-quality virtualization setups have ways to let the guest system fetch entropy from the host system via hypervisor interfaces (e.g. search for "virtio rng"). But of course, not all systems are well-configured. If you have a poorly-configured system, you should see if you can make it well-configured instead. In principle this should be cheap and easy, but in reality people don't prioritize security so... it might require doing things like switching cloud providers, or switching to a different embedded platform. And unfortunately, you may find that this is more expensive than you (or your boss) are willing to pay, so you're stuck dealing with a poorly-configured system. My sympathies if so.

  • As @forest notes, if you need a lot of CPRNG values, then if you're very careful you can speed this up by running your own CPRNG in userspace, while using getrandom for (re)seeding. This is very much an "experts only" thing though, just like any situation where you find yourself implementing your own crypto primitives. You should only do it if you've measured and found that using getrandom directly is too slow for your needs and you have significant cryptographic expertise. It's very easy to screw up a CPRNG implementation in such a way that your security is totally broken, but the output still "looks" random so you don't notice.

  • Hey man - thanks for stepping in! Great answer. I don't think it's quite right, though, to say that "A well-configured system will always have entropy available, even in early boot." What about small embedded devices that have no host, no HIDs, and initially no network connection? Do you mean to say that such a device can't be "well configured" unless it has a hardware RNG? Is it then your contention that devices like the WiPy aren't capable of being "well-configured" without additional hardware? – jMyles May 29 '18 at 23:31
  • Well, yeah, if you need entropy and you're using a system that can't give you entropy, then something has obviously gone wrong :-). Boot-time entropy *should* be cheap – all you need is a few bytes of persistent storage, or a small hardware module, included with most SoCs – but you're right, people don't prioritize this, so it might be difficult or expensive in practice. I rewrote that section a bit to hopefully be clearer about the distinction between the way things ought to be versus the way things are. – Nathaniel J. Smith May 30 '18 at 00:34
  • Well yeah, but all other things being equal (ie, in the case of a fresh, first-time boot from a cloned SD card or whatever), then isn't *time* also a viable solution? If you're making a ton of IoT devices and you can spare 10 minutes before generating material that will be used for many months, then do that, no? – jMyles May 30 '18 at 00:39
  • By "time" you mean, have your key generation block in `getrandom` until enough entropy is available? Sure, this is a better option than the alternatives, but it's not a panacea; in fact it's kind of a hack. If you're not careful, your boot scripts might block waiting for entropy, so you never get far enough into boot to start gathering entropy... this has happened. If you want a reliable system someone has to figure out where the entropy is coming from, and the simplest way by far is to ensure it's always available at boot. – Nathaniel J. Smith May 30 '18 at 00:53
  • Yeah, I agree that it is a hack in the boot scenario. But at the application layer, as in the story I've laid out in the question, isn't it a good alternative? In other words, if your user story is such that waiting is acceptable, isn't that a valid use of getrandom? – jMyles May 30 '18 at 00:54
  • (Or you could just generate weak keys -- that's a terrible option, but it's still popular, see https://factorable.net/. If you set up the system properly in the first place, then no-one will be tempted to take this approach. Fortunately, hardware RNGs are very common these days, e.g. some searching suggests that your WiPy has one built into its SoC :-).) – Nathaniel J. Smith May 30 '18 at 00:55
  • And yes, what I'm saying is that if entropy consumers encounter state (1), they should block (or maybe error out, I guess), neither of which is great, but those are the best options available... AND system builders should make sure that state (1) never happens. – Nathaniel J. Smith May 30 '18 at 01:00
  • @NathanielJ.Smith As I mentioned in my own answer, persistent storage for a random seed does _not_ help with entropy early at boot. It does not help until several minutes after the seed is written to the pool by the relevant init script. – forest May 31 '18 at 01:33