Is it appropriate to use haveged as a source of entropy on virtual machines?

Question

While looking for solutions to entropy pool depletion on virtual machines, I came across an interesting project called haveged, which is based on the HAVEGE algorithm (HArdware Volatile Entropy Gathering and Expansion). It makes a pretty fantastic claim.

HAVEGE is a random number generator that exploits the modifications of the internal CPU hardware states (caches, branch predictors, TLBs) as a source of uncertainty. During an initialization phase, the hardware clock cycle counter of the processor is used to gather part of this entropy: tens of thousands of unpredictable bits can be gathered per operating system call in average.

If this really produces nearly unlimited high-quality entropy on headless virtual machines, it should be included in every server distribution by default! And yet, some people have raised concerns.

"At its heart, [HAVEGE] uses timing information based on the processor's high resolution timer (the RDTSC instruction). This instruction can be virtualized, and some virtual machine hosts have chosen to disable this instruction, returning 0s or predictable results."
(Source: PolarSSL Security Advisory 2011-02 on polarssl.org).

And furthermore, popular NIST and ENT tests will sometimes give haveged a PASS even when it's intentionally mis-configured, and not actually producing random numbers!

I replaced the “HARDTICKS” macro in HAVEGE with the constant 0 (zero) rather than reading the time stamp counter of the processor. This immediately failed the randomness test. However, when I used the constant 1 (one) instead, the ent test passed. And even nist almost passed with only a single missed test out of the 426 tests executed. (Source: Evaluating HAVEGE Randomness on engbloms.se).

So, which virtualization platforms/hypervisors are safe to use with haveged in a virtual machine?
And is there a generally accepted best practice way to test whether a source of randomness is producing sufficiently high quality numbers?

haveged is included in most repos, though some versions may be quite old. Version 1.5 introduced AIS31 testing. — gwuertz, Apr 18 '13 at 21:37
See also [Quality of randomness from HAVEGE algorithm](http://crypto.stackexchange.com/questions/8083/quality-of-randomness-from-havege-algorithm) — Gilles 'SO- stop being evil', Apr 29 '13 at 12:45

score 43 · Accepted Answer · answered Apr 18 '13 at 16:45

(Caveat: I certainly don't claim that HAVEGE lives up to its claims. I have not checked their theory or implementation.)

To get randomness, HAVEGE and similar systems feed on "physical events", and in particular on the timing of physical events. Such events include occurrences of hardware interrupts (which, in turn, gathers data about key strokes, mouse movements, incoming ethernet packets, time for a hard disk to complete a write request...). HAVEGE also claims to feed on all the types of cache misses which occur in a CPU (L1 cache, L2 cache, TLB, branch prediction...); the behaviour of these elements depends on what the CPU has been doing in the previous few thousands clock cycles, so there is potential for some "randomness". This hinges on the possibility to measure current time with great precision (not necessarily accuracy), which is where the rdtsc instruction comes into play. It returns the current contents of an internal counter which is incremented at each clock cycle, so it offers sub-nanosecond precision.

For a virtual machine system, there are three choices with regards to this instruction:

Let the instruction go to the hardware directly.
Trap the instruction and emulate it.
Disable the instruction altogether.

If the VM manager chooses the first solution, then rdtsc has all the needed precision, and should work as well as if it was on a physical machine, for the purpose of gathering entropy from hardware events. However, since this is a virtual machine, it is an application on the host system; it does not get the CPU all the time. From the point of view of the guest operating system using rdtsc, this looks as if its CPU was "stolen" occasionally: two successive rdtsc instructions, nominally separated by a single clock cycles, may report an increase of the counter by several millions. In short words, when rdtsc is simply applied on the hardware, then the guest OS can use it to detect the presence of an hypervisor.

The second solution is meant to make the emulation more "perfect" by maintaining a virtual per-VM cycle counter, which keeps track of the cycles really allocated to that VM. The upside is that rdtsc, from the point of view of the guest, will no longer exhibit the "stolen cycles" effect. The downside is that this emulation is performed through triggering and trapping a CPU exception, raising the cost of the rdtsc opcode from a few dozen clock cycles (it depends on the CPU brand; some execute rdtsc in less than 10 cycles, other use 60 or 70 cycles) to more than one thousand of cycles. If the guest tries to do a lot of rdtsc (as HAVEGE will be prone to do), then it will slow down to a crawl. Moreover, the exception handling code will disrupt the measure; instead of measuring the hardware event timing, the code will measure the execution time of the exception handler, which can conceivably lower the quality of the extracted randomness.

The third solution (disabling rdtsc) will simply prevent HAVEGE from returning good randomness. Since it internally uses a PRNG, the output may still fool statistical analysis tools, because there is a huge difference between "looking random" and "being unpredictable" (statistical analysis tools follow the "look random" path, but cryptographic security relies on unpredictability).

The VirtualBox manual claims that VirtualBox, by default, follows the first method (rdtsc is unconditionally allowed and applied on the hardware directly), but may be configured to apply the second solution (which you don't want, in this case).

To test what your VM does, you can try this small program (compile with gcc -W -Wall -O on Linux; the -O is important):

#include <stdio.h>

#if defined(__i386__)

static __inline__ unsigned long long rdtsc(void)
{
        unsigned long long int x;

        __asm__ __volatile__ (".byte 0x0f, 0x31" : "=A" (x));
        return x;
}

#elif defined(__x86_64__)

static __inline__ unsigned long long rdtsc(void)
{
        unsigned hi, lo;

        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

#endif

int
main(void)
{
        long i;
        unsigned long long d;

        d = 0;
        for (i = 0; i < 1000000; i ++) {
                unsigned long long b, e;

                b = rdtsc();
                e = rdtsc();
                d += e - b;
        }
        printf("average : %.3f\n", (double)d / 1000000.0);
        return 0;
}

On a non-virtual machine, with the "true" rdtsc, this shall report a value between 10 and 100, depending on the CPU brand. If the reported value is 0, or if the program crashes, then rdtsc is disabled. If the value is in the thousands, then rdtsc is emulated, which means that the entropy gathering may not work as well as expected.

Note that even getting a value between 10 and 100 is not a guarantee that rdtsc is not emulated, because the VM manager, while maintaining its virtual counter, may subtract from it the expected time needed for execution of the exception handler. Ultimately, you really need to have a good look at the manual and configuration of your VM manager.

Of course, the whole premise of HAVEGE is questionable. For any practical security, you need a few "real random" bits, no more than 200, which you use as seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of pseudo-alea indistinguishable from true randomness, and that's good enough for all practical purposes.

Insisting on going back to the hardware for every bit looks like yet another outbreak of that flawed idea which sees entropy as a kind of gasoline, which you burn up when you look at it.

Recent versions of haveged implement run-time testing of the output using an adaptation of the AIS31, the test suite used by the German Common Criteria body to certify smart cards, see http://www.issihosts.com/haveged/ais31.html With a constant value in place of the high resolution timer, haveged may behave like a long sequence prng (or not depending on the details of the bit pattern). If the output passes the continuous tests, how does that matter? If anything, that argues that the particulars of the timer source are not as critical as one might think. — gwuertz, Apr 18 '13 at 21:37
@gwuertz: if the output passes the continuous tests even when its entropy source has been fixed (no measure at all !), then this means that the tests are not adequate to test for cryptographic strength -- as is expected, since such tests are for statistical patterns, not for unpredictability. If you talk a strong crypto-grade PRNG and seed it with a fixed 0, you will also pass all such tests with success, and your security will nonetheless be nil. — Tom Leek, Apr 18 '13 at 23:00
I may also add that I have some reservations about trusting people for implementing a PRNG, when they show that their framework for testing their results is statistical tests, which, by definition, are terrible for cryptographic usages. This does not bode well. — Tom Leek, Apr 18 '13 at 23:02
@TomLeek is there an alternative? I always thought there is no proof for unpredictability itself. — aef, May 22 '14 at 18:11
Cryptographers use detailed and thorough peer review to get some assurance that their design is not obviously bad. Now that's not really great. My point, though, is that statistical tests are terrible at evaluating security. Such tests can only detect extremely poor and weak PRNG. It is easy to make a PRNG which will pass all such tests (hey, even a LFSR can do it). From a cryptographer's point of view, it is not even worth reporting. To make an analogy: it looks like a car manufacturer who publishes big ads stating that his cars are great because he has ve-ri-fied that they _have wheels_. — Tom Leek, May 22 '14 at 22:19
The instruction may run with sub-nanosecond precision on its own, but remember that the process HAVEGE is running in will only execute during specific, predictable and deterministic timeslices. — forest, Oct 14 '18 at 03:06
Regarding the premise being questionable, remember that that's just the `haveged` program which has odd behavior, not the library itself. The library is actually pretty popular in some embedded devices, e.g. mbedtls uses libhavege if a system entropy source is not present, and it only generates the amount of random data that is requested from it (e.g. for key operations). — forest, Oct 14 '18 at 03:09

score 12 · Answer 2 · edited Apr 19 '13 at 19:52

On the polarssl advisory: A detailed technical response can be found in the most recent debian source:

http://bazaar.launchpad.net/~ubuntu-branches/debian/experimental/haveged/experimental/revision/12?start_revid=12#debian/README.Debian

The executive summary is: polarssl != haveged != HAVEGE

On the embloms.se emulator experiments:

The haveged test suite, NIST and ent, verify that a source build has produced a functional RNG. Run-time testing is required to verify haveged operation in a virtual environment. That is a fairly recent addition to haveged.

On statistical testing:

Suppose you have a hardware RNG, a TRNG, how do you know it isn't broken? Hardware breaks. The German standards body has a spec that deals with that very problem, AIS31. That approach is adopted by haveged. A (biased) discussion of standards for RNG as they apply to haveged can be found at:

http://www.issihosts.com/haveged/ais31.html

The non-predictability of HAVEGE is exactly the same mechanism seen in benchmarking software. It is not due to timer drift, but the aggregation of the asynchronous behaviors in a modern processor. It really does not matter whether the performance variations come from a cache miss or a time slice delivered to another processor. The timer accuracy also does not matter as long as it's 'adequate'. How is that determined? By the output. By design (or perhaps over design) /dev/random is immune to poor input data. The only way to subvert the design is to lie about the entropy added to the entropy pool. The most recent versions of haveged perform a run-time entropy estimate on the output generate to insure the output is consistent with an ideal TRNG.

Executive summary: haveged output is indistinguishable from a TRNG according to tests used by the German standards body to certify TRNG. If that is not the case, haveged will shut down.

score 9 · Answer 3 · edited Mar 17 '17 at 13:14

(Updated, with thanks to gwuertz the current author/maintainer of haveged, I missed the separation between HAVEGE and haveged.)

haveged is a distinct implementation of the HAVEGE method for generating random numbers, it is current, maintained and documented here: http://www.issihosts.com/haveged/ (and no longer uses libhavege directly).

HAVEGE isn't very current (2006), though someone has patched it more recently (2009) for speed and correctness. I would be cautious because it does not document its method, does not mention virtual machines, and as noted, relies (heavily) on RDTSC (or platform equivalent). (Also the source code makes me shudder, but that's quite subjective.)

I will argue that the host VM should not be unintentionally leaking any state into the guests, so it should not be considered a good source of entropy when using this method.

A better approach on Linux is rng-tools with the virtio-rng paravirtualised rng driver (i.e. allow a guest to access entropy gathered by the host, eliminating many potential issues about a guests view of "random" events), or you may find Entropy Broker more useful. On recent Intel chips you can also expose the RDRAND instruction to guests, side-stepping the issue.

This article (from hpa's talk at LinuxCon Europe 2012) is useful reading: http://lwn.net/Articles/525459/ , it also discusses HAVEGE/haveged (though the distinction isn't clear here either).

See the answers to this question for some thoughts on how to determine lack of randomness: What statistics can be used to identify pseudorandom data?

Latest haveged release is Feb 2013. Try freecode.com or the upstream (me) On the old tired polarssl vulerability: http://bazaar.launchpad.net/~ubuntu-branches/debian/experimental/haveged/experimental/revision/12?start_revid=12#debian/README.Debian — gwuertz, Apr 18 '13 at 21:37
I dunno, the source code seems pretty objectively bad. Sometimes `if (cond)`, sometimes `if(cond)`, sometimes `if( cond)`? Sometimes double indentation, sometimes triple, sometimes quad? Sometimes `a == b`, sometimes `a==b`, sometimes `a ==b`? Literally horrifying. It would probably have three typos in every comment (*cough* imlib2 *cough*)... if it had any. — forest, Mar 17 '18 at 05:05

Is it appropriate to use haveged as a source of entropy on virtual machines?

3 Answers3