
Below are graphs of the value of /proc/sys/kernel/random/entropy_avail on a Raspberry Pi. This answer (which might not be correct) describes it as:

> /proc/sys/kernel/random/entropy_avail simply gives you the number of bits that can currently be read from /dev/random. Attempts to read more than that will block until more entropy becomes available.
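
For reference, this is how I read that description. The snippet below is my own minimal illustration (not from the linked answer): it prints entropy_avail and then attempts a non-blocking read from /dev/random, which fails with EAGAIN when the kernel considers the pool too empty.

    /* Minimal illustration: print entropy_avail, then try a non-blocking
     * read from /dev/random.  My own sketch, not from the linked answer. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];

        /* entropy_avail is a plain text file containing one decimal number. */
        int fd = open("/proc/sys/kernel/random/entropy_avail", O_RDONLY);
        if (fd < 0) { perror("open entropy_avail"); return 1; }
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("entropy_avail: %s", buf);   /* value already ends with \n */
        }
        close(fd);

        /* With O_NONBLOCK, read() returns -1/EAGAIN instead of blocking
         * when /dev/random thinks it has too little entropy. */
        unsigned char rnd[16];
        fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
        if (fd < 0) { perror("open /dev/random"); return 1; }
        n = read(fd, rnd, sizeof(rnd));
        if (n < 0 && errno == EAGAIN)
            printf("/dev/random would block right now\n");
        else if (n >= 0)
            printf("read %zd random bytes without blocking\n", n);
        else
            perror("read /dev/random");
        close(fd);
        return 0;
    }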

The value always settles into the same "stable" sawtooth pattern, with a drop of ~130 bits every single minute. If the entropy grows too much, something "eats" it back down to the 700-800 range. If I reboot the device, the entropy is still eaten each minute, but in smaller chunks, so it can grow back into the 700-800 range.

How shall I interpret the graphs? What is happening?

My feeling is that if there were just a process using the random number generator, then entropy_avail, once thrown out of balance (by using the device's hardware), should either grow indefinitely or drop to around 200, at which point /dev/random would stop supplying values.

Also, if any of the monitoring methods (see the checks below) influenced the entropy, they should decrease it every second rather than let it grow and then drop suddenly at one-minute intervals.

(If I leave the machine idle, the stable "saw" pattern continues for days; I just took the screenshots over a shorter period of time.)


The graphs

  • The machine is idle for a long time:

    [graph of entropy_avail]

  • At around 19:14:45 another machine accessed apt-cacher on the Pi and the entropy grew (I guess from the network activity). After that, at 19:16:30, the drop back to the "usual level" was larger than usual (this is also repeatable; if entropy_avail grows too large, it drops faster):

    [graph of entropy_avail]

  • I rebooted the machine; the entropy grows until it reaches the "usual" level:

    [graphs of entropy_avail]

  • Once again it reaches an idle state:

    [graph of entropy_avail]

  • After another reboot, the point within the minute at which the entropy decreases changes, but the drop still occurs every single minute:

    [graph of entropy_avail]


Checks

  • I stopped netdata (the monitoring program) and checked with `watch -n1 cat /proc/sys/kernel/random/entropy_avail`. The value of entropy_avail grows to ~800 and drops to ~680 at regular one-minute intervals.

  • Following the advice to "trace all processes for access to /dev/random and /dev/urandom", I checked with `inotifywait` (an idea from an answer to a similar question) on a Debian VM; there is no access to either /dev/random or /dev/urandom at the moment entropy_avail drops (checking manually does, of course, log an event). A rough sketch of this kind of watch, using the inotify API directly, is included after this list.

  • I used entropy-watcher to check the entropy (as advised, instead of watch). The results still show a steady increase and a sharp drop every single minute:

    833 (-62)
    836 (+3)
    838 (+2)
    840 (+2)
    842 (+2)
    844 (+2)
    846 (+2)
    848 (+2)
    850 (+2)
    852 (+2)
    854 (+2)
    856 (+2)
    858 (+2)
    860 (+2)
    862 (+2)
    864 (+2)
    866 (+2)
    868 (+2)
    871 (+3)
    873 (+2)
    811 (-62)
    
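The inotifywait check above was done with the stock tool; the sketch below is my own rough equivalent written directly against the inotify API (not the exact command I ran). Note that it can only see opens and reads of the device nodes; it cannot see in-kernel consumers or getrandom(2) calls, which the comments point out also matter.

    /* Sketch only: log open/read events on /dev/random and /dev/urandom
     * via inotify, as an alternative to the inotifywait tool used above. */
    #include <stdio.h>
    #include <sys/inotify.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int in = inotify_init1(0);
        if (in < 0) { perror("inotify_init1"); return 1; }

        int wd_r = inotify_add_watch(in, "/dev/random",  IN_OPEN | IN_ACCESS);
        int wd_u = inotify_add_watch(in, "/dev/urandom", IN_OPEN | IN_ACCESS);
        if (wd_r < 0 || wd_u < 0) { perror("inotify_add_watch"); return 1; }

        char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
        for (;;) {
            ssize_t len = read(in, buf, sizeof(buf));
            if (len <= 0) { perror("read"); return 1; }
            for (char *p = buf; p < buf + len; ) {
                struct inotify_event *ev = (struct inotify_event *)p;
                printf("%ld %s %s\n", (long)time(NULL),
                       ev->wd == wd_r ? "/dev/random" : "/dev/urandom",
                       (ev->mask & IN_OPEN) ? "opened" : "read");
                p += sizeof(struct inotify_event) + ev->len;
            }
        }
    }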

Two questions on Unix StackExchange describing the same phenomenon (found later):

techraf
  • Do you have any Java program running? This kind of graph also occurs when there is a lot of garbage collection work being done – Limit Jun 13 '16 at 11:25
  • No Java. At this moment it is a plain Jessie Lite serving Samba (idle), `apt-cacher` (as seen on the graph) and of course [netdata](https://github.com/firehol/netdata) which was used for the graphs. – techraf Jun 13 '16 at 11:27
  • How would a Java program not doing any cryptography influence the entropy pool? Also, the 'pool' size is an estimation of how 'good' the entropy is... in other words, after a few milliseconds of runtime it's complete and utter nonsense. You should use /dev/urandom and not worry about the 'entropy'; the pools these days are constructed in such a way that this number has become meaningless, except for people like CAs who generate a lot of keys. – LvB Jun 13 '16 at 11:29
  • Please read http://www.2uo.de/myths-about-urandom/; entropy disappearing into thin air is a myth that has got to die. A few people working on the Linux kernel cannot be convinced, but some other operating systems (FreeBSD, OpenBSD) stopped making a difference between /dev/random and /dev/urandom a long time ago... Don't lose any sleep over that graph; use /dev/urandom and be happy ever after. – Bruno Rohée Jun 13 '16 at 11:29
  • And if you really want to know, trace all processes for access to /dev/random and /dev/urandom, the culprit shall appear quickly. – Bruno Rohée Jun 13 '16 at 11:31
  • Please note that I asked how I should interpret the included graphs, not what the proper way to generate random numbers is. It's perfectly fine if the explanation mentions that they have no practical use, but I'd like to understand what I am seeing here in the first place. – techraf Jun 13 '16 at 11:33
  • @BrunoRohée Not necessarily; random numbers (and hence Linux's “entropy” count) are also consumed inside the kernel, e.g. for ASLR, for networking (e.g. TCP sequence numbers), for disk encryption, etc. – Gilles 'SO- stop being evil' Jun 13 '16 at 11:35
  • There's a drop every minute sharp, this doesn't look like an in-kernel process. When are the measurements made? Are they always at the same number of seconds past the minute? If they were shortly after a whole minute I'd blame a cron job that runs once per minute, but if they're around the :30 mark that doesn't seem likely. – Gilles 'SO- stop being evil' Jun 13 '16 at 11:38
  • @Gilles You're right, and I also forgot the getrandom(2) syscall that one must also look for when tracing the processes in order to find the culprit. – Bruno Rohée Jun 13 '16 at 11:39
  • @Gilles They seem to be at a fixed number of seconds which changes on reboot. Right now it happens at around every minute +35 seconds; I'll include the graph. – techraf Jun 13 '16 at 11:41
  • Have you ruled out whether your monitoring program, which creates the graphs, impacts entropy whenever it takes a reading? Always good to check something like that. – gowenfawr Jun 13 '16 at 12:10
  • @gowenfawr Yes, I stopped it and ran `watch -n1 cat /proc/sys/kernel/random/entropy_avail` with the same results: up to ~820, down to ~680. I can't run `auditd` on Raspbian. Wondering how else I can monitor access to `/dev/random` / `/dev/urandom`. – techraf Jun 13 '16 at 12:42
  • @techraf The cookie monster is eating your entropy. (sorry, quiet day in the office !) – Little Code Jun 13 '16 at 13:46

1 Answer


First, the claim that "/proc/sys/kernel/random/entropy_avail simply gives you the number of bits that can currently be read from /dev/random" is false.

The entropy_avail field reports input_pool.entropy_count; the "output" pools are the separate pools backing /dev/urandom (the non-blocking pool) and /dev/random (the blocking pool).


As mentioned in this answer, spawning new processes consumes entropy, for things like ASLR. The watch program spawns a new process for every invocation; perhaps the monitoring tool does the same (possibly via one of its monitoring sources that have to invoke an external program to obtain their status?).
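
To get a feel for this effect, here is a small sketch of my own (not part of the original analysis): it compares entropy_avail before and after spawning a trivial child process. Whether a drop shows up at all, and how large it is, depends on the kernel version, as discussed in the comments below.

    /* Sketch: compare entropy_avail before and after an execve, to see
     * roughly how much the input pool drops per spawned process.
     * The exact behaviour depends on the kernel version. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static long entropy_avail(void)
    {
        FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
        long v = -1;
        if (f) {
            if (fscanf(f, "%ld", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        long before = entropy_avail();

        pid_t pid = fork();
        if (pid == 0) {
            execl("/bin/true", "true", (char *)NULL);  /* any trivial program */
            _exit(127);
        }
        waitpid(pid, NULL, 0);

        long after = entropy_avail();
        printf("before: %ld  after: %ld  delta: %ld\n",
               before, after, after - before);
        return 0;
    }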

For monitoring the entropy pool without draining it, you can try the entropy-watcher program (see the linked answer).
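
If you would rather not install another tool, the idea behind entropy-watcher is easy to reproduce. The sketch below (my own, not entropy-watcher's actual code) keeps a single long-running process, re-reads the proc file once per second and prints the delta, similar to the listing in the question, without spawning a new process per sample:

    /* Sketch of a long-running poller in the spirit of entropy-watcher
     * (not its actual code): one process, no fork/exec per sample. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/proc/sys/kernel/random/entropy_avail", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        long prev = -1;
        char buf[32];
        for (;;) {
            /* Rewind and re-read: proc regenerates the value on every read. */
            if (lseek(fd, 0, SEEK_SET) < 0) { perror("lseek"); return 1; }
            ssize_t n = read(fd, buf, sizeof(buf) - 1);
            if (n <= 0) { perror("read"); return 1; }
            buf[n] = '\0';
            long cur = strtol(buf, NULL, 10);

            if (prev < 0)
                printf("%ld\n", cur);
            else
                printf("%ld (%+ld)\n", cur, cur - prev);
            fflush(stdout);
            prev = cur;
            sleep(1);
        }
    }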

Watching the entropy-watcher numbers closely, it seems that you lose about 64 bits of entropy at regular intervals. Based on the analysis in the other answer, this appears to be the result of moving entropy to an "output pool" to avoid wasting it. This is observed on Linux v4.6; future implementations may differ.

Based on the source code (drivers/char/random.c in v4.6), I can see that reading the output pools (/dev/{u,}random or get_random_bytes()) invokes extract_entropy{,_user}, which calls xfer_secondary_pool and account. The blocking pool has the limit property set (r->limit == 1), which affects both functions (a toy model of this flow is sketched after the list below):

  • For account() it will return no data from the blocking pool if its entropy is too low. For the non-blocking output pool, the remaining entropy will be consumed but data is still returned.
  • xfer_secondary_pool() ensures that enough entropy is available in the output pool. If the blocking output pool has insufficient entropy, it will take some from the input pool (when possible).
  • xfer_secondary_pool() for the non-blocking output pool behaves specially according to the /proc/sys/kernel/random/urandom_min_reseed_secs parameter. If this value is non-zero, entropy is only taken from the input pool if at least urandom_min_reseed_secs seconds have elapsed since the last transfer. By default this value is set to 60 seconds.
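
To make this concrete, below is a deliberately simplified toy model of the flow just described (my own illustration, not the kernel code; it ignores the blocking pool and assumes some consumer asks the non-blocking pool for bytes every second). With a 60-second reseed interval and a 128-bit transfer it produces the same once-a-minute sawtooth that the graphs show:

    /* Toy model of the behaviour described above (NOT the kernel code).
     * Assumptions: the input pool is credited ~2 bits/second from noise
     * sources, some consumer asks the non-blocking pool for bytes every
     * second, and that pool only reseeds from the input pool when
     * urandom_min_reseed_secs (60 s by default) have elapsed since the
     * previous transfer. */
    #include <stdio.h>

    int main(void)
    {
        const int reseed_interval = 60;   /* urandom_min_reseed_secs */
        const int reseed_cost     = 128;  /* bits moved per transfer */
        const int credit_per_sec  = 2;    /* rough input-pool trickle */

        int input_entropy = 800;          /* simulated entropy_avail */
        int last_reseed   = 0;

        for (int t = 1; t <= 180; t++) {
            input_entropy += credit_per_sec;          /* harvest noise */

            if (t - last_reseed >= reseed_interval && input_entropy >= reseed_cost) {
                input_entropy -= reseed_cost;         /* reseed urandom */
                last_reseed = t;
                printf("t=%3ds  reseed: entropy_avail drops to %d\n",
                       t, input_entropy);
            } else if (t % 10 == 0) {
                printf("t=%3ds  entropy_avail = %d\n", t, input_entropy);
            }
        }
        return 0;
    }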

The last point finally explains why you see an entropy drain in the input pool every 60 seconds. If some consumer requests random bytes from the non-blocking output pool (TCP sequence numbers, ASLR, /dev/urandom, getrandom(), ...), then 128 bits will be consumed from the input pool to reseed the non-blocking output pool.
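
If you want to confirm this on a particular machine, a minimal check (again my own sketch, and specific to kernels of this era) is to read a few bytes from /dev/urandom and compare entropy_avail before and after; the input pool should only drop if the urandom_min_reseed_secs window has already expired.

    /* Sketch: does a /dev/urandom read drain the input pool right now?
     * Per the explanation above, only if urandom_min_reseed_secs have
     * passed since the last reseed (Linux v4.6 era behaviour). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static long read_long(const char *path)
    {
        FILE *f = fopen(path, "r");
        long v = -1;
        if (f) {
            if (fscanf(f, "%ld", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        long interval = read_long("/proc/sys/kernel/random/urandom_min_reseed_secs");
        long before   = read_long("/proc/sys/kernel/random/entropy_avail");

        unsigned char buf[16];
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd < 0) { perror("open /dev/urandom"); return 1; }
        if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            perror("read /dev/urandom");
        close(fd);

        long after = read_long("/proc/sys/kernel/random/entropy_avail");
        printf("urandom_min_reseed_secs=%ld  before=%ld  after=%ld  delta=%ld\n",
               interval, before, after, after - before);
        return 0;
    }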

Lekensteyn
  • @techraf See update; besides losing entropy via new processes, getrandom() and `/dev/{,u}random` reads, apparently the pool is also drained when it somehow gets too full. – Lekensteyn Jun 20 '16 at 18:43
  • @techraf Your first link has a false claim; the `entropy_avail` field reads the [`input_pool.entropy_count`](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/char/random.c?h=v4.7-rc4#n1703), while the "output" pool refers to the pools used for urandom (non-blocking pool) and random (blocking pool). – Lekensteyn Jun 21 '16 at 11:40
  • @techraf There should be a difference between `cat` and `entropy-watcher`; the former is executed periodically and should have the same constant error (deviation) in its readings. Btw, according to the `perf script` output, exactly 16 bytes (128 bits) are consumed by the ELF loader for `execve`. Also, to my surprise, executing `watch -n1 cat ...` does not result in a drain anymore on `Linux 4.6.2-1-ARCH`. – Lekensteyn Jun 21 '16 at 12:47
  • I don't understand how you were able to verify what `watch -n1 cat ...` does to the entropy. Given that `entropy_avail` shows info on the input pool, which is not touched directly by any process, the whole idea of monitoring the entropy through your `entropy-watcher` becomes somewhat... how to put it... passive(?), indirect(?). Or am I missing something? – techraf Jun 21 '16 at 23:34
  • @techraf `cat ...` referred to `cat /proc/sys/kernel/random/entropy_avail`. I have updated the answer with new findings; entropy is not always taken from the input pool to update the non-blocking output pool. You could say that it is "directly" updated, except that it only happens when at least X seconds have elapsed. – Lekensteyn Jun 22 '16 at 10:56
  • Sorry, I integrated your comment into the answer, because it was the most important observation. Having done that, I can happily upvote and accept. Now, I will digest the answer slowly. Thank you. – techraf Jun 22 '16 at 11:14
  • @techraf That is OK, but be sure to also study the bottom half of the answer, since that goes into more detail on why the entropy_avail value actually decreases every 60 seconds. Thank you for your question and critical comments too! – Lekensteyn Jun 22 '16 at 12:30
  • This has nothing to do with ASLR. ASLR uses the `randomize_range()` function, which itself uses `get_random_long()`, which does not touch the entropy estimate at all. More likely it's related to stack cookies, things like `AT_RANDOM` in the auxiliary vector. From `getauxval(3)`, this value stores a pointer to 16 bytes (128 bits) of random data. – forest Dec 09 '17 at 08:47