
Since a reboot yesterday, one of our virtual servers (Debian Lenny, virtualized with Xen) is constantly running out of entropy, leading to timeouts etc. when trying to connect over SSH / TLS-enabled protocols. Is there any way to check which process(es) is(/are) eating up all the entropy?

Edit:

What I tried:

  • Adding additional entropy sources: time_entropyd, rng-tools feeding urandom back into random (roughly as sketched after this list), pseudorandom file accesses – this netted about 1 MiB of additional entropy per second, but the problems persisted
  • Checking for unusual activity via lsof, netstat and tcpdump – nothing. No noticeable load or anything
  • Stopping daemons, restarting permanent sessions, rebooting the entire VM – no change in behaviour
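
For reference, the rng-tools part means roughly the following (a sketch only; rngd option names may vary between versions, and it merely re-labels pseudorandom data as entropy rather than adding real randomness):

# feed /dev/urandom output back into the kernel's blocking pool
rngd -r /dev/urandom -o /dev/random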

What in the end worked:

  • Waiting. Since about yesterday noon, there have been no connection problems anymore. Entropy is still somewhat low (128 Bytes peak), but TLS/SSH sessions have no noticeable delay anymore. I slowly switched our clients back to TLS (all five of them!), not expecting any change in behaviour; all clients are now using TLS again with no problems. Really, really strange.
Michael Hampton
Creshal
  • Could you possibly be suffering from an attack? Someone repeatedly trying to connect to an SSL-enabled service and establishing a secure connection, thereby draining entropy? But the correlation with the reboot? Coincidence? – Michuelnik Jul 12 '12 at 05:41
  • The server is completely internal and not accessible from the outside. It is, however, a backup domain controller. The only thing I could think of was a background replication job (over an encrypted connection) that ate up resources – but as said, there was no suspicious activity. I'll file it under "shit happens". – Creshal Jul 12 '12 at 09:03
  • Check this out - the kernel change is the reason https://unix.stackexchange.com/questions/704737/kernel-5-10-119-caused-the-values-of-proc-sys-kernel-random-entropy-avail-and-p – Keld Norman Jul 19 '22 at 16:38

4 Answers


With lsof out as a source of diagnostic utility, would setting up something using audit work? There's no way to deplete the entropy pool without opening /dev/random, so if you audit processes opening /dev/random, the culprit (or at least the set of candidates for further examination) should drop out fairly rapidly.
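
For instance (a sketch assuming auditd is installed and running; the key name entropy-watch is arbitrary):

# log every read-open of /dev/random under an arbitrary key ...
auditctl -w /dev/random -p r -k entropy-watch
# ... then list the processes that triggered the rule
ausearch -k entropy-watch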

womble
  • Might try that, thanks. Though are you sure that, say, the kernel's crypto system uses /dev/random directly? – Creshal Jul 09 '12 at 14:54
  • 1
    I'm not aware of anything in the kernel that heavily consumes entropy on an ongoing basis. The things that use randomness that I'm aware of (TCP sequence numbers, for example) are all PRNG-driven, and the crypto APIs are more about getting access to underlying hardware than eating entropy. At the very least, if nothing's opening `/dev/random`, you'll have ruled out one big possibility, and can go digging into the kernel. – womble Jul 09 '12 at 15:00

Normally, for a public-facing server needing 'enough' entropy, I would suggest something like an Entropy Key, a USB hardware device that adds random bits to the Linux entropy pool. But you don't talk to the outside world.

Virtual machines can have a problem with lack of external randomness.
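
A quick way to see how starved the guest is at any given moment is the kernel's own estimate (in bits; the value shown here is only illustrative):

$ cat /proc/sys/kernel/random/entropy_avail
133

A value that hovers near zero while TLS/SSH handshakes stall suggests the blocking pool is being drained faster than it refills.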

Your remark 'backup domain controller' does add a possible use of entropy: Windows domains do use random numbers in certificates.

Koos van den Hout
  • Agreed, we use SafeNet HSMs to do this with great success. – Chopper3 Sep 25 '12 at 12:22
  • Since the server is really, really old (Pentium 4-era Xeon), and newer Xeons have a hardware RNG built in, I don't really want to spend money on that – else I'd have bought one. – Creshal Sep 25 '12 at 16:12

Perhaps lsof (list open files) might help. It shows which processes currently hold which files open. In your case this only helps if you catch the process(es) while they are draining entropy, unless they keep the handle open for longer.

$ lsof /dev/urandom
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
xfce4-ses  1787   to   15r   CHR    1,9      0t0 8199 /dev/urandom
applet.py  1907   to    9r   CHR    1,9      0t0 8199 /dev/urandom
scp-dbus-  5028   to   10r   CHR    1,9      0t0 8199 /dev/urandom
firefox    6603   to   23r   CHR    1,9      0t0 8199 /dev/urandom
thunderbi 12218   to   23r   CHR    1,9      0t0 8199 /dev/urandom

Just a sample from my workstation. But diving deeper into lsof might help.
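
Since a process that reads once and immediately closes the device is easy to miss, re-running the check in a loop (interval chosen arbitrarily) improves the odds of catching it:

$ watch -n 1 'lsof /dev/random /dev/urandom'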

Michuelnik
  • Empty output for urandom, and the only programs having random open are – surprise – the entropy daemons. – Creshal Jul 09 '12 at 14:07
  • Well, neither lsof nor netstat turned up anything suspicious. If anything, there's suspiciously low activity on the system. – Creshal Jul 09 '12 at 14:27

If there is no better solution, you might bring in the big guns and globally wrap the open() syscall to log the processes that try to open /dev/[u]random.

Just(tm) write a lib defining an open() that logs the call and afterwards calls the original libc open().

Hint for that: man ld.so and /etc/ld.so.preload.
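
Purely as an illustration of that hint, here is a minimal, untested sketch of such a wrapper (the file name, build line and log format are made up for the example; note it only catches plain open(), so callers using open64() or openat() would slip past):

/* open_log.c – hypothetical sketch, untested.
 * Build:   gcc -shared -fPIC -o /usr/local/lib/open_log.so open_log.c -ldl
 * Enable:  add the .so path to /etc/ld.so.preload (or LD_PRELOAD per process). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int open(const char *path, int flags, ...)
{
    /* Look up the real libc open() on first use. */
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");

    /* Log any attempt to touch the random devices, then hand the call on. */
    if (path && (strcmp(path, "/dev/random") == 0 ||
                 strcmp(path, "/dev/urandom") == 0))
        fprintf(stderr, "open(%s) by pid %d\n", path, (int) getpid());

    mode_t mode = 0;
    if (flags & O_CREAT) {          /* open() only takes a mode with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    return real_open(path, flags, mode);
}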

We've had something similar here: https://stackoverflow.com/questions/9614184/how-to-trace-per-file-io-operations-in-linux

CAVEAT: Never did this myself. Might break your system since every open() will run through your lib. Possibly okay in debug-environments or if you're R.M. Stallman.

Michuelnik
  • Well, it *is* a production machine, so if I'm going to test this, I'll have to wait until the weekend. But thanks for the pointer. – Creshal Jul 10 '12 at 15:28