Are humans a strong or weak RNG?

Question

Thomas Pornin has stated in the past on multiple occasions (I'm not going to source them, he can argue with me if he wants) that humans are bad RNGs.

While I agree that human RNG for password generation in the mind is usually abysmal, I wanted to ask whether human-aided RNG by a computer is equally bad. KeePass has a feature where you seed the RNG by moving the mouse for a while, and while I know that if KeePass is using /dev/urandom it's more or less secure enough, I've used the mouse-seeded RNG in the past.

I've always thought that RNG aided by human input would be better than just standard PRNG as provided by an operating system. How could someone predict exactly how I'd move my mouse, at what rate, how often I'm pausing, etc.?

The operating system wouldn't just take mouse movements into account; it also takes in process IDs, thread IDs, and hardware counts from the CPU. CryptGenRandom also uses low-level performance statistics to seed its PRNG. People tend to be repetitive, and like their patterns. I'd trust the machine's stats over a human any day. — RoraΖ, Jan 15 '15 at 19:21
Humans are a strong random generator, but only in YouTube comments. — AviD, Jan 15 '15 at 21:48
YouTube comments are an _HRNG_, horrifying random number generator. — Naftuli Kay, Jan 15 '15 at 23:31
Honestly, it seems like YouTube comments would be trivial to replicate with a simple Markov model. They are hardly random at all; there are very strong and low-entropy patterns - that's why people think they're boring. — Superbest, Jan 16 '15 at 00:40
Also, regarding the last sentence of your post: They can ask a bunch of people to move their mouse while they record the data, and then analyze it statistically. They won't be able to predict *exactly* how you will move your mouse, but they will find *a lot* of ways in which you *won't* (for instance, faster than humanly possible). This would effectively mean the true entropy is much less than expected. — Superbest, Jan 16 '15 at 00:57
I'm reminded of [this](http://assets.amuniversal.com/321a39e06d6401301d80001dd8b71c47) Dilbert comic... — Shaamaan, Jan 16 '15 at 08:14
Note that low-entropy sources are fine as a TRNG *provided you know a lower bound on entropy* and condense properly. YouTube might be a perfectly reasonable random source if you estimated, say, 0.5 bits per comment. And you never re-used data. And if nobody but you could observe or affect the comments. That last point eliminates people's incentive and ability to post, ofc. The problem with YouTube comments in practice isn't their degree of randomness, it's the fact that your attacker can view and post the comments you're using. — Steve Jessop, Jan 16 '15 at 10:16
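
The budgeting idea in the last comment can be made concrete. The sketch below is only an illustration: the helper name `samples_needed` is made up here, and the 0.5 bits per comment figure is the commenter's hypothetical estimate, not a measured value.

```python
import math

def samples_needed(target_bits, min_bits_per_sample):
    """Samples required to reach target_bits, given a conservative
    lower bound on the entropy contributed by each sample."""
    if min_bits_per_sample <= 0:
        raise ValueError("the entropy lower bound must be positive")
    return math.ceil(target_bits / min_bits_per_sample)

# Assumed figures: a 256-bit seed and 0.5 bits of entropy per comment.
print(samples_needed(256, 0.5))  # 512
```

The samples would still have to be condensed (for example with a hash function) and kept away from the attacker, as the comment points out.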

score 45 · Accepted Answer · answered Jan 15 '15 at 21:06

Human brains are poor RNGs. People are bad at generating random values in the privacy of their heads. They just cannot think randomly, though they can convince themselves that they do.

Physical processes, on the other hand, are rather good sources of entropy. Take your mouse movements. A few dozen times per second, the mouse measures how far it has moved since the last tick, and sends that information to the server. When your hand shakes, it tends to do so somewhat regularly, but biology is such that each elementary move will be subject to some jitter, which happens to be substantially bigger than the precision of the mouse; even with a lot of training, it is very hard for a human hand to do the exact same move repeatedly (otherwise there would be a lot more people like Yehudi Menuhin). So the bottom line is that mouse movement measurements contain some entropy. (Remember that "entropy" is here defined as "that which the attacker does not know"; the mouse certainly knows how much it has moved, since it is that mouse that actually sends the values on which the RNG is built.)

The other half of the answer is aggregation. A mouse-based RNG will use hundreds or even thousands of measurements, accumulate them all and condense them into an appropriate seed that will concentrate all that entropy. This is simple enough: feed all the values to a cryptographic hash function, e.g. SHA-256, and you will get a 256-bit seed that has all the source entropy, wherever it was hiding in the measured mouse movements. Hash functions are good for that; they reduce the size but keep the entropy (up to the hash function output size, but 256 bits is more than enough for all purposes).
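
A minimal sketch of that aggregation step, assuming the measurements arrive as a list of integer `(dx, dy)` deltas. The function name and the sample data are hypothetical, and this is not a description of how KeePass actually does it.

```python
import hashlib
import struct

def seed_from_mouse_deltas(deltas):
    """Condense (dx, dy) mouse measurements into a 256-bit seed with SHA-256."""
    h = hashlib.sha256()
    for dx, dy in deltas:
        # Pack each delta as two fixed-width signed 32-bit integers so that
        # different sequences cannot serialize to the same byte string.
        h.update(struct.pack("<ii", dx, dy))
    return h.digest()

# Hypothetical sample; a real collector would gather hundreds of events.
sample = [(3, -1), (4, 0), (2, 1), (-1, 5), (0, 2)]
seed = seed_from_mouse_deltas(sample)
print(seed.hex())
```

The digest is always 32 bytes, however many measurements go in. The seed can hold no more entropy than the measurements contained, which is why many samples are collected.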

An attacker may guess that the user will do circles, but will have a hard time getting all the individual movements right, especially since psychology won't help him: the human user himself has no idea how his hand movements are turned into numbers. Since we are talking about hundreds of numbers, the number of possible combinations (i.e. "entropy") rises exponentially. Contrast that with a human user thinking about a new password: the user will choose letters following some inner "witty" train of thought, that the attacker can guess more or less brutally (e.g. if the letters are all the first letters of some words in a sentence from a book, the attacker can automatically try all sentences from all books he can find in electronic format); and, more importantly, the human user won't be bothered to produce more than a dozen or so of "seemingly random" characters.

In passwords, length does not make strength -- but lack of length can be quite effective at preventing strength.

All but the best mice are likely to miss minor movements though - the static friction between the mouse and whatever surface it's on will damp the movement. However when this is overcome by deliberate movement the jitter will be useful - I would expect KeePass to check for actual movement as well as using a function that doesn't depend on the gross movement (such as hashing each position - or rather each movement). — Chris H, Jan 16 '15 at 10:13
Just for a colloquial example, I noticed early on in college that, whenever I was trying to think of a random number, my brain always tended toward the numbers 3 and 7. 1, 5, and 10 were almost never chosen. Once I became *aware* of that, the numbers 3 and 7 were almost never chosen. We're just primates finding patterns. — asteri, Jan 16 '15 at 15:32
Are there studies that actually prove this, or is this based on your own experience? I, myself, could make up a pretty random combination if I'd like to. — Michael, Jan 17 '15 at 08:48
Yes, there are studies that prove this. And while you may be able to make up a combination that couldn't be brute forced, you could not make up a combination that was random. Feed it to dieharder, and it would scream at you that it detected significant biases. See: http://people.ischool.berkeley.edu/~nick/aaronson-oracle/index.html. It guesses correctly 60-80% of the time. There is a similar JS-based program which predicts the next binary digit you will "randomly" select around 90% of the time. And it is a very, very simple algorithm. Imagine how much more efficient a sophisticated one would be. — guest, Mar 06 '17 at 01:27
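
To give an idea of how simple such a predictor can be, here is a sketch of one common approach: count what followed each recent run of bits and guess the more frequent continuation. This is an assumption made for illustration, not necessarily the exact algorithm behind the linked page; the function name and the context length of 5 are arbitrary choices.

```python
from collections import defaultdict

def prediction_accuracy(bits, order=5):
    """Fraction of bits guessed correctly by a simple n-gram predictor."""
    # context (previous `order` bits) -> [times followed by 0, times followed by 1]
    counts = defaultdict(lambda: [0, 0])
    correct = 0
    predicted = 0
    for i in range(order, len(bits)):
        context = tuple(bits[i - order:i])
        zeros, ones = counts[context]
        guess = 1 if ones > zeros else 0   # ties default to 0
        correct += (guess == bits[i])
        predicted += 1
        counts[context][bits[i]] += 1      # learn only after guessing
    return correct / predicted if predicted else 0.0

# A person who "fairly" alternates is almost perfectly predictable.
alternating = [0, 1] * 50
print(prediction_accuracy(alternating))
```

On a genuinely random bit stream the same predictor should hover around 50%.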

score 4 · Answer 2 · Devon Holcombe · edited Jan 17 '15 at 01:09

Humans are very poor generators of randomness, especially upon request.

Most users are going to do one of a few things, as human behavior is fairly predictable. Using the mouse as an example, users are likely to move it side to side or up and down until enough "randomness" is generated according to the program. Perhaps they'll move it in a circle. What they're unlikely to do is move the mouse in a truly random way. If someone cared, they could analyze a set of users, extrapolate likely behaviors and extract useful information from those studies to discover patterns which could be used to attack such a system.

It's really hard to get truly random data.

From Secure Programming Cookbook for C and C++: Unfortunately, most mouse movements follow simple trajectories with very little entropy. The most entropy occurs when the pointer reaches the general vicinity of its destination, and starts to slow down to lock in on a target. There is also often a fair bit of entropy on startup. The in-between motion is usually fairly predictable. Nonetheless, if local attacks are not in your threat model, and the attacker can only guess approximately what parts of your screen the mouse went to in a particular time frame based on observing program behavior, there is potentially a fair bit of entropy in each mouse event, because the attacker will not be able to guess to the pixel where the cursor is at any given moment.

edited Jan 17 '15 at 01:09

answered Jan 15 '15 at 20:00

Devon Holcombe

"It's really hard to get truly random data" - Not with a Geiger counter, and yes, there are ways you can do it at home with an Arduino. Linux's PRNG is "good enough." – Naftuli Kay Jan 15 '15 at 20:05
Also, this still doesn't address my concerns. If we know user A likes circles, can we still predict the rate at which he draws circles? We're not sure how long it'll take, and isn't the "fuzziness" of the data good enough? – Naftuli Kay Jan 15 '15 at 20:06
What Thomas Pornin said is true. I would comment there, but don't have enough rep so will have to leave a comment on my own answer. In the particular case of the mouse it's probably good enough for most purposes. In the general case you have to be careful about your source of entropy when involving humans though. – Devon Holcombe Jan 15 '15 at 23:10
"If someone cared" - why do you think no one cares about breaking this method for entropy creation? Of course they do. Either you haven't found the research paper that does it, or your answer is wrong. – djechlin Jan 15 '15 at 23:49
What about random.org? Also, I don't believe the attack you propose is feasible or realistic. – Superbest Jan 16 '15 at 00:50
So if I just think up a random stream of numbers it is likely to be poor? e.g. 7 5 1 1 6 2 8 7 5 9 4 7 8 8 4 3 2 7 8 0 0 5 6 5 5 4 3 5 2... – Michael Jan 16 '15 at 01:26
@Michael If you thought of 10000 random numbers there would probably be some pattern or bias, such that the entropy is less than log2(10). – Superbest Jan 16 '15 at 02:48
@Michael: just off the top of my head, it only takes you 20 samples to include all 10 digits. I think the probability of that is under 25%. Of course this isn't a definitive demonstration that your sequence has less than maximum entropy: 0.25 is not a good p-value and I cherry-picked a test. But humans typically make certain errors when they try to invent a random sequence, and one of those is to be too "fair" among the options. – Steve Jessop Jan 16 '15 at 10:10
Updated my answer to include a quotation from a more reliable source which hopefully helps explain the wishy washyness of my answer. Entropy is a complex thing. – Devon Holcombe Jan 17 '15 at 01:10
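
Steve Jessop's estimate above can be checked exactly. The sketch below uses inclusion-exclusion to compute the chance that 20 uniformly random digits contain all ten digits; the function name is made up for this example.

```python
from fractions import Fraction
from math import comb

def prob_all_symbols_seen(samples, symbols=10):
    """Probability that `samples` uniform draws from `symbols` equally
    likely values contain every value at least once (inclusion-exclusion)."""
    total = Fraction(0)
    for k in range(symbols + 1):
        # Alternating sum over the number k of values that never appear.
        total += (-1) ** k * comb(symbols, k) * Fraction(symbols - k, symbols) ** samples
    return total

p = prob_all_symbols_seen(20)
print(float(p))  # about 0.215, so indeed under 25%

# The first 20 digits of the sequence quoted in the comments above.
first_20 = [7, 5, 1, 1, 6, 2, 8, 7, 5, 9, 4, 7, 8, 8, 4, 3, 2, 7, 8, 0]
print(len(set(first_20)) == 10)  # True: all ten digits appear
```

As the comment itself says, a single cherry-picked test like this is suggestive rather than conclusive.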

score 2 · Answer 3 · edited Jan 16 '15 at 18:32

With the following small Python 2 script, try to produce a sequence of 42 zeroes or ones by moving the mouse in a 'repeatable' pattern.

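import Tkinter  # Python 2 module name

root = Tkinter.Tk()

# Last pointer position at which a bit was emitted
lx, ly = (0, 0)
while True:
    x, y = root.winfo_pointerxy()
    # Emit a bit only once the pointer has moved far enough (squared distance > 42)
    if ((x - lx)**2 + (y - ly)**2) > 42:
        # Python 2 print statement: lowest bit of the XOR of old and new coordinates
        print (x ^ y ^ lx ^ ly) & 1
        lx, ly = (x, y)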

If you have Python 3.x, you may also use the following instead:

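import tkinter
tkinter.NoDefaultRoot()  # do not set an implicit default root window
root = tkinter.Tk()
x = y = 0  # last pointer position at which a bit was emitted
while True:
    x2, y2 = root.winfo_pointerxy()
    # Emit a bit only once the pointer has moved far enough (squared distance > 42)
    if (x - x2) ** 2 + (y - y2) ** 2 > 42:
        print((x ^ y ^ x2 ^ y2) & 1)
        x, y = x2, y2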

score 1 · Answer 4 · answered Jan 16 '15 at 00:45

The quality of an RNG is a question of correlations to the inputs or some other set of non-secret parameters. Obviously if the output of an RNG correlates to something you know, that reduces the entropy drastically by allowing you to cut away large swathes of the possible output space - perhaps a given RNG can now realistically be expected to output only one out of 1,000 numbers as opposed to the advertised 1,000,000, which makes brute-force much more practical.
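
In bits, the 1,000,000 versus 1,000 example works out as follows. This is a small arithmetic sketch that assumes every remaining output is equally likely; the variable names are illustrative.

```python
import math

advertised_outputs = 1_000_000
effective_outputs = 1_000

# Entropy of a uniform choice among n outcomes is log2(n) bits.
advertised_bits = math.log2(advertised_outputs)
effective_bits = math.log2(effective_outputs)

print(round(advertised_bits, 2))  # 19.93
print(round(effective_bits, 2))   # 9.97

# The brute-force search space shrinks by this factor.
print(advertised_outputs // effective_outputs)  # 1000
```

Losing about ten bits makes the attacker's work roughly a thousand times smaller.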

The RNG constructed by asking a human to think of random words or phrases is in fact poor. The reason is many strong correlates:

Correlation to culture, allowing you to use an English dictionary for North American targets or focus on 1900-2000 for PINs (also why passwords in other languages are a decent security-through-obscurity strategy)
Correlation to others, allowing you to build dictionaries of top X most common passwords
Correlation to historical behavior of self, allowing you to exploit past known passwords, or mine social media for clues like birthdate or hometown

Note that many "secure" PRNGs are also based on a very predictable input, namely system time.

The key is to recognize that the human is not necessarily the only element of the RNG. It only acts as a seed. There is no reason why you would be allowed to only output the seed as is, so you can apply various functions to it to dilute the correlation (a basic but weak example is to add a very large number, multiply by a very large number and then take the modulo).

The mouse movements are only used as a seed for KeePass's RNG. If the function they use is reasonably capable of returning uncorrelated output, there's no reason to suspect that it's not secure. (Granted, I couldn't tell you why exactly they don't just skip the mouse and use system time as their seed, or what function exactly they use and how secure that is.)
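
On the question of seeding from system time: a deterministic PRNG seeded with a timestamp can be replayed by anyone who can guess roughly when it was seeded. The sketch below illustrates this with Python's non-cryptographic `random` module; the timestamp, the one-hour window, and the function names are all assumptions made for the example.

```python
import random

def token_from_seed(seed):
    """Derive a 64-bit 'secret' from a PRNG seeded with the given integer."""
    rng = random.Random(seed)
    return rng.getrandbits(64)

def recover_seed(token, earliest, latest):
    """Try every candidate timestamp in the window until the token matches."""
    for candidate in range(earliest, latest + 1):
        if token_from_seed(candidate) == token:
            return candidate
    return None

secret_time = 1_421_400_000            # hypothetical seeding time (Unix seconds)
token = token_from_seed(secret_time)

# The attacker only knows the token was generated within an hour either way.
found = recover_seed(token, secret_time - 3600, secret_time + 3600)
print(found is not None)
```

Searching a few thousand candidate seeds takes a fraction of a second, which is why secrets are normally drawn from the operating system's CSPRNG instead (in Python, the `secrets` module or `os.urandom`).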

Speculation on why the mouse is used instead of system time: If you need to get a lot of random numbers, you need to wait a certain length of time, because system time typically has nanosecond or millisecond resolution. A mouse probably generates a lot more numbers in a comparable span of time, and they will probably be noisier and not in sequence. — Superbest, Jan 16 '15 at 00:49
"use system time as their seed" that wouldn't be very wise! A PRNG creates deterministic sequences! Given the same seed, the output is always the same! If you just use system time, that is very easy to guess! The Android documentation (https://developer.android.com/reference/java/security/SecureRandom.html) for SecureRandom for example suggests: Using the seeded constructor or calling setSeed(byte[]) may [...] return a predictable sequence of numbers unfit for secure use. [...]not recommended to use setSeed at all. — Josef, Jan 16 '15 at 10:30

score 0 · Answer 5 · answered Jan 15 '15 at 20:24

I think humans are okay, but it depends on how you ask them to generate the number. Also, are they required to be able to replicate it? For smaller numbers humans are easily influenced and highly predictable. There's also the problem that humans will often start off with similar patterns before they actually start generating something that resembles randomness. Take the following example:

952167493... At the start, I began with a familiar number, maybe a phone number or street number that's in the person's mind; then humans might try to be random, and the likelihood of two numbers following a regular counting order will be low; after this they might hop around trying to fill out the number spectrum.

Given enough samples, I think many humans would follow similar patterns, and you would get large numbers which aren't very random at all. You may even find whole swathes of numbers that don't show up, merely because they don't feel random to a human as they're creating them, like 123123123123123 or 44455523534333666.

I wouldn't trust a human to create a random number unless they had some instructions that would decrease the likelihood of human number/pattern biases; even then it's a bad idea.

`I think humans are okay` no they're ugly. — Cthulhu, Jan 16 '15 at 07:59
If I were a cat, I'd think differently of humans. — Naftuli Kay, Jan 16 '15 at 09:44

score 0 · Answer 6 · answered Jan 16 '15 at 09:42

There are some very good answers here already so I just want to provide some anecdotal evidence. It seems clear already that in motion-based RNGs humans can provide a strong source of entropy due to inexact reproduction of movements.

In the case of asking humans to select a random number, however, strong trends begin to show. I remember a math class back in year 8 where we generated 100 "random" numbers by asking people that we encountered, and the results showed just how predictable humans can be. I can't remember the exact results, but the number 7 for example came up three times more often than the next most popular answer, no person gave an answer that was more than two digits, and all answers were integers. I'm sure if this were performed on a larger scale a similar trend would be produced.

score 0 · Answer 7 · answered Jan 16 '15 at 10:26

It is my opinion that humans are very poor at generating randomness on request. There are ways to "extract" randomness from human interactions. The mouse movement is a good example. The randomness doesn't come from the intentions of the user (most people will draw lines or circles) but from the error in the movement. Humans just can't draw exactly repeatable figures using a mouse.

While looking for answers, I found two 'interesting' papers:

Humans can consciously generate random number sequences: a possible test for artificial intelligence.[1]
Humans cannot consciously generate random numbers sequences: Polemic study[2]

Sadly, both are only available commercially, and if you are not a student or researcher at some university you'd have to pay. (Don't do it for these papers!) I think the titles are funny enough.

I also found more serious and interesting papers you could read:

Games for extracting randomness[3]
Assessing randomness and complexity in human motion trajectories through analysis of symbolic sequences[4] (not fully on topic, but very interesting)
This Answer on the Cognitive Sciences Stack Exchange lists some

All of this suggests (except the first paper, but I don't take that very seriously) that humans are bad when asked to consciously generate randomness, but if you do it right you can extract good randomness from human movements!
