What research suggests that user's mouse movements are (not) sufficiently unpredictable for secret key generation?

Question

I have not been able to find any credible source which tried to prove or disprove the randomness of mouse movements.

A Google Scholar search for "mouse movement entropy" gives surprisingly few results: about one page of computer science results, of which three are tangentially related, before it gets to research on "behaviour of mice in open fields" (which was enthralling, by the way).
1. "User re-authentication via mouse movements" is related but tangential. While they seem to get pretty good results in terms of false positives and false negatives, they note that it's not good enough for authentication by itself. This indicates that, at least when you have one mouse movement sample of an individual, it should be much easier to predict future movements (and nobody ever visits an untrusted website which could capture your mouse movements).
2. "A true random number generator based on mouse movement and chaotic cryptography" is a proposal on how one could implement such an RNG; it does not examine whether the source (mouse movements) is actually unpredictable in the first place.
3. And a git repository for the aforementioned proposal.
Relevant questions on this website do not answer the question: none is long enough to contain evidence in itself (which would have to be empirical, since we're talking about analysing human behaviour, so that means describing a test setup, giving statistical properties of the results, describing the conclusion... a big post altogether), and none links to such research.
- What are you doing when you move your mouse randomly during a truecrypt volume creation?
- Research about entropy of human randomness?
  - Note that this question is not a duplicate because I am asking about mouse movements specifically. That question is more broad, as exemplified by the answer regarding user-chosen passwords or user-generated brain waves.
- How much more secure is encryption if the program requires random input from the user?
- Is this password generator safe?
- How much time / "entropy collected" should generating a RSA 2048bit keypair take?
- How high is the entropy of this salt-generating code? (No code-reading actually necessary)
- Approximately how much entropy in each of these low entropy sources?
- For how much time should I randomly move the mouse for generating encryption keys?
On our sister site, there are again relevant questions but no answers which answer this question:
- https://crypto.stackexchange.com/questions/3347/are-mouse-movement-coordinates-useful-as-a-seed-for-a-rng
- https://crypto.stackexchange.com/questions/39833/does-iterative-hashing-of-mouse-keyboard-input-improve-its-properties-as-an-entr
- https://crypto.stackexchange.com/questions/1618/feedback-on-rolling-my-own-entropy-gatherer
- https://crypto.stackexchange.com/questions/25847/hkdf-entropy-extraction
- https://crypto.stackexchange.com/questions/14632/standard-or-guidance-for-entropy-collection
  - Mentions that NIST defines some statistical tests which you can do, presuming it will show that mouse movements pass such tests. I am not sure how comprehensive this battery of tests really is since in my experience, most such tests pass anything and everything with flying colours. It might be good to know if those tests are reliable and if so, if they indeed approve of mouse movements.
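
To illustrate the concern about statistical tests, here is a minimal sketch of one such test, assuming the battery meant is NIST SP 800-22 and taking only its frequency (monobit) test. The function name `monobit_p_value` is made up for this example. A strictly alternating bit sequence is fully predictable, yet it has a perfect balance of ones and zeros and so passes this particular test.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# A fully predictable sequence: 0, 1, 0, 1, ...
alternating = [i % 2 for i in range(1000)]
# A maximally biased sequence: all ones.
all_ones = [1] * 1000

ALPHA = 0.01  # significance level commonly used with this test
print("alternating passes:", monobit_p_value(alternating) >= ALPHA)
print("all ones passes:   ", monobit_p_value(all_ones) >= ALPHA)
```

Other tests in the same suite (the runs test, for instance) would reject the alternating sequence, so this only shows that passing a single statistical test says nothing about unpredictability. It does not settle how the full battery behaves on mouse data.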
Random web results:
- https://en.wikipedia.org/wiki/Entropy_(computing) None of the sources seem to answer the question. (The article does not claim mouse movements have good entropy.)
- https://www.reddit.com/r/crypto/comments/937qzb/my_findings_on_extracting_entropy_from_mouse/
  - The conclusion is more about how compressible mouse movements are, than about the actual unpredictability. As a short example, while a compressor would usually not be able to compress the output of a simple RNG by much, observing even a few outputs might be enough to predict the next state. Similarly, if a few points describe the curve the mouse is making, one can extrapolate many of the other points. I'm not saying the source is irrelevant, but I also don't see it as (close to) conclusive evidence.
  - Reading the comment thread after writing the above, I found it funny to see someone saying "Estimating [unpredictability of] user mouse movement is a complicated human factors question.. there is probably some academic research on that already". You don't say!
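
The compressibility point can be sketched with a toy generator. This is only an illustration under stated assumptions: a 32-bit linear congruential generator with the well-known Numerical Recipes constants that emits its full state as output. The names `lcg_stream` and `predict_next` are made up for the example. A general-purpose compressor finds almost nothing to squeeze out of the stream, while a single observed output is enough to predict the next one.

```python
import struct
import zlib

# Toy LCG parameters (Numerical Recipes constants); the whole state is the output.
A, C, M = 1664525, 1013904223, 2**32

def lcg_stream(seed, count):
    """Return `count` successive outputs of the toy LCG."""
    state = seed
    out = []
    for _ in range(count):
        state = (A * state + C) % M
        out.append(state)
    return out

def predict_next(observed):
    """An attacker who knows the parameters needs one output to get the next."""
    return (A * observed + C) % M

outputs = lcg_stream(seed=12345, count=2000)
raw = b"".join(struct.pack("<I", v) for v in outputs)

# Compressed size relative to the raw size (close to 1 means "incompressible").
ratio = len(zlib.compress(raw, 9)) / len(raw)
# Number of outputs correctly predicted from the preceding output.
hits = sum(predict_next(a) == b for a, b in zip(outputs, outputs[1:]))

print(f"compression ratio: {ratio:.3f}")
print(f"predicted {hits} of {len(outputs) - 1} outputs")
```

Whether mouse traces behave more like this generator or like a genuinely unpredictable source is exactly what a compression-based estimate cannot tell us.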

It might be relevant to mention that a quick look at the Puttygen source code indicates that it seems to generate private keys solely based on mouse movements. It fills an array with the time of mouse movement events in the even cells and the mouse position in the odd cells, sprinkles some magic shuffling over it (shuffling memory, xoring fields), and calls some RSA/DSA/EC* key generator with the array as argument. Whether there is serious evidence that mouse movement is a good entropy source is quite important for such use-cases. Note that this is different from using it as an additional source, such as in the Linux kernel, which will only increase the quality even if it's a mediocre source.
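
The "additional source" case can be sketched as follows. This is a simplified illustration, not the Linux kernel's actual construction and not Puttygen's code: the function `mix_entropy`, the `(timestamp_ns, x, y)` sample layout, and the choice of SHA-256 are all assumptions made for the example. The idea is that mouse samples are hashed together with bytes from the OS generator, so the result stays unpredictable even if the mouse samples turn out to be guessable.

```python
import hashlib
import os
import struct

def mix_entropy(os_random, samples):
    """Hash OS-provided random bytes together with (timestamp_ns, x, y) samples."""
    h = hashlib.sha256()
    h.update(os_random)
    for t_ns, x, y in samples:
        # Fixed-width encoding so that sample boundaries are unambiguous.
        h.update(struct.pack("<QII", t_ns, x, y))
    return h.digest()

# Hypothetical mouse samples; in a real program these would come from input events.
samples = [(1_000_000, 10, 20), (1_016_700, 12, 23), (1_033_150, 15, 27)]
seed = mix_entropy(os.urandom(32), samples)
print(seed.hex())
```

An attacker who can predict every sample still has to guess the 32 bytes from `os.urandom`, which is why a mediocre extra source does no harm here. A generator that relies on mouse input alone has no such fallback.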

I have a hard time believing nobody ever looked into this. What am I missing?

At its core, a mouse is an analog to digital converter. There is some event which may be considered at least pseudo random which is sampled at a particular frequency. The result is a digital value that varies in a non-periodic way.... So, the question regarding mouse movements isn't really about mouse movements... it's about using an external A/D conversion of some event as a source of entropy. Some random number generators use a lamp... not LED mind you, a lamp... and then sample the value of current running through the lamp over time. — RubberStamp, Feb 20 '19 at 01:02
@RubberStamp Relevant Youtube link [The Lava Lamps That Help Keep The Internet Secure](https://www.youtube.com/watch?v=1cUUfMeOijg) — Hagen von Eitzen, Feb 20 '19 at 03:12
The lack of papers seems surprising - after all we know that one of the fundamentals of RNG is that writing some random code will produce a very poor RNG (I think Knuth had a nice formulation of this - and a nice example from personal experience) — Hagen von Eitzen, Feb 20 '19 at 03:19
Mouse movement, if sampled at the interrupt level, is an extremely good source of randomness because of the natural stochasticity of our neuromuscular system. I'm sure some operating system HID APIs are not good enough due to the predictability of task scheduling, so sampling `/dev/input/mouse0` might not be great, but mouse and keyboard inputs are _absolutely_ useful when the CPU's cycle counter is sampled in an interrupt handler. (Writing as a comment instead of an answer because I haven't linked to any research papers, as the question is asking for) — forest, Feb 20 '19 at 03:47
@RubberStamp It has nothing to do with randomness in the mouse's sensors which, for the sample rate they run at, are highly predictable. It entirely comes from the unpredictability of human motion. — forest, Feb 20 '19 at 03:58
@forest "because of the natural stochasticity of our neuromuscular system" It sounds like you read about that somewhere? Even a relevant biological paper could be interesting if it suggests something about the answer. — Luc, Feb 20 '19 at 08:21
**Some things do not need papers to be written about.** There is no surprise to me that there has been no serious academic study on the non-randomness of a human's hand in a limited range of motion using an interaction device designed to capture a human's interaction in a pattern. Just as there are no papers that will be written to measure a computer mouse's ability to navigate mazes. There is just not something to study. — schroeder, Feb 20 '19 at 08:28
@Luc I think I could write an answer, but it would boil down to "there are no such papers, but here's some reasons why it's really, really hard to predict motion at these scales". Would something like that work? — forest, Feb 20 '19 at 08:47
@forest You just made me question life, the universe and everything. I was going to say "if your answer is self-evident, then of course; most answers here are not sourced because they are self-evident given basic knowledge", thinking that I should be able to be convinced by anything true, since you can give all the pointers and I can google all the facts. I thought we, as humans, use that to make informed decisions. — Luc, Feb 20 '19 at 09:31
@forest But I realized it's not necessarily true: someone can tell me dihydrogen monoxide kills a bunch of people every day, creates corrosion in metals, etc., and (assuming I don't know the chemical H2O because I'm not a chemist) I will just have to believe them without understanding the whole system and realizing that the arguments are irrelevant. I probably won't be convinced because I know I don't understand the relevance of the arguments. I will just have to assume your answer is true until either I understand the whole system better, or new evidence is presented that does convince me. — Luc, Feb 20 '19 at 09:31
@forest Additionally, as far as I know science doesn't have a good grasp of our brain to begin with. Any theory-based answer might not convince me, even if I would be a neuroscientist myself. I'm not, so I can't say that for sure, but it doesn't help the case of a neuroscience-based theoretical answer. — Luc, Feb 20 '19 at 09:31
@forest I would like to ask you to try, but given my reasoning as mentioned... it probably costs you more time than it's worth to dumb it down far enough that I can follow (I have roughly a primary school level understanding of biology/neuroscience); most of what you said above is Chinese to me. If you think it might convince other readers, though, that's surely worth something. — Luc, Feb 20 '19 at 09:34
"when you have one mouse movement sample" - this is actually the key thing. You have it, the OP story comes in. You don't have it, there's no way to generally make a determination for a random person without any samples. — Overmind, Feb 20 '19 at 09:39
@Overmind I'm not so sure: you can't move a cursor from (0,0) to (1920,1080) without generating other points, and it usually has to move before it is registered (input events are used, not the position at a given time), so e.g. [(0,0),(0,0)] can be ruled out as sequence. That's the easy part. Then, it might turn out most people alternate between circles and line-like wiggles, so you have to only brute force that sequence, the x,y,r of the circle, and account for a certain amount of random variation on each lap. Might not be *easy*, but it might be possible like SHA1 collisions are "possible". — Luc, Feb 20 '19 at 09:50
@Luc It's true that we know precious little about neuroscience, but we do have a fairly good understanding of individual neurons. We can measure the variation in how long it takes an action potential to move down the axon and, if that variation is caused by probabilistic events like ion channels opening and closing, we can conclude that it is fundamentally impossible to predict beyond a certain bound. A theory-based answer I can provide can attempt to explain why it's so hard to predict movements. — forest, Feb 20 '19 at 10:30
@Luc, in addition to the significant differences from one person to another, you also have to consider the device type, accuracy (as in dpi) and settings (OS part of settings). Any of the 4 types of information, if missing, will lead to no practical result. Yes, such a determination is mathematically possible but extremely improbable if at least one of the 4 elements mentioned is missing. — Overmind, Feb 20 '19 at 12:30
@Overmind DPI tends to be irrelevant since it's the timing that matters. The actual position contains very little entropy, but the precise nanosecond an event occurs over hundreds of samples becomes significant. — forest, Feb 20 '19 at 12:33
The more accurate the DPI, the more samples you have and therefore a more accurate function you can define. Sampling 3 points at a 100px movement is not as relevant as sampling 300 of them. In such details you can extrapolate how a user's hand behaves on certain distances, the axial variation of the movement and much more such elements, which later are the basic blocks in accomplishing your objective. — Overmind, Feb 20 '19 at 13:05
You may find https://tools.ietf.org/html/bcp106 interesting. — forest, Feb 27 '19 at 05:27

forest · Answer 1 · 2022-04-03T00:31:53.367

I do not believe there are any research papers that describe the unpredictability of the neuromuscular system in the context of computer security and at the sample rates that are relevant, so I can't link the research papers you want. However, I can explain at least a few of the reasons why human movement is so stochastic and unpredictable. It all boils down to a simple fact: Living tissue is a really, really sloppy medium for information transmission. This has been known for a very long time and has constantly been a thorn in the side of computational neuroscientists and anyone trying to make mathematical models of neuron groups. Biological neural networks are terrible computers.

The transmission speed of neurons varies, sometimes significantly. Even within a single neuron, the speed is variable. In addition, the number of action potentials (electrical signals that propagate down the axon) sent in succession is dependent on the probability that microscopic ion channels will open at any given time. Additionally, muscles have a high amount of jitter. The force from a muscle does not come from a gradual increase in activity, but from discrete bunches of muscle cells, called motor units, being activated. Even if we flex as hard as we can, we never activate 100% of the muscle's motor units (if we did, it could cause physical damage to the tendon). This random motor unit recruitment leads to the twitchy vibration typical of voluntary skeletal muscles.

The combination of the extremely stochastic behavior of neurons and the probabilistic activation of muscle cells leads to minute variable delays in timing. While these delays are imperceptible to a human, a computer ticking away billions of times per second (yeah, I know this is actually limited by the speed of the keyboard microcontroller, among other things) quickly notices this. Millions of cycles can pass by due to the random delays intrinsic to neural transmission, and these random delays can be measured and used as a source of entropy. While we do not have any research showing exactly how many bits of entropy each action potential will generate, we can make an extremely conservative guess and say that the entire process, from brain to muscle to keystroke, leaves us with around a single bit of entropy. All it takes then is a few hundred keystrokes to obtain a cryptographically significant amount of entropy.
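
The arithmetic behind "a few hundred keystrokes" can be written out as a small sketch. The per-event figures used here are guesses, not measurements, and `events_needed` is a name made up for the example. It assumes each event contributes the same independent amount of entropy, which is itself an idealisation.

```python
import math

def events_needed(target_bits, bits_per_event):
    """Events required to reach target_bits, assuming independent events."""
    if bits_per_event <= 0:
        raise ValueError("bits_per_event must be positive")
    return math.ceil(target_bits / bits_per_event)

# At the guessed rate of one bit per keystroke:
print(events_needed(256, 1.0))
# If the true rate were only a quarter of a bit per keystroke:
print(events_needed(256, 0.25))
```

Even when the guessed rate is cut to a quarter of a bit per event, the required count stays in the low thousands, which a few seconds of mouse movement would supply if each movement event is credited similarly.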

The stochastic behavior of the human brain is described quite well here (see sections 3 and 4).

Entropy from keystrokes or mouse movements thus comes from two sources:

- Individual variations in people due to a unique number of neurons and unique neural circuits.
- Time-dependent variations in action potential transmission speed, motor unit recruitment, etc.

All of this results in random delays that, while irrelevant to everyday tasks, are quite visible to a computer with sometimes even sub-nanosecond temporal resolution. If we sample the time of events (not just data like mouse pointer position or key being pressed), we can safely say that it contains at least a nominal amount of entropy. After all, given all this stochastic randomness, it should become obvious that it is impossible to guess, within nanoseconds, how long it will take a signal originating in our brain to trigger a muscle to contract to depress a key.

However, it's important to know that it's easy to get entropy collection wrong. You can't just sample mouse movement and use system time as a clock. You need to trigger the sampling immediately when the event occurs. This means collection must occur within the kernel, typically within an interrupt handler that is executed instantly when an interrupt occurs. Otherwise, it's very possible that predictable scheduling delays will taint the collected entropy. After all, what's the good of a random keypress event if it's buffered as soon as it occurs and is only released to userspace in predictable intervals? You should always leave entropy collection to the OS itself.
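
In practice, "leave it to the OS" looks something like the sketch below. It uses Python's standard `secrets` module, which draws from the operating system's CSPRNG; the function name `generate_key` and the 32-byte default are choices made for this example.

```python
import secrets

def generate_key(num_bytes=32):
    """Return key material taken from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)

key = generate_key()
print(key.hex())
```

The application never timestamps input events itself. Any entropy from keyboard or mouse interrupts has already been collected by the kernel before the request is made.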

I'd like an experiment... something like **1** Setup a mouse on a mechanical platform that **1a** moves the mouse in circles ... **1b** vibrates the mouse using a motor set at a particular and uniform speed... **2** have a human move the mouse ... calculate the entropy of the output of each... and see if moving the mouse with a motor running at uniform speed produces similar levels of entropy to human generated motions.... **lastly** this answer does not take into account the security sense of using a computer peripheral device as an entropy source.... has it been compromised? simulated? — RubberStamp, Feb 20 '19 at 11:36
@RubberStamp No need to do an experiment. Just look at old slot machines. People may try their hardest to get their timing just right, but our movements are so jittery that this fact holds up the multi-billion dollar gambling industry. As for the security of the peripheral device, that's entirely tangential to the main point. — forest, Feb 20 '19 at 11:39
*>No need to do an experiment* ... Experimental evidence is the core of science. Of course, we need experimental data to verify the claims... Is mouse gathered entropy driven by the human input as you claim or can similar entropy be gathered from a mouse that is moved by a uniformly spinning machine that vibrates a platform? If the machine vibrated mouse generates similar levels of entropy, then the human-neuron connection is too low a level... and the human is no longer required, just an event of particular "sloppiness" — RubberStamp, Feb 20 '19 at 11:50
@RubberStamp Unfortunately, mere randomness doesn't imply data suitable for cryptographic purposes. It might appear very random but still be predictable, especially if it's a chaotic system. While experiments can be useful, that specific experiment implemented with the methods you describe would not necessarily give valid results. My "just look at slot machines" claim was intended to be a brief _a fortiori_ argument, not a literal alternative to empirical evidence or the scientific method. — forest, Feb 20 '19 at 11:52
*>Unfortunately, mere randomness doesn't imply data suitable for cryptographic purposes.* ... all the more reason for experimental data to be included as justification... The experiment may take a few weeks depending on my time... I've got the equipment, just need to setup... Anyone interested in seeing the results of an experiment as I've outlined? — RubberStamp, Feb 20 '19 at 12:07
@RubberStamp It may be interesting. Do you have the equipment capable of moving things with that level of precision? I would recommend against vibration since that would be a chaotic system and naturally appear random, but perhaps moving the mouse using a stepper motor would suffice. — forest, Feb 20 '19 at 12:08
*>I would recommend against vibration since that would be a chaotic system and naturally appear random, but perhaps moving the mouse using a stepper motor would suffice.* ..... That would be the point I was making... That a vibration board run by a uniformly spinning motor is comparable to a human moved mouse... Adding a stepper motor would create a third level of comparison... — RubberStamp, Feb 20 '19 at 12:20
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/89971/discussion-between-forest-and-rubberstamp). — forest, Feb 20 '19 at 12:22

score 0 · Answer 2 · answered Feb 20 '19 at 16:34

IMHO, the concept is borrowed from mouse-movement user fingerprinting, where the scale of complexity is totally different from what key generation requires.

If you compare the two, you will notice that user identification via mouse-movement fingerprinting requires less entropy to improve identification, and a high false-positive rate is acceptable. Thus, mouse-movement fingerprinting can yield high accuracy.

On the other hand, a mouse-movement key generator relies on high entropy that depends on the person's mood and environment, which I doubt anyone can reproduce in any controlled environment. E.g. a person on an empty stomach will move the mouse differently compared to one with a full stomach.

I think mouse movement fingerprinting/biometrics is much newer. — forest, Feb 21 '19 at 01:15
