15

Proper security algorithms demand true random numbers. For instance, secret keys & initialization vectors should never not be true random.

However, generating numbers using Java's Random library or C's srand() initialization & then rand() are only able to generate pseudorandom numbers. From what I understand, since functions like srand() gather the seed from some source such as system time, and the 24 hours time is cyclical, it is not truly random. Please correct me if this assumption is flawed.

Also, an example of a truly random number would be if we use a seed from let's say an audio file, and picked a pseudorandom place in the file and then get the audio frequency at that location. Since only the location was pseudorandom but the frequency at that location was not, the value is truly random. Please correct me if this assumption is flawed.

Finally, apologies for compounding the question further, exactly how vulnerable would it really leave systems if pseudorandom values are used? I have learned that AES 128 is actually enough to secure systems (Is 128-bit security still considered strong in 2020, within the context of both ECC Asym & Sym ciphers?). For military standards, 192 & 256 were adopted (Why most people use 256 bit encryption instead of 128 bit?). Is using true random values also akin to following such baseless standards, or is it actually crucial?

nobody
  • 11,251
  • 1
  • 41
  • 60
mindoverflow
  • 277
  • 1
  • 5
  • 2
    See https://security.stackexchange.com/questions/84906/predicting-math-random-numbers for some interesting reading on cases where javascript's Math.random() function (which is a PRNG but not a CSPRNG) have been exploited. Also see https://security.stackexchange.com/questions/239358/attacks-relying-on-poor-entropy for an interesting read on how Netscape got it wrong in the SSL implementation in early versions of Netscape Navigator. – mti2935 Dec 24 '21 at 12:24
  • thank you. i will go over these. – mindoverflow Dec 24 '21 at 15:39
  • 5
    Doesn’t change your point, but audio is basically always a superposition of many frequencies with various amplitudes - at a certain timestamp, you could read out a kind-of-random value, but it’s _not_ a frequency. It’s not a good random number either. – Aganju Dec 24 '21 at 23:00
  • You can never get true randomness from a pseudorandom source. True randomness always comes from the "meat space". Radioactive decay, EM noise, temperature fluctuations, etc. –  Dec 25 '21 at 02:59
  • Any random number generator is psedo-random if it uses any type of seed value. It's because if you can guess wich seed was used in generator, - you can recover all random numbers which were generated at that stream moment. In that sense pseudo random functions are completely **deterministic**, bc uses algorithms to generate randomness. True random data, however does not depend on any input or variable and is always unpredictable. Like given molecule position at given time in gas container, that's why atmospheric noise used in random.org project. Or your radio background noise between stations. – Agnius Vasiliauskas Dec 25 '21 at 07:05
  • Water vs Holy Water – CodesInChaos Dec 25 '21 at 10:40
  • There is no such thing as "True" randomness in the physical world, because "true randomness" is a mathematical concept and mathematical concepts cannot be established through empirical (physical) reasoning. – RBarryYoung Dec 25 '21 at 18:11
  • 1
    @RBarryYoung That becomes quite philosophical, and one might as well say "there is no reality, because it's all based on individual perception". There is indeed a concept of "true randomness" in information security. –  Dec 26 '21 at 01:41
  • @MechMK1 It is not philosophical, it is strictly mathematical. The claim that some physical processes are "truly" random because we don't know their initial boundary conditions or because we do not have the means to predict it is equating a lack of information and/or knowledge to randomness, which is incorrect. *Mathematically, that's not what randomness is*. And cryptographic security is firmly based in mathematics, not philosophy. What you are actually alluding to is "*effectively random*" not "*truly random*", and correct use of CPRNGs is already effectively random. – RBarryYoung Dec 26 '21 at 14:41

5 Answers5

43

There is a (common) misconception in this question that there is such a thing as “true” randomness and that this matters for security. In fact, whether “true” randomness exists is a philosophical question (physics gives a partial answer), which is not relevant for security.

There are many notions of randomness. What is relevant for security is unpredictability. Security is defined as protecting against adversaries. A value is “random”, for security purposes, if your adversary cannot find or guess it.

In the context of security, “true random” is sometimes used to mean a value is based on some physical process that no adversary can reproduce. For example, a coin flip is generally truly random in that sense. But not if the coin is too heavily biased, and not if the adversary can see the result of the coin flip. Performing 128 coin flips in front of a camera will not give you a secure 128-bit random number. (It does give you a value that cannot be predicted in advance, which is good for a few things but not good, for example, as a cryptographic key.)

Conversely, a value which is calculated in a deterministic way by a cryptographically secure pseudorandom generator (CSPRNG) is perfectly fine as a random value, as long as adversaries cannot learn the seed or the internal state of the random generator. The fact that only deterministic physical processes were involved other than the generation of the seed, and that the same seed was used to generate other random values, do not compromise the security of the random value (assuming that the CSPRNG was correctly designed and implemented — an assumption that holds for any secure processing).

“True” randomness is necessary for security because you have to seed the CSPRNG somehow. You can seed a CSPRNG with the output of a CSPRNG, but ultimately, you have to start somewhere with a non-deterministic or non-sufficiently-precisely-modeled physical process.

A random value is only good enough if it's sufficiently unpredictable. If I tell you that I picked my secret key at random between 12729af5a51075a68db9d4b05ce7981a and fc42099f25ee1eb5a8dc1178c35868b8, that's not good for me: I did in fact generate those two values randomly, and you don't know which one it is, but you can still find it in at most two guesses. The measure of unpredictability or unknownability of a value is called entropy (beware that there are many related, but distinct concepts called entropy). A fully known value has an entropy of 0. A value which has equal chances of being one of two known possibilities has an entropy of 1. By telling you that my key is one of these two values, I've reduced its entropy to at most 1 bit, no matter how randomly those two values were generated.

An audio file may or may not have a large amount of entropy. At one extreme, if the adversary has the same file, the entropy is 0. If the adversary has a different recording of the same sound, there may be artifacts due to the microphone quality and placement, but audio compression would tend to remove these artifacts. Microphone white noise can be a decent source of entropy, but you should get it directly from the hardware: by the time you get a recording, it's hard to make sure that the noise has been preserved and that the same noise hasn't been copied somewhere else.

The problem with rand() functions in the standard library of most programming languages is not that they're pseudorandom, but that they're not cryptographically secure for two major reasons:

  • The adversary can find the seed. For example srand(time()) is mostly predictable (depending on how precisely the adversary knows when your application ran – this has nothing to do with time-of-day being cyclical). And even if the adversary doesn't know the seed, if the seed is a 32-bit number, it's easy for the adversary to try all possible seeds by brute force. If it's a 64-bit number, it's costly but still doable.
  • Outputs are not independent: given enough outputs from rand(), it's possible to calculate other outputs.

Secure random generators, such as /dev/urandom or CryptGenRandom, have neither of these flaws: they use a CSPRNG algorithm (which guarantees independent outputs) from a secure seed (which, in modern computers, can be generated from a component in the main CPU).

Gilles 'SO- stop being evil'
  • 50,912
  • 13
  • 120
  • 179
  • Another requirement for cryptographic randomness is that any information about the state of the generator be destroyed before an adversary would be allowed to use the same generator. If e.g. one had a pseudo-random number generator whose initial state was completely unpredictable, and which could be treated as a random oracle unless someone had access to about two billion consecutive bits, and a program used it to produce 10,000 bits, that would be fine if nobody else could get more numbers based upon the same state, but if an attacker could ask for a few billion random bits... – supercat Dec 24 '21 at 20:40
  • ...after the initial program had read its 10,000 bits, the attacker may then be able to figure out the state of the generator and from that figure out what its initial state must have been when the 10,000 bits were generated. Perhaps that's included in your definition of "unpredictability", but not in the normal English sense of the word. – supercat Dec 24 '21 at 20:42
  • 2
    @supercat “Another requirement for cryptographic randomness is that any information about the state of the generator be destroyed before an adversary would be allowed to use the same generator” That's a requirement on a secure _random generator_. And it's a consequence of the definition I give for a CSPRNG: the adversary must not be able to find the generator state from seeing outputs. It's also a consequence of the inputs being independent: if the adversary can reconstruct the state, then they can know for sure what the outputs are. – Gilles 'SO- stop being evil' Dec 24 '21 at 23:17
  • An additional reason that `rand` isn't and **can't** be cryptographically secure is that the seed is of a small integer type (`unsigned int` in C, which is 32-bit or less in all relevant real-world contexts). No matter how good the CSPRNG is on top of that, you only have a small finite number of seeds to try to brute-force it. – R.. GitHub STOP HELPING ICE Dec 25 '21 at 18:07
  • at one point, things start t oedge on philosophical. is anything ever truly random? a set of coin toss data would produce something akin to a predictable bell curve every time. you have helped me put it into perspective that we aren't looking for __random__, we're looking for __unpredictable__. thanks. – mindoverflow Dec 25 '21 at 23:51
9

With regard to:

Also, an example of a truly random number would be if we use a seed from let's say an audio file, and picked a pseudorandom place in the file and then get the audio frequency at that location. Since only the location was pseudorandom but the frequency at that location was not, the value is truly random. Please correct me if this assumption is flawed.

It seems like this approach would produce 'random' values with very little entropy. For example, if the audio file is sampled at 44.1 KHZ, and the audio is 5 minutes (300 seconds) long, this means that there are only 13,230,000 possible values (44100 * 300 = 13230000) in the space that this 'random' number generator produces. An attacker who understands how your 'random' number generator works could easily iterate through all ~13 million values in short order, applying each one as the key in your encryption algorithm, until it finds the correct key that decrypts the ciphertext. See Write a Python or C program to guess the key for an (albeit somewhat contrived) example of how this can be done. Also, programs like hashcat and john the ripper can be used to automate this process.

With regard to your last paragraph: yes AES-256 (and even AES-128) are considered to be strong encryption algorithms - but their strength is contingent on a strong key. If you generated a key by doing: echo -n 'iloveyou123' | sha256sum, then an attacker can find your key very quickly, again using a program like hashcat or john the ripper.

Java has a built-in function SecureRandom(). On the page linked to above, it reads:

This class provides a cryptographically strong random number generator (RNG).

You might want to consider using this function.

mti2935
  • 19,868
  • 2
  • 45
  • 64
  • Thank you for this answer. I was actually thinking about making and application that would generate true random without significant overhead. Time in milliseconds seemed suspiciously too easy! But I did not foresee that the entropy would need to be noticably high for everyday security as well. I had some other ideas like analyzing RGB values on some point on a bitmap then multiplying but these seem very entropy-less now that youve mentioned the concept. We could involve PI but that would cause much overhead, right? Is there some mathmatical formal definition of true random I could view? – mindoverflow Dec 24 '21 at 15:47
  • Im sure you've seen how veracrypt does the random generation via user mouse cursor position. interesting stuff – mindoverflow Dec 24 '21 at 15:48
  • 2
    @mindoverflow good stuff, indeed. See FIPS 140-2 and RFC1750 for some standards around CSPRNG's (cryptographically strong pseudo random number generators). bitaddress.org also does something interesting by getting entropy from mouse movements. These days, most operating systems have a CSPRNG (in linux, there is `/dev/urandom`), and most programming tools/languages have a hook to the CSPRNG that you can use. – mti2935 Dec 24 '21 at 16:34
  • 1
    @mindoverflow : See also [lavarand](https://en.wikipedia.org/wiki/Lavarand) and/or Alpha-Phoenix's [Muon-Geiger-Nixie](https://www.hackster.io/news/a-geiger-counter-and-nixie-tube-random-number-generator-221749023c4e) random digit generator. – Eric Towers Dec 24 '21 at 23:24
  • @mindoverflow not inventing your own cryptography is a good cryptography practice. Of course, you are pretty much free to create anything you like, just don't call it to be secure without being a trained cryptographer and without getting a great deal of feedback from other reputable cryptographers (it happens that even this is not fail-safe, but is a good start). See "schneier's law". – fraxinus Dec 25 '21 at 21:35
  • just for research purposes. it would be a terrible idea for me to use my own crypto library! – mindoverflow Dec 25 '21 at 23:49
  • 1
    That's 13 million possible values, assuming there aren't duplicates. With compression in mp3 etc it's likely there's loads of duplicates, making that number ever lower. – Polygorial Dec 27 '21 at 15:58
4

Your first assumption is incorrect, as usually system time is pulling from a real time clock which tends to output a unix timestamp which doesn't technically cycle. Pseudorandom number generators like those in popular programming languages (usually the mersenne twister) do cycle, however, due to their nature as deterministic functions. If the generator is initialized with the right parameters, the length of the 'period' or time before the "pattern" of the numbers generated starts repeating is great (2^19937 – 1).

Your second assumption is also not entirely correct as the 'random' number generated is still pseudorandom (debatably) as you are just "whitening" the algorithm by introducing more randomness. True randomness is generated by pulling data from physical phenomena such as radiation and atmospheric noise, and in the case of /dev/random hardware events.

Pseudorandom number generators are technically hackable given (in the case of the mersenne twister) 624 consecutive numbers generated from the seed [https://github.com/kmyk/mersenne-twister-predictor/blob/master/mt19937predictor.py], however that is not an easily exploitable vulnerability in the context of encryption

belkarx
  • 1,207
  • 2
  • 18
  • wow i have never heard of mersenne twister or that there was a limit before everything would start to reproduce a pattern. i will have to read up in depth on them. what i meant was that the audio would be a live real time sample of a recording. that would be true random, correct? what goes on in ```/dev/random``` linux directory that makes it random? would capturing x y coordinates of cursor location be true random? would vulnerability of peudo random generators be fixed if the seed was more intensely pseudorandom like the aforementioned cursor concept? – mindoverflow Dec 25 '21 at 23:58
  • wait a minute. regardless of where time is pulled from, it is still cyclical as it would be a multiple of 24 or what have you depending on in what term the unit of time is being measured in. what i mean is let's say i represent time in units of 1 hours. then no matter what the time number is even with further convolution: (46 * 244 / 41); i would still know that 46 hours since generation has passed. – mindoverflow Dec 26 '21 at 00:10
  • 1
    As for your first point, it depends on the audio source. If you have noise that is predetermined, that makes it pseudorandom. `/dev/urandom` uses environmental noise (so physics-based), and has thresholds for what is "enough" randomness (the "entropy pool" has to be filled). The entire time value isn't cyclical by definition, though the lower bits are, and as long as the value differs, the values outputted by the PRNG should be "random". If you want to explore cycling, [here](https://www.desmos.com/calculator/qim5cc2uaa) is a representation of a basic linear congruential generator. – belkarx Dec 26 '21 at 01:29
2

True random is always better. Unfortunately, it is much harder. The commonly used technique is that at boot time, the system tries to gather as much random values as it can: the date and time (not random at all but it allways changes, and it is to guess at what time (with millesecond precision) the random subsystem asked it, optionally, it can try to gather random values from the network, from the disk, or from the operator if the system can only be manually booted.

From that, it initializes its own system seed. Application that do nor require reproducible sequences can just rely on that system sequence.

Giving a seed is mainly of interest for tests. If you want to control the output of an algorithm using random values, using a specific seed allows to always receive the same sequence, and because of that to alway get the same output value which can then be easily controled. But for production code, you should never give yourself a seed.

Serge Ballesta
  • 25,636
  • 4
  • 42
  • 84
  • You've brought up something very important that i was not paying attention to at all. Even if a seed was a random source allocated via srand() or the like, it is still not a good practice. I'll keep that in mind. thank you. I understood it, but my reasoning was only one dimensional in that srand() used system time, which isn't random. but even if the seed was random, we should not use a seed, correct? – mindoverflow Dec 24 '21 at 15:52
  • 1
    Millisecond precision? Modern x86 systems have sub-nanosecond precision in their `rdtsc` timestamps; the TSC frequency is (usually well) over 1GHz, and fine-grained so it doesn't skip counts and leave low bits always zero. I assume any modern embedded system would have at *least* microsecond timestamp capability, if not something equivalent to x86. (Of course, most embedded systems that need a CSPRNG should use some kind of true RNG source; the timing of bootup events tends to be very repeatable inside an embedded system so there's not much entropy before e.g. getting (encrypted) wifi up.) – Peter Cordes Dec 25 '21 at 06:54
1

In matters of security asking the right questions is extremely important, as is asking clear and precise questions. Learning how to ask these questions is difficult so I will try to restate your questions in more precise terms as well as answer some questions that I think will help further your understanding.

  1. Does the fact that the time-of-day seed overflows create a security vulnerability?

Basically no. I don't know of any modern system whose random seed overflows every 24 hours. Some old BASIC implementations did this. The main attack I can think of is to cause the program to run at the same time the next day, and you will get the same sequence of random numbers.

  1. Does the fact that the time-of-day is used as a seed create a security vulnerability?

Absolutely. If a program runs srand(time(NULL)) and you know it started 2 hours ago, plus or minus 2 minutes, you know that it got seeded with one of 240 possible values, and you can easily brute force all 240 values, and if you can observe a couple of outputs you can quickly determine which of these seeds was the actual one and how far along in the sequence it has gone.

If this program happens to be one that is generating keys based on that seed alone, you can re-run the program with each of those 240 values and one of them will give the exact same keys.

  1. So what if we give a "true" random seed to a pseudo-random generator?

This is definitely better but is not necessarily good enough. Some PRNGs are "stronger" than others, but if you observe enough outputs of the PRNG you can eventually deduce the internal entropy value and begin predicting future values. PRNGs are usually designed to be efficient to execute, rather than be cryptographically secure. However there are exceptions.

  1. If we select a pseudorandom location in an audio file, does that give us "true" randomness?

No. The audio file can be considered part of your seed. If you run the same program with the same PRNG seed and the same audio file you will get the same sequence of numbers. Alternatively you can consider the audio file as a "secret" part of your algorithm, which is then security through obscurity.

There's more. An audio file is not random at all (unless it's white noise!) If you exploit some other weakness in the system to guess the outputs of the PRNG, then you can start to build a picture of what is in this audio file, which will let you make guesses when you get a sample that is quite close to a previous sample, but also, you might be able to guess what is in the audio file. For example if you reconstruct enough to guess that it is a recording of a human voice this will make it easier to guess some of the statistical properties within the file.

Even if you don't know what the PRNG is doing, you will still be able to observe a non-uniform distribution in the values of the frequencies in the audio.

  1. OK, forget the audio file. Let's just say we have a giant table of 10 million truly random numbers that we got somewhere. And we use a PRNG to index into it. Is that random?

What advantage does that have over just using the numbers in the table in the original order? Are you hoping to be able to re-use the table for a longer period of time? The longer you use it for, the more likely some weakness in the PRNG will be exploited to your detriment. Better to have each entry in the table have a fixed lifespan: single use only. In this case adding a PRNG does not make it any more random. Rather than storing big tables which might be leaked, you might as well get the numbers from a hardware generator exactly when you need them, then discard them.

  1. exactly how vulnerable would it really leave systems if pseudorandom values are used?

This depends hugely on the precise combination of application, seed source, PRNG algorithm, and threat model. If the seed source is time(NULL), and the attacker has access to the binary, it's like leaving your keys in the front door. If you at least pull the seed out of /dev/random you might be making an attack infeasible, or it might just need the attacker to observe your program for a few weeks to get enough info to make some deductions.

If the attacker doesn't have your binary but correctly guesses you are using srand(time(NULL)) then they can potentially collect enough observations to make their move.

An attacker could potentially launch a DDoS attack or something to crash your server or cause more instances to spin up, thus guaranteeing that they get their PRNG seed from a reasonably predictable timestamp.

If, however, you use a PRNG that doesn't make it easy to infer the hidden state, seed it with hardware entropy rather than a timestamp, and use the numbers in a way that doesn't leak a lot of information, it will not be easy to attack.

As with all things security though, just one mistake can completely compromise you, which is why it is advisable not to rely on all those assumptions about your seed, PRNG and application when you can instead simply use a sequence of one-time random values derived from hardware entropy.

Artelius
  • 588
  • 2
  • 4
  • does it matter if i know when the seed was produced if i was trying to decrypt for example an 8 year old file? wouldn't it make things impractically convoluted by then? – mindoverflow Dec 26 '21 at 00:02
  • 1
    @mindoverflow Standard PRNG and encryption algorithms don't change very often. Even if you don't have the source code or binaries from the program that did the encryption, you can simply go through all the popular PRNGs and encryption algorithms of the time. It might help if you have several files and you know the decryption key for one of them. If you *do* have the source code or binaries, the fact that the file was encrypted 8 years ago is virtually irrelevant. – Artelius Dec 26 '21 at 22:01