4

Is there any benefit in determining how random a given file, stream, signal is?

I guess this would be useful to determine

  1. If something is (poorly) encrypted
  2. To verify the proper encryption of a file, data (GCM, SALSA20, or unknown cipher)
  3. To detect that a covert signal is being sent

Assuming I'm the role of either "Bob" or "Mallory", how would I measure how random a given stream of 1's and 0's are? (using any cipher method)

makerofthings7
  • 50,090
  • 54
  • 250
  • 536
  • even though you intend it for security purposes, this is more of a comp-sci or math question – schroeder Nov 25 '15 at 01:23
  • 1
    If you could do this, how would #2 apply? Why do you think that measuring randomness on the result of a cryptographic algorithm would be a metric for security? – schroeder Nov 25 '15 at 01:23
  • 1
    http://programmers.stackexchange.com/questions/147134/how-should-i-test-randomness may have some applicable information on this subject. – mti2935 Nov 25 '15 at 01:26
  • 1
    @schroeder I think it might be a metric when observing blackbox or closed source implementations where the developer says "trust us". – makerofthings7 Nov 25 '15 at 01:35
  • 1
    But, how? Even random data might have patterns – schroeder Nov 25 '15 at 02:14
  • 1
    A simple method would be to just measure how compressible the data is. Purely random data should be completely unable to be compressed by any known compression function. More formally, you should look into measuring the entropy of the information https://en.wikipedia.org/wiki/Quantities_of_information and https://en.wikipedia.org/wiki/Entropy_(information_theory) – Steve Sether Nov 25 '15 at 02:44
  • 2
    Detecting whether data is random or not doesn't really tell you if it's encrypted though. Highly compressed data is going to be largely indistinguishable from compressed data. – Steve Sether Nov 25 '15 at 02:47
  • **Entropy** is what you are looking for. See https://en.wikipedia.org/wiki/Entropy_(computing) and https://en.wikipedia.org/wiki/Entropic_security – Mxsky May 18 '16 at 06:08

2 Answers2

1

Oh boy, philosophy time. What is randomness? Can randomness be measured? What is a signal anyway?

Intuitively we can say that Thing A is more random than Thing B if it has more higher entropy. Entropy is one of those concepts that's easy to define and talk about, but actually measuring it from a data set is a statistician's nightmare. If your RNG outputs aaaaaaa... either your RNG sucks lemons, or you hit the one in a 10-n chance. With only one data point, it's impossible to tell.

For any statistical analysis of randomness to be at all meaningful you need at least a million samples generated in a lab environment where you can prove that your RNG was properly seeded for each sample. And even then you have to be very careful about which statistical tests you use because they all have a bias towards certain types of patterns. Trying to draw any conclusions about the quality of the randomness with less rigor than this will be ... inconclusive at best.

Bottom line: measuring cryptographic entropy in practice is hard and best left to researchers.


So that's the theoretical answer. In practice you can get a very rough idea of the amount of randomness in a string or file using the methods mentioned in comments / other answers: run it through a compression tool (which is a sub-category of Kolmogorov complexity tests), or one of the entropy-estimating stats tests mentioned in @EdwardBarnard's answer.

This type of thing is probably good enough for distinguishing an encrypted file from a plaintext file, but probably couldn't tell the difference between a compressed file and and encrypted file, or for that matter between a properly encrypted file and a badly encrypted file.


As for detecting covert signals, you're kinda out of luck if you're looking for a general solution. Tell me which off-the-shelf statistical tests you're using, and I'll design a steganography method that fools it. To have any hope of detecting this, you need to know a fair amount about the statistical properties of both the carrier signal, and the expected hidden signal, and design a detection algorithm around this. And of course the more noise naturally occurring in the carrier signal, the more places for covert signals to hide.

Mike Ounsworth
  • 57,707
  • 21
  • 150
  • 207
0

The Wikipedia article Randomness tests answers both questions (benefits of measuring randomness; how to do so). As the OP comments point out, randomness is not useful for measuring the validity/security of an encryption.

The closest use related to cryptography would be in evaluating pseudo-random number generators or other sources of randomness.

Edward Barnard
  • 672
  • 6
  • 17