1

I work in security and I've seen modulus (modulo) used in many encoding and crypto algorithms. However, today, a friend of mine mentioned that using modulo like this:

unsigned long int result = some_CSPRNG_output % 556600;

"Limits the security effectiveness of the CSPRNG."

If you are not familiar with C, the pseudo-code there is essentially stating that the output of some cryptographically secure random number generator, such as /dev/urandom on a linux system, modulo 556600, squeezed into a positive integer, is assigned to the variable "result."

The idea behind his argument is that modulus limits the number of possible outputs, therefore weakening the strength of the entropy. He stated for example, that if we have a CSPRNG output and we compute it with modulo 2, there are fewer possibilities of outcome, thus the entropy is weaker.

Is this true? Please explain why or why not.

the_endian
  • 1,009
  • 1
  • 8
  • 17

1 Answers1

1

The modulo operation is useful to obtain a number within a certain range. In your example, the output will always be in [0, 556600).

This can cause some changes in the distribution of the random numbers generated:

The main one: a smaller pool of possible values:

If the original number of bytes read out of /dev/urandom is more than ~2.5, then it had a bigger range and higher entropy than the resulting number due simply to the fact that the range is smaller.

The extreme example is modulo 2. You can go from a massive range of possible values to just 0 or 1.

This in itself is not a problem. If you need to randomly pick between two choices, this is the right thing to do. If anything, this will smooth out some imbalances in the PRNG.

The nasty one: a non-uniform distribution:

One nasty side effect of modulus is when the size of the range is not a multiple of the modulus.

eg: % 2 on the range [0, 2]. Two of the original possible values map to 0 while only one maps to 1. If the original distribution is uniform, the final distribution is massively skewed towards 0.

The bigger the range is compared to the modulus, the less of a bias this causes.

In your original example, if we initially read 4 bytes from /dev/urandom, the range is 7716.43 times the modulus (2^32 / 556600) which is not a multiple, but is large enough to be less impactful. Whether the bias is useful will depend on the application.

This part can be problematic: if you need to pick between two choices and you pick the first one 66% of the time, you have a problem.

Summary:

Your friend was partially correct in saying that the entropy changes through a modulo operation, but for the wrong reason. Limiting the number of possible values does not necessarily change the entropy. But biasing the final distribution might.

Marc
  • 4,091
  • 1
  • 17
  • 23