3

I'm in the process of setting up a local (i.e. offline and very limited) business, and I'm thinking of generating invoice IDs randomly to avoid the clients knowing that they're customer number #00000001 (and because I prefer something like #30549805 to CLIENT1). I've come across the following script to do this:

#!/usr/bin/env bash

digits=8

rand=$(od -A n -t d -N 2 /dev/urandom |tr -d ' ')
num=$((rand % 10))
while [ ${#num} -lt $digits ]; do
  rand=$(od -A n -t d -N 1 /dev/urandom |tr -d ' ')
  num="${num}$((rand % 10))"
done
echo $num

...and it seems to work well enough: returning 26 duplicates (13 pairs) in more than 55,000 numbers.

Would it be safe to use something like this to generate invoices, and are there any disadvantages to doing so?

Assuming it is safe, what's the lowest amount of digits I can make the ID before the odds of collisions would be too high?

Hashim Aziz
  • 969
  • 8
  • 21
  • 3
    It seems to me that a collision is more a functional issue than a security fault. I suggest you generate a new I'd, then check it is free before assigning. – Pedro Jun 23 '20 at 17:49
  • 1
    I'm pretty sure that there should never be a duplicate invoice number. Which means any chance of collision would not be acceptable. As Pedro suggested: don't count on some probability but actually check if the number was used before. – Steffen Ullrich Jun 23 '20 at 17:54
  • 1
    @Pedro It occurred to me that this would be possible after the answer. I suppose this would require adding all generated digits into a "database" (a text file) and then checking that before adding a new one? – Hashim Aziz Jun 23 '20 at 18:09
  • That is a way of doing it. Unless you need to generate a lot of new invoices very quickly, you don't need to worry about the process scaling very well. But you can certainly improve it. – Pedro Jun 23 '20 at 21:03

1 Answers1

4

Here's a one-liner that you can use to generate a random string of digits:

head /dev/urandom | tr -dc '[:digit:]' | cut -c 1-10

This creates a random string of 10 digits. To increase or decrease the length, adjust the 10 at the end to the desired length.

To answer your second question, 'what's the lowest amount of digits I can make the ID before the odds of collisions would be too high?' - this is known as the birthday problem, which famously shows that if 23 people are in a room, there is a 50% chance that two people in the room share the same birthday. The math behind this problem can be used to solve your problem as well. Fortunately, you don't have to do any complicated math, because it's already done for you in the above wikipedia article. If you scroll down to the table in section 2.5, you'll find different scenarios that apply in your case.

For example, if you use random strings of 12 digits in length (a space just shy of 40 bits), you would have a 50% chance of getting a duplicate random string after generating a little less than 1.2x10^6 random strings. If you increase the length to 20 digits (a space greater than 64 bits), the number of random strings that you would have to generate before having a 50% chance of a duplicate increases to over 5.1x10^9.

mti2935
  • 19,868
  • 2
  • 45
  • 64
  • Thanks for the one-liner, I''ll run some tests to see whether it's as random as the script. I've come across the birthday problem in my research for this question, but this answer doesn't make it any clearer to me and my circumstances, nor does it explicitly answer any of my questions in my post. Specifically, what the lowest amount of digits I can go to before collisions get too high. This answer only seems to look at *higher* digit spaces than I want to deal with. It would be much improved if it included the formula to calculate the number of collisions for a given number of digits. – Hashim Aziz Jun 23 '20 at 18:07
  • 1
    To answer the question, `what is the lowest number of digits I can go to before the collisions get to high`, you need to decide: 1) what is the probability of a collision that you are willing to tolerate, and 2) after generating how many random strings? If you respond with these values, I'll show you how you can use the table that I referenced in my answer to determine how long the random strings should be to satisfy these requirements. However, having said that, I agree with Pedro and Steffen that your application should check for duplicates to be sure. – mti2935 Jun 23 '20 at 18:22
  • Since I'm now going to be developing a solution that checks for collisions, I think I can get away with keeping the number of collisions relatively high, so anything less than 50% should be fine (I want my code to successfully generate more often than it collides), as should the 50,000 runs I've been using to test. Kudos to your one-liner btw, I even found it slightly better in only generating 9 dupes compared to my current script's 13. – Hashim Aziz Jun 23 '20 at 22:04
  • @Prometheus Glad the one-liner works for you. If you look at the chart at https://en.wikipedia.org/wiki/Birthday_problem, you can see that you'll have a 50% chance of a collision after generating 7.7x10^4 (77,000) random numbers of 32 bits in length A length of 32 bits equates to a length of 10 decimal digits. So, you can choose a length of 10 for your application, and have less than a 50% chance of a dup after generating 50,000 random strings of 10 digits in length. – mti2935 Jun 23 '20 at 23:38