Generalized Birthday Problem


Tonight, my fiancée took me out to dinner to celebrate my birthday. While we were out, I heard Happy Birthday sung to 5 different guests (including myself) in a restaurant full of 50 people. This got me wondering: the original birthday problem (finding the probability that at least 2 people in a room of N people share the same birthday) is very simple and straightforward. But what about calculating the probability that at least k out of N people share the same birthday?

In case you're wondering, the probability of at least 5 people out of 50 total people sharing the same birthday is about 1/10000.

The Challenge

Given two integers N and k, where N >= k > 0, output the probability that at least k people in a group of N people share the same birthday. To keep things simple, assume that there are always 365 possible birthdays, and that all days are equally likely.

For k = 2, this boils down to the original birthday problem, and the probability is 1 - P(365, N)/365^N, where P(n, m) = n!/(n - m)! is the number of m-length permutations formed from n elements. For larger values of k, the Wolfram MathWorld article on the birthday problem may prove useful.
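As a minimal illustration of the k = 2 formula, the Python sketch below evaluates 1 - P(365, N)/365^N with exact rational arithmetic. The function name birthday_k2 and the optional days parameter are illustrative additions, not part of the challenge. math.perm (Python 3.8+) computes P(n, m) and returns 0 when m > n, so the probability comes out as exactly 1 once N exceeds 365.

```python
from fractions import Fraction
from math import perm


def birthday_k2(n: int, days: int = 365) -> Fraction:
    """Exact probability that at least two of n people share a birthday.

    Computes 1 - P(days, n) / days**n, where P(days, n) = days! / (days - n)!
    counts the assignments in which all n birthdays are distinct.
    """
    # math.perm returns 0 when n > days, making the result exactly 1.
    return 1 - Fraction(perm(days, n), days ** n)


print(birthday_k2(4))          # 795341/48627125
print(float(birthday_k2(23)))
```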

Rules

  • Output must be deterministic, and as accurate as possible for your chosen language. This means no Monte Carlo estimation or Poisson approximation.
  • N and k will be no larger than the largest representable integer in your chosen language. If your chosen language has no hard maximum on integers (aside from memory constraints), then N and k may be arbitrarily large.
  • Accuracy errors stemming from floating-point inaccuracies may be ignored - your solution should assume perfectly-accurate, infinite-precision floats.
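One way to satisfy the exactness rule for any k, sketched here in Python under the challenge's assumptions (365 equally likely days), is to count the birthday assignments in which no day has k or more people and take the complement. The hypothetical function birthday_at_least_k builds that count one day at a time: if ways[m] is the number of ways to place m labelled people on the days processed so far with every day below k, then adding one more day gives the sum over j < k of C(m, j) * ways[m - j], where j is the number of people assigned to the new day. All arithmetic uses integers and Fractions, so there is no Monte Carlo or Poisson approximation; the cost is roughly 365 * N * k big-integer operations.

```python
from fractions import Fraction
from math import comb


def birthday_at_least_k(k: int, n: int, days: int = 365) -> Fraction:
    """Exact probability that at least k of n people share a birthday.

    Counts assignments of n labelled people to `days` days in which every
    day holds at most k - 1 people, then returns the complement.
    """
    # ways[m]: ways to place m labelled people on the days processed so far
    # with no day holding k or more people. Zero days: only m = 0 works.
    ways = [1] + [0] * n
    for _ in range(days):
        new = [0] * (n + 1)
        for m in range(n + 1):
            # j people (chosen among the m) get the newly added day.
            for j in range(min(k - 1, m) + 1):
                new[m] += comb(m, j) * ways[m - j]
        ways = new
    return 1 - Fraction(ways[n], days ** n)
```

For example, birthday_at_least_k(5, 50) should reproduce the 5, 50 test case. The same count equals N! times the coefficient of x^N in (sum over j < k of x^j/j!)^365, which is the generating-function view of this recurrence.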

Test Cases

Format: k, N -> exact fraction (float approximation)

2, 4 -> 795341/48627125 (0.016355912466550306)
2, 10 -> 2689423743942044098153/22996713557917153515625 (0.11694817771107766)
2, 23 -> 38093904702297390785243708291056390518886454060947061/75091883268515350125426207425223147563269805908203125 (0.5072972343239854)
3, 3 -> 1/133225 (7.5060987051979735e-06)
3, 15 -> 99202120236895898424238531990273/29796146005797507000413918212890625 (0.0033293607910766013)
3, 23 -> 4770369978858741874704938828021312421544898229270601/375459416342576750627131037126115737816349029541015625 (0.01270542106874784)
3, 88 -> 121972658600365952270507870814168157581992420315979376776734831989281511796047744560525362056937843069780281314799508374037334481686749665057776557164805212647907376598926392555810192414444095707428833039241/238663638085694198987526661236008945231785263891283516149752738222327030518604865144748956653519802030443538582564040039437134064787503711547079611163210009542953054552383296282869196147657930850982666015625 (0.5110651106247305)
4, 5 -> 1821/17748900625 (1.0259790386313012e-07)
4, 25 -> 2485259613640935164402771922618780423376797142403469821/10004116148447957520459906484225353834116619892120361328125 (0.0002484237064787077)
5, 50 -> 786993779912104445948839077141385547220875807924661029087862889286553262259306606691973696493529913926889614561937/7306010813549515310358093277059651246342214174497508156711617142094873581852472030624097938198246993124485015869140625 (0.00010771867165219201)
10, 11 -> 801/8393800448639761033203125 (9.542757239717371e-23)
10, 20 -> 7563066516919731020375145315161/4825745614492126958810682272575693836212158203125 (1.5672327389589693e-18)
10, 100 -> 122483733913713880468912433840827432571103991156207938550769934255186675421169322116627610793923974214844245486313555179552213623490113886544747626665059355613885669915058701717890707367972476863138223808168550175885417452745887418265215709/1018100624231385241853189999481940942382873878399046008966742039665259133127558338726075853312698838815389196105495212915667272376736512436519973194623721779480597820765897548554160854805712082157001360774761962446621765820964355953037738800048828125 (1.2030611807765361e-10)
10, 200 -> 46037609834855282194444796809612644889409465037669687935667461523743071657580101605348193810323944369492022110911489191609021322290505098856358912879677731966113966723477854912238177976801306968267513131490721538703324306724303400725590188016199359187262098021797557231190080930654308244474302621083905460764730976861073112110503993354926967673128790398832479866320227003479651999296010679699346931041199162583292649095888379961533947862695990956213767291953359129132526574405705744727693754517/378333041587022747413582050553902956219347236460887942751654696440740074897712544982385679244606727641966213694207954095750881417642309033313110718881314425431789802709136766451022222829015561216923212248085160525409958950556460005591372098706995468877542448525403291516015085653857006548005361106043070914396018461580475651719152455730181412523297836008507156692430467118523245584181582255037664477857149762078637248959905010608686740872875726844702607085395469621591502118462813086807727813720703125 (1.21685406174776e-07)
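The smallest test cases can be checked by hand. When k = N, everyone must share one day, giving 365/365^N = 1/365^(N-1). When k = N - 1 and N >= 3, two different days cannot both hold N - 1 people. The favourable outcomes are therefore all N people on one day (365 ways), or exactly N - 1 people on one day and the remaining person on one of the other 364 days (365 * N * 364 ways), for a probability of (364N + 1)/365^(N-1). The Python sketch below encodes both cases; the function names are illustrative. For N = 3, the second formula gives 1093/133225, about 0.00820417, which matches the `2 f 3` output in the J answer.

```python
from fractions import Fraction


def all_share(n: int, days: int = 365) -> Fraction:
    """P(all n people share one birthday), i.e. the case k = n."""
    return Fraction(days, days ** n)


def all_but_one_share(n: int, days: int = 365) -> Fraction:
    """P(at least n - 1 of n people share a birthday), valid for n >= 3.

    Favourable outcomes: all n on one day (days ways), or exactly n - 1
    on one day and the last person elsewhere (days * n * (days - 1) ways).
    """
    if n < 3:
        raise ValueError("formula assumes n >= 3 so no two days can both hold n - 1 people")
    return Fraction(days * (n * (days - 1) + 1), days ** n)
```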

Mego

Posted 2016-06-10T03:31:39.640

Reputation: 32 998

Happy birthday (belated)! – Luis Mendo – 2016-06-10T09:22:23.080

Maybe add a couple of test cases for small numbers? – Luis Mendo – 2016-06-10T09:46:57.353

@LuisMendo I will add some more after I get a few hours of sleep :) – Mego – 2016-06-10T09:47:55.467

It's worth noting that the probability that people eat at a restaurant is probably not independent of whether it's their birthday, so the probability of five birthdays out of 50 people is probably higher than the Birthday Problem logic would suggest. – Glen O – 2016-06-10T14:47:24.740

@GlenO Good point! – Luis Mendo – 2016-06-10T16:39:19.603

Is the input format in the same order? (n, k) = (2, 4) would be 0, but (k, n) = (2, 4) is ~0.016. – miles – 2016-06-11T10:42:24.123

@miles Good catch, thanks! Not sure how I got those flipped. – Mego – 2016-06-11T11:08:49.790

Answers


Jelly, 16 bytes (previously 17)

ĠZL
365ṗÇ€<¬µS÷L

Extremely inefficient. If you run it on the online interpreter, keep N below 3.

How it works

365ṗÇ€<¬µS÷L  Main link. Left argument: N. Right argument: K

365ṗ          Cartesian product; generate all lists of length N that consist of
              elements of [1, ..., 365].
    Ç€        Map the helper link over all generated lists. It returns the highest
              number of people that share a single birthday.
      <       Compare each result with K.
       ¬      Negate.
        µS÷L  Take the mean by dividing the sum by the length.


ĠZL           Helper link. Argument: A (list of integers)

Ġ             Group the indices of A by value; indices of identical values end up
              in the same group.
 Z            Zip; transpose rows with columns.
  L           Take the length of the result, thus counting columns.
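For readers unfamiliar with Jelly, the same enumeration idea can be sketched in Python: generate every tuple of N birthdays, find the size of the largest group of equal values, and average the indicator "largest >= k". This is an illustrative sketch, not a translation of the Jelly code. The days parameter is an addition that lets the enumeration run on a smaller calendar, since 365^N tuples quickly becomes infeasible.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def brute_force(k: int, n: int, days: int = 365) -> Fraction:
    """Enumerate every birthday assignment of n >= 1 people.

    Runtime grows like days ** n, so only tiny inputs are practical.
    """
    hits = 0
    total = 0
    for birthdays in product(range(days), repeat=n):
        # Size of the largest group of people sharing one birthday.
        largest = max(Counter(birthdays).values())
        hits += largest >= k
        total += 1
    return Fraction(hits, total)
```

With a 4-day calendar, brute_force(2, 3, days=4) gives 1 - (4 * 3 * 2)/4^3 = 5/8.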

Dennis

Posted 2016-06-10T03:31:39.640

Reputation: 196 637

"keep N below 3"... isn't that overly restrictive? – Neil – 2016-06-10T07:46:50.253

@Neil The solution is valid for all inputs, but the online interpreter won't be able to run inputs where N > 3, due to memory and time constraints. – Mego – 2016-06-10T08:59:50.713

@Mego I was just thinking that because it doesn't make much sense if you don't have k > 1, then given k <= N, if you then want to keep N < 3, that doesn't leave much choice for the values of N and k that you can try. – Neil – 2016-06-16T10:41:54.273


MATL, 16 bytes

365:Z^!tXM=s>~Ym

First input is N, second is k.


This is an enumeration-based approach, like Dennis' Jelly answer, so input numbers should be kept small due to memory limitations.

365:   % Vector [1 2 ... 365]
Z^     % Take N implicitly. Cartesian power. Gives a 2D array with each
       % "combination" on a row
!      % Transpose
t      % Duplicate
XM     % Mode (most frequent element) of each column
=      % Test for equality, element-wise with broadcast. For each column, gives
       % true for elements equal to that column's mode, false for the rest
s      % Sum of each column. Gives a row vector
>~     % Take k implicitly. True for elements greater than or equal to k
Ym     % Mean of the row vector. Implicitly display
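The memory limitation comes from the size of the Cartesian power: it has 365^N rows of N entries each. The short Python sketch below tabulates that size. The figure of 8 bytes per entry is an assumption (double-precision storage), so the byte counts are only rough estimates. Under that assumption, N = 3 already needs over a gigabyte, and N = 4 needs hundreds of gigabytes.

```python
BYTES_PER_ENTRY = 8  # assumption: each entry stored as a double


def cartesian_power_entries(n: int, days: int = 365) -> int:
    """Entries in the days**n-by-n array of all birthday combinations."""
    return days ** n * n


for n in range(1, 5):
    entries = cartesian_power_entries(n)
    gib = entries * BYTES_PER_ENTRY / 2 ** 30
    print(f"N = {n}: {entries} entries, about {gib:.2f} GiB")
```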

Luis Mendo

Posted 2016-06-10T03:31:39.640

Reputation: 87 464

You outgolfed Dennis, good job. – m654 – 2016-06-10T11:34:09.953

@m654 Let's see when he wakes up :-D – Luis Mendo – 2016-06-10T11:54:10.837

Well, I woke up, but the best I managed was a tie. Jelly really needs a mean atom... – Dennis – 2016-06-10T15:45:56.870

@Dennis I was thinking the same. Maybe a mode atom too? – Luis Mendo – 2016-06-10T16:31:05.040


J, 36 bytes (previously 41)

(+/%#)@(<:365&(#~>./@(#/.~)@#:i.@^))

Straightforward approach similar to the others. Runs into memory issues for n > 3.

Usage

Takes the value of k on the LHS and n on the RHS.

   f =: (+/%#)@(<:365&(#~>./@(#/.~)@#:i.@^))
   0 f 0
0
   0 f 1
1
   1 f 1
1
   0 f 2
1
   1 f 2
1
   2 f 2
0.00273973
   0 f 3
1
   1 f 3
1
   2 f 3
0.00820417
   3 f 3
7.5061e_6

On my PC (an i7-4770K), timing with the foreign 6!:2 shows that computing for n = 3 takes about 25 seconds.

   timer =: 6!:2
   timer '2 f 3'
24.7893
   timer '3 f 3'
24.896

Explanation

(+/%#)@(<:365&(#~>./@(#/.~)@#:i.@^)) Input: k on LHS, n on RHS
          365&                       The number 365
               #~                    Create n copies of 365
                                 ^   Calculate 365^n
                              i.@    The range [0, 1, ..., 365^n-1]
                            #:       Convert each value in the range to base 365, keeping
                                     leading zeros so that each has exactly n digits
                     (#/.~)@         Find the size of each set of identical values
                 >./@                Find the max size of each
        <:                           Test each if greater than or equal to k
(+/%#)@                              Apply to the previous result
 +/                                  Find the sum of the values
    #                                Count the number of values
   %                                 Divide the sum by the count and return
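The #: step turns each integer in the range [0, 365^n) into its n base-365 digits, one digit per person, with leading zeros kept. A minimal Python sketch of that conversion (the helper name to_digits is hypothetical) uses repeated divmod, which produces the least significant digit first, and then reverses the list:

```python
def to_digits(value: int, base: int, width: int) -> list[int]:
    """Fixed-width base-`base` digits of `value`, most significant first.

    Assumes 0 <= value < base ** width; leading zeros are kept so every
    result has exactly `width` digits.
    """
    digits = []
    for _ in range(width):
        value, d = divmod(value, base)
        digits.append(d)
    return digits[::-1]
```

For instance, to_digits(366, 365, 3) gives [0, 1, 1], since 366 = 1 * 365 + 1.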

miles

Posted 2016-06-10T03:31:39.640

Reputation: 15 654