Background

The birthday paradox is a popular problem in probability theory which defies (most people's) mathematical intuition. The problem statement is:

Given N people, what is the probability that at least two of them have the same birthday (disregarding the year).

The problem is usually simplified by ignoring leap days entirely. In this case, the answer for N = 23 is P(23) ≈ 0.5072972 (as a common example). The linked Wikipedia article explains how to arrive at this probability. Alternatively, this Numberphile video does a really good job.

However, for this challenge we want to do it right and don't ignore leap years. This is slightly more complicated, since now the 29th of February needs to be added, but this particular birthday is less likely than all the others.

We'll also use the full leap year rules:

If a year is divisible by 400 it's a leap year.
Else, if a year is divisible by 100 it's not a leap year.
Else, if a year is divisible by 4 it's a leap year.
Else, it's not a leap year.

Confused? It means that the years 1700, 1800, 1900, 2100, 2200, 2300 are not leap years, but 1600, 2000, 2400 are (as well as any other year divisible by 4). This calendar repeats every 400 years, and we will assume a uniform distribution of birthdays over those 400 years.

The corrected result for N = 23 is now P(23) ≈ 0.5068761.

The Challenge

Given an integer 1 ≤ N < 100, determine the probability that among N people at least two have the same birthday under consideration of the leap year rules. The result should be a floating-point or fixed-point number, accurate to at least 6 decimal places. It is acceptable to truncate trailing zeroes.

You may write a program or function, taking input via STDIN (or closest alternative), command-line argument or function argument and output the result via STDOUT (or closest alternative), function return value or function (out) parameter.

Your solution must be able to produce output for all 99 inputs in a matter of seconds. This is mainly to rule out Monte Carlo methods with tons of samples, so if you're using a principally fast and exact algorithm in an excessively slow esoteric language, I'm willing to give leeway on this rule.

Test Cases

Here is the full table of results:

 1 => 0.000000
 2 => 0.002737
 3 => 0.008195
 4 => 0.016337
 5 => 0.027104
 6 => 0.040416
 7 => 0.056171
 8 => 0.074251
 9 => 0.094518
10 => 0.116818
11 => 0.140987
12 => 0.166844
13 => 0.194203
14 => 0.222869
15 => 0.252642
16 => 0.283319
17 => 0.314698
18 => 0.346578
19 => 0.378764
20 => 0.411063
21 => 0.443296
22 => 0.475287
23 => 0.506876
24 => 0.537913
25 => 0.568260
26 => 0.597796
27 => 0.626412
28 => 0.654014
29 => 0.680524
30 => 0.705877
31 => 0.730022
32 => 0.752924
33 => 0.774560
34 => 0.794917
35 => 0.813998
36 => 0.831812
37 => 0.848381
38 => 0.863732
39 => 0.877901
40 => 0.890932
41 => 0.902870
42 => 0.913767
43 => 0.923678
44 => 0.932658
45 => 0.940766
46 => 0.948060
47 => 0.954598
48 => 0.960437
49 => 0.965634
50 => 0.970242
51 => 0.974313
52 => 0.977898
53 => 0.981043
54 => 0.983792
55 => 0.986187
56 => 0.988266
57 => 0.990064
58 => 0.991614
59 => 0.992945
60 => 0.994084
61 => 0.995055
62 => 0.995880
63 => 0.996579
64 => 0.997169
65 => 0.997665
66 => 0.998080
67 => 0.998427
68 => 0.998715
69 => 0.998954
70 => 0.999152
71 => 0.999314
72 => 0.999447
73 => 0.999556
74 => 0.999645
75 => 0.999717
76 => 0.999775
77 => 0.999822
78 => 0.999859
79 => 0.999889
80 => 0.999913
81 => 0.999932
82 => 0.999947
83 => 0.999959
84 => 0.999968
85 => 0.999976
86 => 0.999981
87 => 0.999986
88 => 0.999989
89 => 0.999992
90 => 0.999994
91 => 0.999995
92 => 0.999996
93 => 0.999997
94 => 0.999998
95 => 0.999999
96 => 0.999999
97 => 0.999999
98 => 0.999999
99 => 1.000000

(Of course, P(99) is only 1.0 due to rounding. The probability won't reach exactly 1.0 until P(367).)

Martin Ender

Posted 2015-04-24T13:39:51.383

Reputation: 184 808

If you're going to be pedantic then you should take into account that birthdays aren't uniformly distributed throughout the year. 2. The precise relevance of the leap year rules depends on what assumptions are made about human longevity. Is the idea to amortise over the full 400 year cycle?

< – Peter Taylor – 2015-04-24T15:21:11.813

1@PeterTaylor Yes, assume uniform distribution over the full 400 year cycle. I never said the set of N people was alive at the same time. ;) – Martin Ender – 2015-04-24T15:22:07.850

Answers

Pyth, 31 34 bytes

Jt.2425K366-1c.xX0rK-KQ*JQ^+KJQ

Demonstration, Test Harness

This works similarly to the old version, except that instead of separately generating the (366 + n * (.2425 - 1)) value and multiplying it on, it starts by generating a list which extends from 366 to 365 - n + 2, then modifies the 366 in place to become (366 + n * (.2425 - 1)), and then takes the product of the list. Also, the constants 366 and -.7575 are used instead of 365 and .2425.

Old version:

Pyth, 34 bytes

J.2425K365-1c*+hK*QtJ.xrK-hKQ^+KJQ

There are two possible ways for there to be no pair of people with the same birthday: everyone to have different birthdays, and no one has a birthday on February 29th, and someone to have a birthday on the 29th, and everyone else to have different birthdays on normal days.

The probability of the first occurring is (365 * 364 * ... 365 - n + 1) / (365.2425^n).

The probability of the second occurring is (365 * 364 * ... 365 - n + 2) * .2425 * n / (365.2425^n)

These can be written together as (365 * 364 * ... 365 - n + 2) * (365 - n + 1 + .2425 * n) / (365.2425^n) = (365 * 364 * ... 365 - n + 2) * (365 + 1 + (.2425 - 1) * n) / (365.2425^n).

This is the probability of no pairs, so the probability of at least one pair is one minus the above number.

J = .2425
K = 365
.xrK-hKQ = (365 * 364 * ... 365 - n + 2)
+hK*QtJ = (365 + 1 + n * (.2425 - 1))
^+KJQ = (365.2425 ^ n)

isaacg

Posted 2015-04-24T13:39:51.383

Reputation: 39 268

Python, 179 178 144 143 140 136 135 133

f=.2425
g=365+f
a=lambda n:(n and(365-n)*a(n-1)or 365)/g
b=lambda n:(n<2and f or(367-n)*b(n-1)+a(n-2)*f)/g
p=lambda n:1-a(n-1)-b(n)

p(n) gives the result. Change .2425 to fractions.Fraction(97,400) to get an exact result.

orlp

Posted 2015-04-24T13:39:51.383

Reputation: 37 067

You don't need a space between 2 and and. – isaacg – 2015-04-24T19:25:04.787

Can you not put in 1/ for g and divide by it instead? – xnor – 2015-04-24T21:42:45.990

@xnor Yep, over time these things get lost :) What once was an optimization becomes suboptimal later. – orlp – 2015-04-24T22:27:49.777

you could introduce e=365 and replace 365 by e and 367 by e+2 – Willem – 2015-04-25T07:26:37.577

@willem That's not shorter. – orlp – 2015-04-25T08:01:30.593

@orlp you're right, its still the same length – Willem – 2015-04-25T08:17:35.417

you can remove the space here ) or – Willem – 2015-04-25T08:30:43.593

Reordering (365-n)*a(n-1) cuts another space and likewise for (367-n)*b(n-1). – xnor – 2015-04-25T09:28:55.047

Python 155 153 151 142 140 bytes

d=146097
b=d/400
c=97/d
e=lambda n:n<2and 1-97/d or e(n-1)*(366-n)/b
f=lambda n:n<2and c or f(n-1)*(367-n)/b+e(n-1)*c
a=lambda n:1-e(n)-f(n)

Call a(n) for the result. For exact results, wrap d in a fraction.

Test here

Uses same technique as here, but with modified constants.

TheNumberOne

Posted 2015-04-24T13:39:51.383

Reputation: 10 855

You don't need a space between 2 and and. – isaacg – 2015-04-24T19:25:17.130

I thought it was 98 (although I may have mad a calculation mistake...) – Tim – 2015-04-26T14:45:11.857

80386 machine code, 43 bytes

Hexdump of the code:

68 75 6e 33 3b 68 5a eb 07 3b 8b c4 49 d9 e8 d9
e8 d8 00 d9 e8 d9 40 04 de ea d8 c9 d9 00 de eb
e2 f3 d8 ca d9 e8 d8 e1 58 58 c3

I started from the following formula (for the complementary probability):

$\prod\limits_{i=0}^{k-2}(1-\frac{97+400*i}{d})*(1-\frac{303*(k-1)}{d})$

(here d = 366 * 400 - 303 is the number of days in 400 years)

Here is c++ code that implements it (it's already optimized a little):

double it_opt(int k)
{
    double d = 366 * 400 - 303; // number of days in 400 years
    double result = 1; // probability of no coincidences
    const float const1 = float(400 / d);
    const float const2 = float(303 / d);
    double v1 = 1 + const2;
    double v2 = 1;

    for (int i = 0; i < k - 1; ++i)
    {
        v1 -= const1;
        result *= v1;
        v2 -= const2;
    }
    result *= v2;
    return 1 - result;
}

The code is arranged so it requires the minimum number of constants - only two (400 / d and 303 / d). I use the float type to represent them because it occupies less space (4 bytes per constant). In addition, I didn't want to multiply const2 by k - 1 (because that would require converting k - 1 to float); the code subtracts const2 repeatedly instead.

Here is the assembly language listing:

    ; // fastcall convention - parameter k is in ecx
    ; // result must be returned in st
    push dword ptr 0x3b336e75; // const1 = [esp + 4]
    push dword ptr 0x3b07eb5a; // const2 = [esp]
    mov eax, esp;              // use [eax] instead of [esp] - 1 byte less
    dec ecx;                   // calculate k - 1
    fld1;                      // initiaze result = 1
    fld1;                      // initialize v1
    fadd [eax];
    fld1;                      // initialilze v2
myloop:
    fld dword ptr [eax + 4];
    fsubp st(2), st;            // update v1
    fmul st, st(1);             // update result
    fld dword ptr [eax];
    fsubp st(3), st;            // update v2
    loop myloop;                // loop
    fmul st, st(2);             // update result by v2
    fld1;
    fsub st, st(1);             // complement result
    pop eax;                    // restore stack
    pop eax;                    // restore stack
    ret;                        // return

anatolyg

Posted 2015-04-24T13:39:51.383

Reputation: 10 719

(Somewhat) Pedantic Birthday Paradox

Background

The Challenge

Test Cases

Answers

Pyth, 31 34 bytes

Python, 179 178 144 143 140 136 135 133

Python 155 153 151 142 140 bytes

80386 machine code, 43 bytes