What's the chance that I'll win a door prize?

12

1

My local ACM chapter gives out door prizes to people who come to the meetings. You get an increased chance of winning if you solve the programming puzzle, however (but I always solve that puzzle). Thus, some people have 1 entry, while others have 2. But wait! The way the raffle program works is not by adding in another entry when someone solves the puzzle. Instead, it keeps track of the number of "lives" a person has, decrementing that if that person is chosen in each pass of it's random sampling algorithm. So it works like this:

Doorknob: 1.  xnor: 2.  Justin: 2.  Alex: 1.  Dennis: 2.

Then the program randomly chooses one of [Doorknob, xnor, Justin, Alex, Dennis], decrements the number (say it chooses Justin):

Doorknob: 1.  xnor: 2.  Justin: 1.  Alex: 1. Dennis: 2.

And repeats. If someone's number of "lives" goes to 0 (let's choose Justin again), they're removed from the list:

Doorknob: 1.  xnor: 2.  Alex: 1.  Dennis: 2.

This continues until there's one person left; that person is the winner.

Now the real question is, what was the probability that I would have won?


You will be given two inputs:

  • n. This is the number of people entered into the challenge
  • k. This is the number of people (of those n) who have 2 lives. This number always includes you.

So if I had a function p and called p(10, 5), that would be the probability of winning the prize where there are 10 people total, 5 of which only have 1 life, whereas 5 (including you) have 2 lives.


You are expected to output the probability of winning either exactly, or as a decimal. At any rate, answers must be accurate up to and including the 4th decimal place after the decimal point. Whether or not you round to that digit is up to you.

Your solution may be a randomized solution which outputs the answer to the 4th decimal place with high probability. You may assume that the built in RNG you use is truly random, and it must output the correct answer with at least 90% probability.

Furthermore, your code need only work for n, k <= 1000, although I did provide test cases larger than that for those curious.


Test Cases

Note: some of these are general formulae.

n,    k   |  output
----------+---------
1,    1   |  1
2,    2   |  0.5
2,    1   |  0.75
3,    1   |  11/18 = 0.611111111
1000, 1   |  0.007485470860550352
4,    3   |  0.3052662037037037
k,    k   |  1/k
n,    1   |  (EulerGamma + PolyGamma[1 + n])/n    (* Mathematica code *)
          |  (γ + ψ(1 + n))/n
10,   6   |  0.14424629234373537
300,  100 |  0.007871966408910648
500,  200 |  0.004218184180294532
1000, 500 |  0.0018008560286627948
---------------------------------- Extra (not needed to be a valid answer)
5000, 601 |  0.0009518052922680399
5000, 901 |  0.0007632938197806958

For another few checks, take p(n, 1) * n as follows:

n     |  output
------+---------
1     | 1
2     | 1.5 
3     | 1.8333333333333335
10    | 2.928968253968254
100   | 5.1873775176396215
-------------------------- Extra (not needed to be a valid answer)
100000| 12.090146129863305

Justin

Posted 2016-05-17T23:34:46.353

Reputation: 19 757

I'm no longer familiar with the tags on this site; if you think of a more appropriate tag, please edit away! – Justin – 2016-05-17T23:35:20.510

Closely related question on math.se: http://math.stackexchange.com/q/1669715/72616

– Justin – 2016-05-17T23:47:25.233

So, P(n,k) = ((k-1)/n)P(n,k-1) + ((n-k)/n)P(n-1,k) + (1/n)Q(n,k-1), where Q(n,k) = ((n-k-1)/n)Q(n-1,k) + (k/n)Q(n,k-1) and Q(1,0)=1... – Leaky Nun – 2016-05-18T00:05:47.137

@KennyLau I'm not going to attempt to interpret it, but beware the math.se link, as it uses a slightly different definition of the function (I believe k is off by one) – Justin – 2016-05-18T00:06:53.523

Instead of EulerGamma.... it would be clearer to write H(n)/n where H(n) = harmonic numbers. – feersum – 2016-05-18T01:07:23.237

2Is it OK to do a randomized simulation with enough trials that the answer is correct to the fourth decimal place with high probability, though of course not certainty? – xnor – 2016-05-18T01:53:30.517

If only the lives were randomly chosen then the answer would be 2/(n+k)... – Neil – 2016-05-18T12:33:05.560

@xnor I think that should be reasonable – Justin – 2016-05-19T00:27:54.447

Rounding error in p(n,1)*n? "1.833333333335" shouldnt it be "1.8333333333333" – ev3commander – 2016-05-19T00:40:24.587

@BlockCoder1392 I didn't worry about that because it's past the 4th decimal place. But yes, you are right – Justin – 2016-05-19T00:49:36.377

This question is for a challenge I think? Math.SE is not PPCG.SE. – Erik the Outgolfer – 2016-05-21T16:21:41.653

Answers

2

MATL, 42 bytes

:<~QXJx`J`tf1Zry0*1b(-tzq]f1=vts3e8<]6L)Ym

This uses a probabilistic (Monte Carlo ) approach. The experiment is run a large number of times, from which the probability is estimated. The number of realizations is selected to ensure that the result is correct up to the fourth decimal with probability at least 90%. However, this takes a very long time and a lot of memory. In the link below the number of realizations has been reduced by a factor of 106 so that the program ends in a reasonable amout of time; and only the first decimal is guaranteed to be accurate with at least 90% probability.

EDIT (July 29, 2016): due to changes in the language, 6L needs to be replaced by 3L. The link below incorporates that modification.

Try it online!

Background

Let p denote the probability to be computed. The experiment described in the challenge will be run for a number n of times. Each time, either you win the prize (“success”) or you don't. Let N be the number of successes. The desired probability can be estimated from N and n. The larger n is, the more accurate the estimation will be. The key question is how to select n to fulfill to the desired accuracy, namely, to assure that at least 90% of times the error will be less than 10−4.

Monte Carlo methods can be

  • Fixed-size: a value of n is fixed in advance (and then N is random);
  • Variable-size: n is determined on the fly by the simulation results.

Among the second category, a common used method is to fix N (desired number of successes) and keep simulating until that number of successes is achieved. Thus n is random. This technique, called inverse binomial sampling or negative-binomial Monte Carlo, has the advantage that the accuracy of the estimator can be bounded. For this reason it will be used here.

Specifically, with negative-binomial Monte Carlo x = (N−1)/(n−1) is an unbiased estimator of p; and the probability that x deviates from p by more than a given ratio can be upper-bounded. According to equation (1) in this paper (note also that the conditions (2) are satisfied), taking N = 2.75·108 or larger ensures that p/x belongs to the interval [1.0001, 0.9999] with at least 90% probability. In particular, this implies that x is correct up to the 4th decimal place with at least 90% probability, as desired.

Code explained

The code uses N = 3e8 to save one byte. Note that doing this many simulations would take a long time. The code in the link uses N = 300, which runs in a more reasonable amount of time (less than 1 minute in the online compiler for the first test cases); but this only assures that the first decimal is correct with probability at least 90%.

:        % Take k implicitly. Range [1 ... k]
<~       % Take n implicitly. Determine if each element in the previous array is
         % less than or equal than n
Q        % Add 1. This gives an array [2 ... 2 1 ... 1]
XJx      % Copy to clipboard J. Delete from stack
`        % Do...while. Each iteration is a Monte Carlo realization, until the 
         % desired nunber of successes is reached
  J      %   Push previously computed array [2 ... 2 1 ... 1]
  `      %   Do...while. Each iteration picks one door and decrements it, until
         %   there is only one
    t    %     Duplicate
    f    %     Indices of non-zero elements of array
    1Zr  %     Choose one of them randomly with uniform distribution
    y0*  %     Copy of array with all values set to 0
    1b(  %     Assign 1 to chosen index
    -    %     Subtract
    tzq  %     Duplicate. Number of nonzero elements minus 1. This is falsy if
         %     there was only one nonzero value; in this case the loop is exited
  ]      %   End do...while
  f1=    %   Index of chosen door. True if it was 1 (success), 0 otherwise
  v      %   Concatenate vertically to results from previous realizations
  ts3e8< %   Duplicate. Is the sum less than 3e8? If so, the loop is exited
]        % End do...while
6L)      % Remove last value (which is always 1)
Ym       % Compute mean. This gives (N-1)/(n-1). Implicitly display

Luis Mendo

Posted 2016-05-17T23:34:46.353

Reputation: 87 464

Haha I didn't realize that 90% would make it that difficult :-) – Justin – 2016-05-21T16:07:44.313

Yes, fourth decimal with 90% confidence is a really strong requirement :-) – Luis Mendo – 2016-05-21T16:10:17.380

2

Pyth, 34 bytes

Mc|!*HJ-GHch*J+*tHgGtH*gtGHKh-GHKG

Test suite

Defines a deterministic memoized recursive function g taking n, k as arguments. g 1000 500 returns 0.0018008560286627952 in about 18 seconds (not included in the above test suite because it times out the online interpreter).

An approximate Python 3 translation would be

@memoized
def g(n,k):
    return (not k*(n-k) or (1+(n-k)*((k-1)*g(n,k-1)+g(n-1,k)*(n-k+1)))/(n-k+1))/n

Anders Kaseorg

Posted 2016-05-17T23:34:46.353

Reputation: 29 242

1

JavaScript (ES6), 65 bytes

f=(n,k,d=n-k)=>(!d||(f(n-1,k)*++d*--d+1+(--k&&f(n,k)*k*d))/++d)/n

Don't try it with the big numbers though; even f(30, 10) takes a noticeable amount of time.

Neil

Posted 2016-05-17T23:34:46.353

Reputation: 95 035