Evaluate CDF of Student-t distribution

The goal of this codegolf is evaluating the cumulative distribution function of the student-t probability distribution. This is not trivial since there is no closed form and the function is dependent of two variables v and x. The distribution function is

enter image description here

For getting the CDF you could integrate this function from -inf to x. As you can see the evaluation of this function relies much on integration as you'd also have to evaluate nontrivial values of the gamma function.

Details

You can can write a program or function that accepts a two numbers xand v as input, where x is the position (must at least accept floats from -10 upto 10) at which the CDF is evaluated and v is the number of degrees of freedom. v is a positive integer (range 1 upto 1450) but it must be possible to set it to infinity (for v=inf the student-t distribution is equal to the normal distribution). But you can specify that e.g. v = -1 should denote infinity. The output consists must be a number between 0 and 1 and must have an accuracy of at least 4 decimal places. If your language does not support runtime inputs or arguments on program call you may assume that both arguments are stored in variables (or another 'memory unit')

Restrictions

You may not use special functions like the gamma function or probability distribution functions as well as numerical integration functions or libraries (as long as you do not include their source in your code which counts for the score) etc. The idea is using only basic math functions. I tried to make a list of what I consider as 'basic' but there might be others I forgot - if so please comment.

Explicitly allowed are

+ - / * 
% (modulo)
! (factorial for integers only)
&& || xor != e.t.c. (logical operators)
<< >> ^ e.t.c. (bitshift and bitwise operators)
** pow (powerfunction for floats like a^b including sqrt or and other roots)
sin cos tan arcsin arccos arctan arctan2
ln lb log(a,b)
rand() (generating pseudorandom numbers of UNIFORM distribution)

Scoring

Lowest number of bytes wins.

Entry

Please explain how your code works. Please show the output of your program for at least 5 different arguments. Among those should be the following three. For your own values please also provide 'exact' values (like from wolframalpha or what ever you prefer).

x = 0      v= (any)  (should return 0.5000)
x = 3.078  v= 1      (should return 0.9000
x = 2.5    v= 23     (should return 0.9900)

If possible, it would be great if you could provide some fancy plots of the values calculated with your program (the code for plotting does not count for the score).

(from Wikipedia): plots

flawr

Posted 2014-11-09T14:53:52.613

Reputation: 40 560

1Many languages natively support powerful numeric integration algorithms, which aren't "special functions like the gamma function or probability distribution functions etc.". These algorithms are implemented using only the basic mathematical operators you've listed. Is the use of such library functions allowed? – COTO – 2014-11-09T15:11:07.397

Thank you for pointing this out. No, I'd like the participants to program it 'from scratch', so they should implement the integration method in the program itself (it does not matter whether it is 'stolen' from a library, as long as it is included in the code and counts for the score.) – flawr – 2014-11-09T15:27:11.653

1So numerical integration is allowed provided it is included in the program? That shouldn't be too dificult, a simple riemann sum or trapezium rule can be implemented with a for loop, provided it starts at a low enough value. On the other hand, a polynomial approximation should be able to solve this to the required accuracy. Is that allowed? (What if I call it an "analytically derived" Taylor series, or even an "empirically derived" one?) – Level River St – 2014-11-09T17:31:27.960

Yes this is allowed, the goal is having the required accuracy on the whole given domain. I am looking forward to seeing different approaches! If you want you could even do the whole think with one big lookup table, but it has to be included in the bytecount=) – flawr – 2014-11-09T20:53:52.217

May we generate random numbers (like floats between 0 and 1)? – xnor – 2014-11-10T09:32:17.197

Good question, I didn't think about that but I think I'm gonna allow that, I guess you are after MC-methods, right? (Or is there any other 'shortcut' that makes it way easier to solve the challenge with random numbers than without?) – flawr – 2014-11-10T17:03:44.103

Does it have to be 4 significant digits (or is it 4 decimal places)? For example according to http://www.wolframalpha.com/input/?i=t+distribution+probabilities for x=2.5788 v=1500 I get 0.9950, the same answer to 4 significant digits as for v=inf.(table at http://mathworld.wolfram.com/Studentst-Distribution.html ). But for x=-2.5788 v=1450 I get 0.005048, which only matches the result for v=inf in its last digit. At x=-10 v=1450 I get 4.096E-23. The asymmetric number of decimal places required by current wording may require a datum at x=-10 rather than the obvious x=0 or x=-inf

– Level River St – 2014-11-10T18:33:24.470

This question has turned out harder than I thought. First I tried integrating (1+x^2/v)^(-v/2-.5) numerically in the ranges -9999...x and -9999...9999 and dividing one by the other (taking 9999 as an approximation for infinity and knowing that the gamma part would cancel out) but I ran into problems with machine limits. I have a way of calculating the gamma function in half-integer steps inductively. There's a complete formula at http://www.vosesoftware.com/ModelRiskHelp/index.htm not requiring integration, but it's pretty complicated.

– Level River St – 2014-11-10T18:47:45.950

@steveverrill Oh I didn't consider that, so would you suggest altering the question to 4 decimal places (E.g. If the exact value is 0.0001624... the acceptable answer would be 0.0002)? – flawr – 2014-11-10T18:51:33.613

@flawr I've gone ahead and edited it, and fixed a spelling mistake in the title too. Roll it back if you disagree, but I think it's better that way. I dont think centering the calculations on x=-10 was your intention, and v=1450 is more than enough to make this challenging. – Level River St – 2014-11-10T19:06:08.240

@steveverrill The changes are ok, I just didn't understand what you mean by the second part? Why do you mean by centering the calculations on x=-10? Would you suggest v=1450 as limit? – flawr – 2014-11-10T19:12:52.227

@flawr per my previous comment, around v=1450 is where the curve becomes indistinguishable from v=inf based on a precision of 4 decimal places, so now there shouldn't be a need for a limit on v. Gamma(1450) is a big number but just about manageable. It would be easier if it was lower :-) The regarding the x=-10, the point of my edit was to avoid the need for excessive precision at the low end of the x domain, so that part is solved too. I was just reiterating why I think the change is a good one. Because I'm embarrased about changing the rules on a question I'm planning to answer. – Level River St – 2014-11-10T19:30:25.857

Ah ok now I understand, well there are really some problems I didn't consider (e.g. the limits of the computers, I only thought about the algorithms in a mathematical way.) So no need to feel embarrassed, I am really glad you helped eliminating the 'impossible' aspects=) – flawr – 2014-11-10T19:36:13.420

Can I use a built-in value of π? – TwiNight – 2014-11-11T04:35:12.693

@TwiNight Yes you can, I'll add that. – flawr – 2014-11-11T13:56:14.870

My MC convergence rate essentially becomes 0. Can't get 4 digits ahhhhhh! I'm guessing this is a problem with Box Muller. – Vlo – 2014-11-18T22:18:27.607

Well it seems this thread is not getting flooded with answers, so feel free to post your approach anyway! – flawr – 2014-11-19T13:49:00.123

You can use Box Muller (or many of the more algorithmically complex but better methods) to generate normal i.i.d. variables using uniform random. Then you can use (http://stats.stackexchange.com/a/70270) to generate student-t random variables (should be an exact result). Generate a lot of these student-t and rely on almost-non-existant-convergence to find the density. I'm doing this in R and its loop performance is essentially 0. I figure an implementation in a faster language with couple billion samples could possibly get the solution fairly close (assuming normal r.v. gen is good...)

– Vlo – 2014-11-19T19:42:29.833