7
1
The goal of this codegolf is evaluating the cumulative distribution function of the student-t probability distribution. This is not trivial since there is no closed form and the function is dependent of two variables v
and x
. The distribution function is
For getting the CDF you could integrate this function from -inf
to x
. As you can see the evaluation of this function relies much on integration as you'd also have to evaluate nontrivial values of the gamma function.
Details
You can can write a program or function that accepts a two numbers x
and v
as input, where x
is the position (must at least accept floats from -10 upto 10) at which the CDF is evaluated and v
is the number of degrees of freedom. v
is a positive integer (range 1
upto 1450
) but it must be possible to set it to infinity (for v=inf
the student-t distribution is equal to the normal distribution). But you can specify that e.g. v = -1
should denote infinity. The output consists must be a number between 0 and 1 and must have an accuracy of at least 4 decimal places.
If your language does not support runtime inputs or arguments on program call you may assume that both arguments are stored in variables (or another 'memory unit')
Restrictions
You may not use special functions like the gamma function or probability distribution functions as well as numerical integration functions or libraries (as long as you do not include their source in your code which counts for the score) etc. The idea is using only basic math functions. I tried to make a list of what I consider as 'basic' but there might be others I forgot - if so please comment.
Explicitly allowed are
+ - / *
% (modulo)
! (factorial for integers only)
&& || xor != e.t.c. (logical operators)
<< >> ^ e.t.c. (bitshift and bitwise operators)
** pow (powerfunction for floats like a^b including sqrt or and other roots)
sin cos tan arcsin arccos arctan arctan2
ln lb log(a,b)
rand() (generating pseudorandom numbers of UNIFORM distribution)
Scoring
Lowest number of bytes wins.
Entry
Please explain how your code works. Please show the output of your program for at least 5 different arguments. Among those should be the following three. For your own values please also provide 'exact' values (like from wolframalpha or what ever you prefer).
x = 0 v= (any) (should return 0.5000)
x = 3.078 v= 1 (should return 0.9000
x = 2.5 v= 23 (should return 0.9900)
If possible, it would be great if you could provide some fancy plots of the values calculated with your program (the code for plotting does not count for the score).
(from Wikipedia):
1Many languages natively support powerful numeric integration algorithms, which aren't "special functions like the gamma function or probability distribution functions etc.". These algorithms are implemented using only the basic mathematical operators you've listed. Is the use of such library functions allowed? – COTO – 2014-11-09T15:11:07.397
Thank you for pointing this out. No, I'd like the participants to program it 'from scratch', so they should implement the integration method in the program itself (it does not matter whether it is 'stolen' from a library, as long as it is included in the code and counts for the score.) – flawr – 2014-11-09T15:27:11.653
1So numerical integration is allowed provided it is included in the program? That shouldn't be too dificult, a simple riemann sum or trapezium rule can be implemented with a
for
loop, provided it starts at a low enough value. On the other hand, a polynomial approximation should be able to solve this to the required accuracy. Is that allowed? (What if I call it an "analytically derived" Taylor series, or even an "empirically derived" one?) – Level River St – 2014-11-09T17:31:27.960Yes this is allowed, the goal is having the required accuracy on the whole given domain. I am looking forward to seeing different approaches! If you want you could even do the whole think with one big lookup table, but it has to be included in the bytecount=) – flawr – 2014-11-09T20:53:52.217
May we generate random numbers (like floats between 0 and 1)? – xnor – 2014-11-10T09:32:17.197
Good question, I didn't think about that but I think I'm gonna allow that, I guess you are after MC-methods, right? (Or is there any other 'shortcut' that makes it way easier to solve the challenge with random numbers than without?) – flawr – 2014-11-10T17:03:44.103
Does it have to be 4 significant digits (or is it 4 decimal places)? For example according to http://www.wolframalpha.com/input/?i=t+distribution+probabilities for
– Level River St – 2014-11-10T18:33:24.470x=2.5788 v=1500
I get 0.9950, the same answer to 4 significant digits as forv=inf.
(table at http://mathworld.wolfram.com/Studentst-Distribution.html ). But forx=-2.5788 v=1450
I get 0.005048, which only matches the result forv=inf
in its last digit. Atx=-10 v=1450
I get 4.096E-23. The asymmetric number of decimal places required by current wording may require a datum at x=-10 rather than the obvious x=0 or x=-infThis question has turned out harder than I thought. First I tried integrating
– Level River St – 2014-11-10T18:47:45.950(1+x^2/v)^(-v/2-.5)
numerically in the ranges -9999...x and -9999...9999 and dividing one by the other (taking 9999 as an approximation for infinity and knowing that the gamma part would cancel out) but I ran into problems with machine limits. I have a way of calculating the gamma function in half-integer steps inductively. There's a complete formula at http://www.vosesoftware.com/ModelRiskHelp/index.htm not requiring integration, but it's pretty complicated.@steveverrill Oh I didn't consider that, so would you suggest altering the question to 4 decimal places (E.g. If the exact value is 0.0001624... the acceptable answer would be 0.0002)? – flawr – 2014-11-10T18:51:33.613
@flawr I've gone ahead and edited it, and fixed a spelling mistake in the title too. Roll it back if you disagree, but I think it's better that way. I dont think centering the calculations on x=-10 was your intention, and v=1450 is more than enough to make this challenging. – Level River St – 2014-11-10T19:06:08.240
@steveverrill The changes are ok, I just didn't understand what you mean by the second part? Why do you mean by centering the calculations on
x=-10
? Would you suggestv=1450
as limit? – flawr – 2014-11-10T19:12:52.227@flawr per my previous comment, around
v=1450
is where the curve becomes indistinguishable fromv=inf
based on a precision of 4 decimal places, so now there shouldn't be a need for a limit onv
. Gamma(1450) is a big number but just about manageable. It would be easier if it was lower :-) The regarding the x=-10, the point of my edit was to avoid the need for excessive precision at the low end of thex
domain, so that part is solved too. I was just reiterating why I think the change is a good one. Because I'm embarrased about changing the rules on a question I'm planning to answer. – Level River St – 2014-11-10T19:30:25.857Ah ok now I understand, well there are really some problems I didn't consider (e.g. the limits of the computers, I only thought about the algorithms in a mathematical way.) So no need to feel embarrassed, I am really glad you helped eliminating the 'impossible' aspects=) – flawr – 2014-11-10T19:36:13.420
Can I use a built-in value of
π
? – TwiNight – 2014-11-11T04:35:12.693@TwiNight Yes you can, I'll add that. – flawr – 2014-11-11T13:56:14.870
My MC convergence rate essentially becomes 0. Can't get 4 digits ahhhhhh! I'm guessing this is a problem with Box Muller. – Vlo – 2014-11-18T22:18:27.607
Well it seems this thread is not getting flooded with answers, so feel free to post your approach anyway! – flawr – 2014-11-19T13:49:00.123
You can use Box Muller (or many of the more algorithmically complex but better methods) to generate normal i.i.d. variables using uniform random. Then you can use (http://stats.stackexchange.com/a/70270) to generate student-t random variables (should be an exact result). Generate a lot of these student-t and rely on almost-non-existant-convergence to find the density. I'm doing this in
– Vlo – 2014-11-19T19:42:29.833R
and its loop performance is essentially 0. I figure an implementation in a faster language with couple billion samples could possibly get the solution fairly close (assuming normal r.v. gen is good...)