Implement an API for probability distributions

9

Introduction

In this challenge, your task is to implement a collection of simple functions that together form a usable mini-library for simple probability distributions. To accommodate some of the more esoteric languages people like to use here, the following implementations are acceptable:

  1. A code snippet defining a collection of named functions (or closest equivalents).
  2. A collection of expressions that evaluate to named or anonymous functions (or closest equivalents).
  3. A single expression that evaluates to several named or anonymous functions (or closest equivalents).
  4. A collection of independent programs that take inputs from command line, STDIN or closest equivalent, and output to STDOUT or closest equivalent.

The functions

You shall implement the following functions, using shorter names if desired.

  1. uniform takes as input two floating point numbers a and b, and returns the uniform distribution on [a,b]. You can assume that a < b; the case a ≥ b is undefined.
  2. blend takes as inputs three probability distributions P, Q and R. It returns a probability distribution S, which draws values x, y and z from P, Q and R, respectively, and yields y if x ≥ 0, and z if x < 0.
  3. over takes as input a floating point number f and a probability distribution P, and returns the probability that x ≥ f holds for a random number x drawn from P.

For reference, over can be defined as follows (in pseudocode):

over(f, uniform(a, b)):
    if f <= a: return 1.0
    else if f >= b: return 0.0
    else: return (b - f)/(b - a)

over(f, blend(P, Q, R)):
    p = over(0.0, P)
    return p*over(f, Q) + (1-p)*over(f, R)

You can assume that all probability distributions given to over are constructed using uniform and blend, and that the only thing a user is going to do with a probability distribution is to feed it to blend or over. You can use any convenient datatype to represent the distributions: lists of numbers, strings, custom objects, etc. The only important thing is that the API works correctly. Also, your implementation must be deterministic, in the sense of always returning the same output for the same inputs.

Test cases

Your output values should be correct to at least two digits after the decimal point on these test cases.

over(4.356, uniform(-4.873, 2.441)) -> 0.0
over(2.226, uniform(-1.922, 2.664)) -> 0.09550806803314438
over(-4.353, uniform(-7.929, -0.823)) -> 0.49676329862088375
over(-2.491, uniform(-0.340, 6.453)) -> 1.0
over(0.738, blend(uniform(-5.233, 3.384), uniform(2.767, 8.329), uniform(-2.769, 6.497))) -> 0.7701533851999125
over(-3.577, blend(uniform(-3.159, 0.070), blend(blend(uniform(-4.996, 4.851), uniform(-7.516, 1.455), uniform(-0.931, 7.292)), blend(uniform(-5.437, -0.738), uniform(-8.272, -2.316), uniform(-3.225, 1.201)), uniform(3.097, 6.792)), uniform(-8.215, 0.817))) -> 0.4976245638164541
over(3.243, blend(blend(uniform(-4.909, 2.003), uniform(-4.158, 4.622), blend(uniform(0.572, 5.874), uniform(-0.573, 4.716), blend(uniform(-5.279, 3.702), uniform(-6.564, 1.373), uniform(-6.585, 2.802)))), uniform(-3.148, 2.015), blend(uniform(-6.235, -5.629), uniform(-4.647, -1.056), uniform(-0.384, 2.050)))) -> 0.0
over(-3.020, blend(blend(uniform(-0.080, 6.148), blend(uniform(1.691, 6.439), uniform(-7.086, 2.158), uniform(3.423, 6.773)), uniform(-1.780, 2.381)), blend(uniform(-1.754, 1.943), uniform(-0.046, 6.327), blend(uniform(-6.667, 2.543), uniform(0.656, 7.903), blend(uniform(-8.673, 3.639), uniform(-7.606, 1.435), uniform(-5.138, -2.409)))), uniform(-8.008, -0.317))) -> 0.4487803553043079

Zgarb

Posted 2015-10-22T16:07:24.173

Reputation: 39 083

2Can we use built-in functions to make them? – Mutador – 2015-10-22T16:41:21.030

@AndréMuta I forgot that Mathematica probably has built-ins for all of this... but I'm going to allow them, as long as they follow the rules. – Zgarb – 2015-10-22T17:19:52.433

What is your suggestion on how to represent floating point data in BrainFuck? – flawr – 2015-10-22T18:44:22.327

@flawr For languages that don't have native floating point numbers, you can use any convenient encoding for floats between -10.0 and 10.0 (exclusive) that has at most 0.001 difference between consecutive values. The output should be accurate to within 0.01 difference for the test cases. – Zgarb – 2015-10-22T19:07:20.493

Answers

1

CJam, 58 bytes

{[\]}:U;
{[@]}:B;
{_,2={~1$-@@-\/0e>1e<}{6Yb@f*\.{O})[_1@-].*:+}?}:O;

These are postfix operators that work on the stack: 2.0 1.0 3.0 U O is over(2, uniform(1, 3)).

Score count

{[\]} is the function itself, :U; assigns it to the name U and pops it. Essentially this isn't part of the function, so by score counting rule 2, I'd only have to count {[\]}. B is defined similarly.

However, O is recursive, and if I don't specify a name, there's no way to recurse. So here, I'd be inclined to count the :O; part. Then my score is 5+5+48=58 bytes in total.

Explanation

U pops two arguments and makes a pair in reverse order: a b => [b a].

B pops three arguments and makes a triple in rotated order: a b c => [b c a].

O's structure is as follows:

{             }:O;   Define O as this function:
 _,2=        ?       If the argument list's length is 2:
     {~Γ}            Append the list to the stack and execute subprogram Γ.
         {~Δ}        Else, do the same, but execute subprogram Δ.

Subprogram Γ handles uniform distributions:

Executed ops      Explanation   Stack contents
============      ===========   ==============
                  Initial       f; b; a
1$                Copy b        f; b; a; b
  -               Difference    f; b; (a-b)
   @@             Rotate x2     (a-b); f, b
     -            Difference    (a-b); (f-b)
      \/          Flip divide   (f-b)/(a-b)
        0e>       Clamp low     max(0, (f-b)/(a-b))
           1e<    Clamp high    min(1, max(0, (f-b)/(a-b)))

Subprogram Δ handles blended distributions:

Executed ops              Explanation    Stack contents
============              ===========    ==============
                          Initial        f; [Q R P]
6Yb                       Push [1,1,0]   f; [Q R P]; [1 1 0]
   @                      Rotate         [Q R P]; [1 1 0]; f
    f*                    Multiply each  [Q R P]; [f f 0]
      \                   Swap           [f f 0]; [Q R P]
       .{O}               Pairwise O     [q r p]
           )              Uncons         [q r] p
            [_1@-]        [p, 1-p]       [q r] [p 1-p]
                  .*:+    Dot product    q*p+r*(1-p)

Lynn

Posted 2015-10-22T16:07:24.173

Reputation: 55 648

2

Ruby, 103

u=b=->*a{a}
o=->f,d{d[2]?(p=o[0,d[0]])*o[f,d[1]]+(1-p)*o[f,d[2]]:(f<a=d[0])?1:(f>b=d[1])?0:(b-f)/(b-a)}

Defines three lambdas, u, b, and o. u and b just create two-element and three-element arrays respectively. o assumes a two-element array is a uniform distribution and a three-element one is a blend of three distributions. In the latter case it calls itself recursively.

histocrat

Posted 2015-10-22T16:07:24.173

Reputation: 20 600

2

MATLAB, 73

Time for a little "functional programming" in MATLAB. These are 3 anonymous functions. Uniform and blend are called the same way as the examples, but for over the arguments should be swapped. I don't really need an over since the first two return functions, but as a formality feval is a function that can call a function.

%uniform
@(a,b)@(x)(x<b)*min(1,(b-x)/(b-a))
%blend
@(P,Q,R)@(x)P(0)*(Q(x)-R(x))+R(x)
%over
@feval

Now MATLAB's parsing and evaluation system is a little wonky to say the least. It doesn't allow you to directly call a function that was returned from a function. Instead, one must first save the result to a variable. The 4th example could be done as follows:

x=uniform(-5.233,3.384);y=uniform(2.767,8.329);z=uniform(-2.769,6.497);over(blend(x,y,z),0.738)

However, it is possible to get around this by using feval to call all the functions. If the following definitions are used, then the examples can be evaluated exactly as they are written.

uniform=@(a,b)@(x)(x<b)*min(1,(b-x)/(b-a))
blend=@(P,Q,R)@(x)feval(P,0)*(feval(Q,x)-feval(R,x))+feval(R,x)
over=@(x,f)feval(f,x)

feersum

Posted 2015-10-22T16:07:24.173

Reputation: 29 566

Functions making functions... how perverse!

– Luis Mendo – 2015-10-23T11:56:12.803

1

Mathematica, 129 116 bytes

u=UniformDistribution@{##}&;b=If[x<0,z,y]~TransformedDistribution~{x\uF3D2#,y\uF3D2#2,z\uF3D2#3}&;o=Probability[x>=#,x\uF3D2#2]&

u, b and o are uniform, blend, and over respectively.Wrapper over the standard functions. Replace the \uF3D2s with the 3-byte character. Just returns 0 and 1 for cases 1, 4, and 7.

LegionMammal978

Posted 2015-10-22T16:07:24.173

Reputation: 15 731

1

Python, 146 bytes

u=lambda*a:a
b=u
x=lambda f,a,b:[int(f<=a),(b-f)/(b-a)][a<f<b]
y=lambda f,p,q,r:o(0,p)*o(f,q)+(1-o(0,p))*o(f,r)
o=lambda f,p:[x,y][len(p)-2](f,*p)

Same strategy as histocrat's Ruby answer, but in Python. To do recursion without a Z-combinator (which would be costly), x and y are defined as helper functions that evaluate over for 2- and 3-length argument tuples (uniform and blend arguments, respectively).

Test cases on ideone

Mego

Posted 2015-10-22T16:07:24.173

Reputation: 32 998

0

Matlab, 104 bytes

I hope this is still valid, as this only works for distributions with support in [-10,10] which is the requirement for languages that have no floating point support. The support vector and the accuracy can be easily adjusted by just altering corresponding numbers. u,o,b is for uniform,blend,over. The pdf is just represented as a discrete vector. I think this approach can easily be transferred to other languages.

D=1e-4;X=-10:D:10;
u=@(a,b)(1/(b-a))*(a<X&X<b);
o=@(x,d)sum(d.*(X>x))*D;
b=@(p,q,r)o(0,p).*q+(1-o(0,p)).*r;

You can test them if you define those functions first and then just paste this code:

[o(4.356, u(-4.873, 2.441)) , 0.0;
o(2.226, u(-1.922, 2.664)) , 0.09550806803314438;
o(-4.353, u(-7.929, -0.823)) , 0.49676329862088375;
o(-2.491, u(-0.340, 6.453)) , 1.0;
o(0.738, b(u(-5.233, 3.384), u(2.767, 8.329), u(-2.769, 6.497))) , 0.7701533851999125;
o(-3.577, b(u(-3.159, 0.070), b(b(u(-4.996, 4.851), u(-7.516, 1.455), u(-0.931, 7.292)), b(u(-5.437, -0.738), u(-8.272, -2.316), u(-3.225, 1.201)), u(3.097, 6.792)), u(-8.215, 0.817))) , 0.4976245638164541;
o(3.243, b(b(u(-4.909, 2.003), u(-4.158, 4.622), b(u(0.572, 5.874), u(-0.573, 4.716), b(u(-5.279, 3.702), u(-6.564, 1.373), u(-6.585, 2.802)))), u(-3.148, 2.015), b(u(-6.235, -5.629), u(-4.647, -1.056), u(-0.384, 2.050)))) , 0.0;
o(-3.020, b(b(u(-0.080, 6.148), b(u(1.691, 6.439), u(-7.086, 2.158), u(3.423, 6.773)), u(-1.780, 2.381)), b(u(-1.754, 1.943), u(-0.046, 6.327), b(u(-6.667, 2.543), u(0.656, 7.903), b(u(-8.673, 3.639), u(-7.606, 1.435), u(-5.138, -2.409)))), u(-8.008, -0.317))) , 0.4487803553043079]

flawr

Posted 2015-10-22T16:07:24.173

Reputation: 40 560

Matlab has FP support, so I think that this would be invalid. – LegionMammal978 – 2015-10-22T22:23:25.403

I'm hesitant to allow this, since Matlab supports floating point numbers natively. If you can replace X and D with MIN_FLOAT and MAX_FLOAT (or whatever Matlab calls them), then this is a valid approach. – Zgarb – 2015-10-23T00:30:06.333

Yes, you could ouse realmax/realmin, you could even make a vector that goes throu all floating point number if you have enough memory. – flawr – 2015-10-23T15:39:14.050