Fake Statistics

27

2

If you're going to invent some fake news, you'll want to fabricate some data to back it up. You must already have some preconceived conclusions and you want some statistics to strengthen the argument of your faulty logic. This challenge should help you!

Given three input numbers:

  • N - number of data points
  • μ - mean of data points
  • σ - standard deviation of data points, where μ and σ are given by:

    \mu = \frac{1}{N} \sum_{i=1}^{N} x_i

    \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}

Output an unordered list of numbers, x_i, which would generate the given N, μ, and σ.

I'm not going to be too picky about I/O formats, but I do expect some sort of decimals for μ, σ, and the output data points. As a minimum, at least 3 significant figures and magnitude of at least 1,000,000 should be supported. IEEE floats are just fine.

  • N will always be an integer, where 1 ≤ N ≤ 1,000
  • μ can be any real number
  • σ will always be ≥ 0
  • data points can be any real number
  • if N is 1, then σ will always be 0.

Note that most inputs will have many possible outputs. You only need to give one valid output. The output may be deterministic or non-deterministic.

Examples

Input (N, μ, σ) -> Possible Output [list]

2, 1.5, 0.5 -> [1, 2]
5, 3, 1.414 -> [1, 2, 3, 4, 5]
3, 5, 2.160 -> [2, 6, 7]
3, 5, 2.160 -> [8, 4, 3]
1, 0, 0 -> [0]
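
To check a candidate output against the examples above, a small verifier helps. This is only a sketch: the names summarize, is_valid and tol are made up for illustration, and the tolerance of 1e-3 is an assumption based on the "3 significant figures" requirement. It uses the population standard deviation (dividing by N), which is what the examples are consistent with.

```python
import math

def summarize(data):
    """Return (N, mean, population standard deviation) of a list of numbers."""
    n = len(data)
    mu = sum(data) / n
    # Population (uncorrected) standard deviation: divide by N, not N - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return n, mu, sigma

def is_valid(data, n, mu, sigma, tol=1e-3):
    """Check a candidate output against the requested N, mean and deviation."""
    got_n, got_mu, got_sigma = summarize(data)
    return (got_n == n
            and abs(got_mu - mu) <= tol
            and abs(got_sigma - sigma) <= tol)

print(is_valid([1, 2, 3, 4, 5], 5, 3, 1.414))  # True
print(is_valid([2, 6, 7], 3, 5, 2.160))        # True
```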

Digital Trauma

Posted 2017-04-24T16:53:56.137

Reputation: 64 644

Should've added a truthy/falsy input for p-value so we would have to make either correlated or non-correlated data to either fake-backup or fake-disprove ahaha. – Magic Octopus Urn – 2017-04-24T16:56:25.420

What does +ve and -ve mean? – CG. – 2017-04-24T17:13:01.883

@ChelseaG. Abbreviations for positive and negative. I've edited to clarify.

– Digital Trauma – 2017-04-24T17:54:45.783

When N=1, will σ always be 0 to make it possible? – xnor – 2017-04-24T20:19:01.710

@xnor Yes, that's right – Digital Trauma – 2017-04-24T20:30:17.033

Can the output be deterministic? That is, same input always produces the same output – Luis Mendo – 2017-04-24T22:24:24.477

@LuisMendo Yes, that's fine. – Digital Trauma – 2017-04-24T23:14:01.897

Really we pedants should be using the corrected sample standard deviation and not implementing for inputs with N=1. – Jonathan Allan – 2017-04-25T05:00:37.787

@JonathanAllan we never said we were good at fake statistics – Giuseppe – 2017-04-25T15:05:53.017

Answers

8

Pyth, 44 35 34 bytes

?eA.DhQ2+eQ*G,-eQJ*E@hc1thQ2+eQJ*G,-eQKE+eQK
.N?eA.DN2+T*G+LT_B*Y@hc1tN2*G+LT_BY
.N?eA.DN2+T*G+LT_B*Y@cNtN2*G+LT_BY

Try it online! (The code above defines a function. :.* is appended on the link to invoke the function.)

The maths

This constructs the data symmetrically. If N is even, then the data are just the mean plus or minus the standard deviation. However, if N is odd, then we just opened a can of worms, since the mean has to be present for the data to be symmetric, and so the fluctuations have to be multiplied by a certain factor.

If n is even

  • Half of the data are μ+σ.
  • Half of the data are μ-σ.

If n is odd

  • One datum is μ.
  • (n-1)/2 of the data are μ+σ*sqrt(n/(n-1)).
  • (n-1)/2 of the data are μ-σ*sqrt(n/(n-1)).
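
The construction above can be sketched in ordinary Python for readers who don't speak Pyth. This is not a translation of the golfed code, just the same idea; symmetric_data is a hypothetical name, and the n == 1 guard is an addition so that the sketch avoids dividing by zero.

```python
import math

def symmetric_data(n, mu, sigma):
    """Build n points symmetric about mu with population deviation sigma."""
    half = n // 2
    if n % 2 == 0:
        # Even n: half the points at mu + sigma, half at mu - sigma.
        offset = sigma
        middle = []
    else:
        # Odd n: one point sits at mu and contributes nothing to the
        # variance, so the other n - 1 points are pushed out by sqrt(n/(n-1)).
        offset = sigma * math.sqrt(n / (n - 1)) if n > 1 else 0.0
        middle = [mu]
    return [mu + offset] * half + middle + [mu - offset] * half

print(symmetric_data(4, 10, 2))
print(symmetric_data(5, 3, 1.414))
```

For odd n the variance works out as (n-1)·offset²/n, which equals σ² exactly when offset = σ·sqrt(n/(n-1)).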

Leaky Nun

Posted 2017-04-24T16:53:56.137

Reputation: 45 011

6

MATL, 22 bytes

Thanks to @DigitalTrauma for a correction.

:t&1Zs/tYm-*+tZN?3G9L(

Input order is: N, σ, μ.

Try it online!

Or see a modified version that also computes the mean and standard deviation of the produced data, as a check.

Explanation

The code is divided into four parts:

  1. : generates the array [1 2 ... N] where N is taken as implicit input.

  2. t&1Zs/ divides those numbers by their empirical standard deviation (computed normalizing by N), and tYm- subtracts the empirical mean of the resulting values. This ensures that the results have empirical mean 0 and empirical standard deviation 1.

  3. * multiplies by σ and + adds μ, both taken as implicit inputs.

  4. tZN?x3G handles the special case that N = 1, σ = 0, for which the output should be μ. If this is indeed the case, then the empirical standard deviation computed in the second step was 0, the division gave inf, and multiplying by σ in the third step gave NaN. So what the code does is: if the obtained array consists of all NaN values (code tZN?), delete it (x) and push the third input (3G), which is μ.
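
For comparison, here is a rough Python sketch of the same four steps. It is an illustration rather than a port of the MATL code: rescaled_range is a hypothetical name, the argument order copies the answer's N, σ, μ, and the N = 1 case is handled up front because Python raises ZeroDivisionError where MATL would produce inf and NaN.

```python
import math

def rescaled_range(n, sigma, mu):
    """Rescale 1..n to mean mu and population standard deviation sigma."""
    if n == 1:
        # Special case (step 4): the only valid output is mu itself.
        return [mu]
    xs = list(range(1, n + 1))                              # step 1
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)    # normalised by N
    scaled = [x / sd for x in xs]                           # step 2: divide...
    scaled_mean = sum(scaled) / n
    centred = [z - scaled_mean for z in scaled]             # ...then centre
    return [z * sigma + mu for z in centred]                # step 3

print(rescaled_range(5, 1.414, 3))
```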

Luis Mendo

Posted 2017-04-24T16:53:56.137

Reputation: 87 464

4

Python, 50 bytes

lambda n,m,s:[m+s*(n-1)**.5]+[m-s/(n-1%n)**.5]*~-n

Try it online!

Uses the following n-element distribution with mean 0 and sdev 1:

  • With probability 1/n (i.e. 1 element), output (n-1)**0.5
  • With probability 1-1/n (i.e. n-1 elements), output -(n-1)**(-0.5)

This is rescaled to mean m and sdev s by transforming x->m+s*x. Annoyingly, n=1 gives a division by zero error for a useless value, so we hack it away by doing /(n-1%n)**.5, with 1%n giving 0 for n==1 and 1 otherwise.

You might think (n-1)**.5 can be shortened to ~-n**.5, but the exponentiation happens first.

A def is one byte longer.

def f(n,m,s):a=(n-1%n)**.5;print[m+s*a]+[m-s/a]*~-n
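
An ungolfed Python 3 reading of the same idea, as a sketch: one_outlier is a made-up name, and the explicit n == 1 branch stands in for the 1%n trick. Note that ~-n is just n-1 written with bitwise operators.

```python
import math

def one_outlier(n, m, s):
    """One point above the mean, n - 1 identical points just below it."""
    if n == 1:
        return [m]
    root = math.sqrt(n - 1)
    # Unit distribution: one value sqrt(n-1), and n-1 values -1/sqrt(n-1).
    # Its mean is 0 and its population variance is ((n-1) + 1)/n = 1.
    return [m + s * root] + [m - s / root] * (n - 1)

print(one_outlier(2, 1.5, 0.5))  # [2.0, 1.0]
```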

xnor

Posted 2017-04-24T16:53:56.137

Reputation: 115 687

3

R, 83 62 53 bytes

function(n,m,s)`if`(n>1,scale(1:n)*s/sqrt(1-1/n)+m,m)

If n=1, then it returns m (since scale would return NA); otherwise it scales the data [1,...,n] to have mean 0 and (sample) standard deviation 1. The scaled data then have population standard deviation sqrt(1-1/n), so it multiplies by s/sqrt(1-1/n) to get the correct population standard deviation, and adds m to shift to the appropriate mean. Thanks to Dason for introducing me to the scale function and dropping those bytes!
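
The sample-versus-population conversion is easy to get backwards, so here is a Python sketch of it. The helper names scale_like_r and fake_data are hypothetical, and scale_like_r only imitates what R's scale does to a single vector. Data standardised with the sample standard deviation (n-1 denominator) have population standard deviation sqrt(1-1/n), so the sketch divides by that factor before applying s.

```python
import math
import statistics

def scale_like_r(xs):
    """Centre xs and divide by the sample standard deviation (n - 1)."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation
    return [(x - mean) / sd for x in xs]

def fake_data(n, m, s):
    """Return n points with mean m and population standard deviation s."""
    if n == 1:
        return [m]
    z = scale_like_r(list(range(1, n + 1)))
    # z has population deviation sqrt(1 - 1/n); undo that, then apply s.
    factor = s / math.sqrt(1 - 1 / n)
    return [v * factor + m for v in z]

data = fake_data(5, 3, 2)
print(statistics.mean(data), statistics.pstdev(data))
```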

Try it online!

Giuseppe

Posted 2017-04-24T16:53:56.137

Reputation: 21 077

Can you add some tests in Try It Online so your answer may easily be verified?

– Digital Trauma – 2017-04-24T20:46:40.427

yep! give me two minutes. – Giuseppe – 2017-04-24T20:53:52.447

Could probably just use 1:n instead of rt(n,n) to save 4 bytes. And the scale function could probably be useful. – Dason – 2017-04-25T12:48:00.107

@Dason -- thanks! I learned about scale which is great. – Giuseppe – 2017-04-25T15:05:32.553

1

Jelly, 20 bytes

÷_Ḃ$©$*.;N$ṁ®;0ṁ⁸×⁵+

Try it online!

Full program taking three command line arguments: n, μ, σ.

How?

Creates floor(n / 2) pairs of values equidistant from the mean, plus a value at the mean if n is odd, such that the standard deviation is correct...

÷_Ḃ$©$*.;N$ṁ®;0ṁ⁸×⁵+ - Main link: n, μ (σ expected as third input, the 5th command argument)
   $                 - last two links as a monad:
 _                   -   n minus:
  Ḃ                  -     n mod 2            i.e. n-1 if n is odd, n if n is even
    ©                - copy value to register
÷                    - n divided by that
       .             - literal 0.5
      *              - exponentiate = (n / (n - (n mod 2))) ^ 0.5
                     -        i.e. 1 if n is even; or (n/(n-1))^0.5 if n is odd
         $           - last two links as a monad:
        N            -   negate
       ;             -   concatenate   i.e. [1,-1] or [(n/(n-1))^0.5,-(n/(n-1))^0.5]
            ®        - recall value from register
           ṁ         - mould the list like something of that length
             ;0      - concatenate a zero
                ⁸    - link's left argument, n
               ṁ     - mould the list like something of length n (removes the zero for even n)
                  ⁵  - fifth command argument, third program argument (σ)
                 ×   - multiply (vectorises)
                   + - add μ (vectorises)
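
The walkthrough above maps fairly directly onto Python. In this sketch mould is a hypothetical stand-in for Jelly's ṁ (cyclic repeat, then truncate), and moulded_data is a made-up name. The guard for n = 1 is an assumption on my part: it avoids the division by zero whose result is discarded anyway once the list is moulded to length 0.

```python
from itertools import cycle, islice
import math

def mould(values, length):
    """Repeat values cyclically and truncate to the given length."""
    return list(islice(cycle(values), length))

def moulded_data(n, mu, sigma):
    """Pairs at +/- factor, plus a zero for odd n, scaled by sigma, shifted by mu."""
    even_part = n - n % 2                  # n if n is even, n - 1 if n is odd
    factor = math.sqrt(n / even_part) if even_part else 0.0
    pair = [factor, -factor]
    # Mould to the even length, append a zero, then mould to length n,
    # which drops the zero again when n is even.
    unit = mould(mould(pair, even_part) + [0.0], n)
    return [sigma * u + mu for u in unit]

print(moulded_data(4, 10, 2))  # [12.0, 8.0, 12.0, 8.0]
```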

Jonathan Allan

Posted 2017-04-24T16:53:56.137

Reputation: 67 804