Take It or Leave It: A Game Show for Computers

28

5

Context:

A reclusive billionaire has created a game show to attract the world's best and brightest programmers. On Mondays at the stroke of midnight, he chooses one person from a pool of applicants to be the contestant of the week, and provides them with a game. You are this week's lucky contestant!

This week's game:

The host provides you with API access to a stack of 10,000 digital envelopes. These envelopes are randomly sorted, and contain within them a dollar value, between $1 and $10,000 (no two envelopes contain the same dollar value).

You have 3 commands at your disposal:

  1. Read(): Read the dollar figure in the envelope at the top of the stack.

  2. Take(): Add the dollar figure in the envelope to your game show wallet, and pop the envelope off the stack.

  3. Pass(): Pop off the envelope on the top of the stack.

The Rules:

  1. If you use Pass() on an envelope, the money within is lost forever.

  2. If you use Take() on an envelope containing $X, from that point forward, you may never use Take() on an envelope containing < $X. Take() on one of these envelopes will add $0 to your wallet.

Write an algorithm that finishes the game with the maximal amount of money.

If you're writing a solution in Python, feel free to use this controller to test out algorithms, courtesy of @Maltysen: https://gist.github.com/Maltysen/5a4a33691cd603e9aeca

If you use the controller, you cannot access globals, you can only use the 3 provided API commands, and local scoped variables. (@Beta Decay)

Notes: "Maximal" in this case means the median value in your wallet after N > 50 runs. I expect, though I would love to be proven wrong, that the median value for a given algorithm will converge as N increases to infinity. Feel free to try to maximize the mean instead, but I have a feeling that the mean is more likely to be thrown off by a small N than the median is.

Edit: changed the number of envelopes to 10k for easier processing, and made Take() more explicit.

Edit 2: The prize condition has been removed, in light of this post on meta.

Current high scores:

PhiNotPi - $805,479

Reto Koradi - $803,960

Dennis - $770,272 (Revised)

Alex L. - $714,962 (Revised)

LivingInformation

Posted 2015-07-31T21:25:37.457

Reputation: 761

I implemented in a way that it just returns False. Since you can read it there is no real point of failing the entire game on a failed take() – OganM – 2015-07-31T21:43:27.397

though I find 100000 kinda high since it takes a long time to do a single iteration – OganM – 2015-07-31T21:47:31.677

4

In case anyone wants to use it, here is the controller that i've been using to test my algorithms: https://gist.github.com/Maltysen/5a4a33691cd603e9aeca

– Maltysen – 2015-07-31T22:36:32.997

8P.S. Nice question and welcome to Programming Puzzles and Code Golf :) – trichoplax – 2015-07-31T23:17:33.247

3@Maltysen I put your controller into the OP, thanks for the contribution! – LivingInformation – 2015-08-01T04:20:39.613

1

I couldn't find an explicit rule on bitcoin prizes, but there is some meta discussion on real world prizes which people can contribute to.

– trichoplax – 2015-08-01T14:22:24.633

Answers

9

CJam, $87,143 $700,424 $720,327 $727,580 $770,272

{0:T:M;1e4:E,:)mr{RM>{RR(*MM)*-E0.032*220+R*<{ERM--:E;R:MT+:T;}{E(:E;}?}&}fRT}
[easi*]$easi2/=N

This program simulates the entire game multiple times and calculates the median.

How to run

I've scored my submission by doing 100,001 test runs:

$ time java -jar cjam-0.6.5.jar take-it-or-leave-it.cjam 100001
770272

real    5m7.721s
user    5m15.334s
sys     0m0.570s

Approach

For each envelope, we do the following:

  • Estimate the amount of money that will inevitably be lost by taking the envelope.

    If R is the content and M is the maximum that has been taken, the amount can be estimated as R(R-1)/2 - M(M+1)/2, which gives the money all envelopes with contents X in the interval (M,R) contain.

    If no envelopes had been passed yet, the estimation would be perfect.

  • Calculate the amount of money that will inevitably be lost by passing the envelope.

    This is simply the money the envelope contains.

  • Check if the quotient of both is less than 110 + 0.016E, where E is the number of remaining envelopes (not counting envelopes that cannot be taken anymore).

    If so, take. Otherwise, pass.

Dennis

Posted 2015-07-31T21:25:37.457

Reputation: 196 637

5Because using a golfing language helps in any way whatsoever. ;P +1 for the algo. – Maltysen – 2015-07-31T23:17:17.580

Interesting solution, thanks Dennis! This is a game show, and game shows tend to have prizes...Do you have a bitcoin address, by any chance? – LivingInformation – 2015-07-31T23:59:56.493

@LivingInformation I do, but I'm not sure if accepting real money for a code golf answer violates the TOS. – Dennis – 2015-08-01T00:47:42.270

@Dennis I searched around before adding the prize conditions, and found nothing explicit about giving prizes. Some posts indicated that it could be a bad idea unless the prize terms were very specific, but there doesn't seem to be any rule against it. – LivingInformation – 2015-08-01T00:52:34.497

2

I can't replicate your results using a Python clone: https://gist.github.com/orlp/f9b949d60c766430fe9c. You score around $50,000. That's an order of magnitude off.

– orlp – 2015-08-01T01:50:07.180

@orlp I had pretty much everything backwards. Turns out that my algorithm is pretty terrible... – Dennis – 2015-08-01T02:51:04.127

@orlp thanks for that, the score in the main post has been updated. – LivingInformation – 2015-08-01T02:57:31.147

@Dennis where did the 139 come from? I love the idea of calculating the amount gained versus the amount potentially lost by passing that envelope, and think statistical analysis of both of those numbers could go pretty far in terms of calculating the perfectly optimal choice at each point, but 139 feels like a stab in the dark from there. – LivingInformation – 2015-08-01T05:07:45.067

1@LivingInformation Trial and error. I'm currently looking at using the exact amount instead of estimations, but the resulting code is very slow. – Dennis – 2015-08-01T05:11:45.013

2This answer needs more upvotes than mine! It's more clever, scores higher, and is even golfed! – Alex L – 2015-08-01T18:20:42.397

1@LivingInformation This is my address: 17uLHRfdD5JZ2QjSqPGQ1B12LoX4CgLGuV – Dennis – 2015-08-01T21:40:23.050

Two confirmations and counting. Thanks, @LivingInformation! – Dennis – 2015-08-01T22:17:38.087

7

Python, $680,646 $714,962

f = (float(len(stack)) / 10000)
step = 160
if f<0.5: step = 125
if f>0.9: step = 190
if read() < max_taken + step:
    take()
else:
    passe()

Takes larger and larger amounts in steps of size between $125 and $190. Ran with N=10,000 and got a median of $714962. These step sizes came from trial and error and are certainly not optimal.

The full code, including a modified version of @Maltysen's controller which prints a bar chart while it runs:

import random
N = 10000


def init_game():
    global stack, wallet, max_taken
    stack = list(range(1, 10001))
    random.shuffle(stack)
    wallet = max_taken = 0

def read():
    return stack[0]

def take():
    global wallet, max_taken
    amount = stack.pop(0)
    if amount > max_taken:
        wallet += amount
        max_taken = amount

def passe():
    stack.pop(0)

def test(algo):
    results = []
    for _ in range(N):
        init_game()
        for i in range(10000):
            algo()
        results += [wallet]
        output(wallet)
    import numpy
    print 'max: '
    output(max(results))
    print 'median: '
    output(numpy.median(results))
    print 'min: '
    output(min(results))

def output(n):
    print n
    result = ''
    for _ in range(int(n/20000)):
        result += '-'
    print result+'|'

def alg():
    f = (float(len(stack)) / 10000)
    step = 160
    if f<0.5: step = 125
    if f>0.9: step = 190
    if read() < max_taken + step:
        #if read()>max_taken: print read(), step, f
        take()
    else:
        passe()

test(alg)

BitCoin address: 1CBzYPCFFBW1FX9sBTmNYUJyMxMcmL4BZ7

Wow OP delivered! Thanks @LivingInformation!

Alex L

Posted 2015-07-31T21:25:37.457

Reputation: 761

1The controller is Maltysen's, not mine. – orlp – 2015-08-01T04:10:32.167

2Confirmed. I had just set up a controller, and get very similar numbers for your solution. Strictly speaking, I think you have to maintain the value of max_taken in your own code, since it's not part of the official game API. But that's trivial to do. – Reto Koradi – 2015-08-01T04:32:47.227

1Yeah, max_taken is in @Maltysen's controller. If it is useful I can post the entire solution (controller + algorithm) in one block. – Alex L – 2015-08-01T04:36:39.453

It's really no big deal. But I think the cleanest approach would be to only use the read(), take() and pass() methods in the posted code, since those are the "3 commands at your disposal" based on the definition in the question. – Reto Koradi – 2015-08-01T04:41:02.507

@Reto I'm willing to revise the question to whatever commands make the most sense. Read, Take, and Pass were all 4 characters, and felt fitting, but i'm open to suggestions (for example, i've considered changing "pass" to "leave", because I titled the post "take it or leave it"). – LivingInformation – 2015-08-01T04:43:43.713

Just noticed I was calculating mean instead of median! Will post new score soon. Ok, updated with median score. – Alex L – 2015-08-01T06:31:28.803

step = 160 if f<0.2: step = 75 if f<0.5: step = 125 if f>0.9: step = 190

f <0.2: step = 75 always gets overwritten by f<.5: step = 125. Bug? @Alex L. – LivingInformation – 2015-08-01T06:54:17.000

@LivingInformation yup! removed the useless line. – Alex L – 2015-08-01T07:00:29.957

How long does this take for you to run 10001 iterations? – Beta Decay – 2015-08-02T07:05:41.773

5

C++, $803,960

for (int iVal = 0; iVal < 10000; ++iVal)
{
    int val = game.read();
    if (val > maxVal &&
        val < 466.7f + 0.9352f * maxVal + 0.0275f * iVal)
    {
        maxVal = val;
        game.take();
    }
    else
    {
        game.pass();
    }
}

Reported result is the median from 10,001 games.

Reto Koradi

Posted 2015-07-31T21:25:37.457

Reputation: 4 870

Guess and check, I take it? Or did you use some kind of input fuzzer for the constants? – LivingInformation – 2015-08-01T23:44:31.637

I ran an optimization algorithm to determine the constants. – Reto Koradi – 2015-08-02T00:04:02.947

Do you think that a dynamic calculation at each point would be more effective, or do you think this is approaching the maximum value you can receive? – LivingInformation – 2015-08-02T00:32:53.660

I have no reason to believe that it's the ideal strategy. I hope it's the maximum for a linear function with these parameters. I've been trying to allow various kinds of non-linear terms, but so far haven't found anything significantly better. – Reto Koradi – 2015-08-02T01:14:05.660

At turn N, given two choices (take or pass), you simulate M "Perfect games" using the remaining unchosen dollar values. That is, randomly shuffle all the remaining elements M times, and each time calculate the maximum amount you could gain if you played perfectly, for the case of taking the money and for leaving it. Whichever choice resulted in the highest average "perfect game" is the choice you make in that turn.

It wouldn't be CPU optimal, of course... O(Envelopes * M * O(perfect game calculation)) – LivingInformation – 2015-08-02T01:37:31.797

1I can confirm that simulating this gives the reported score of slightly more than $800,000. – orlp – 2015-08-02T16:48:32.020

3

Java, $806,899

This is from a trial of 2501 rounds. I am still working on optimizing it. I wrote two classes, a wrapper and a player. The wrapper instantiates the player with the number of envelopes (always 10000 for the real thing), and then calls the method takeQ with the top envelope's value. The player then returns true if they take it, false if they pass it.

Player

import java.lang.Math;

public class Player {
  public int[] V;

  public Player(int s) {
    V = new int[s];
    for (int i = 0; i < V.length; i++) {
      V[i] = i + 1;
    }
    // System.out.println();
  }

  public boolean takeQ(int x) {

    // System.out.println("look " + x);

    // http://www.programmingsimplified.com/java/source-code/java-program-for-binary-search
    int first = 0;
    int last = V.length - 1;
    int middle = (first + last) / 2;
    int search = x;

    while (first <= last) {
      if (V[middle] < search)
        first = middle + 1;
      else if (V[middle] == search)
        break;
      else
        last = middle - 1;

      middle = (first + last) / 2;
    }

    int i = middle;

    if (first > last) {
      // System.out.println(" PASS");
      return false; // value not found, so the envelope must not be in the list
                    // of acceptable ones
    }

    int[] newVp = new int[V.length - 1];
    for (int j = 0; j < i; j++) {
      newVp[j] = V[j];
    }
    for (int j = i + 1; j < V.length; j++) {
      newVp[j - 1] = V[j];
    }
    double pass = calcVal(newVp);
    int[] newVt = new int[V.length - i - 1];
    for (int j = i + 1; j < V.length; j++) {
      newVt[j - i - 1] = V[j];
    }
    double take = V[i] + calcVal(newVt);
    // System.out.println(" take " + take);
    // System.out.println(" pass " + pass);

    if (take > pass) {
      V = newVt;
      // System.out.println(" TAKE");
      return true;
    } else {
      V = newVp;
      // System.out.println(" PASS");
      return false;
    }
  }

  public double calcVal(int[] list) {
    double total = 0;
    for (int i : list) {
      total += i;
    }
    double ent = 0;
    for (int i : list) {
      if (i > 0) {
        ent -= i / total * Math.log(i / total);
      }
    }
    // System.out.println(" total " + total);
    // System.out.println(" entro " + Math.exp(ent));
    // System.out.println(" count " + list.length);
    return total * (Math.pow(Math.exp(ent), -0.5) * 4.0 / 3);
  }
}

Wrapper

import java.lang.Math;
import java.util.Random;
import java.util.ArrayList;
import java.util.Collections;

public class Controller {
  public static void main(String[] args) {
    int size = 10000;
    int rounds = 2501;
    ArrayList<Integer> results = new ArrayList<Integer>();
    int[] envelopes = new int[size];
    for (int i = 0; i < envelopes.length; i++) {
      envelopes[i] = i + 1;
    }
    for (int round = 0; round < rounds; round++) {
      shuffleArray(envelopes);

      Player p = new Player(size);
      int cutoff = 0;
      int winnings = 0;
      for (int i = 0; i < envelopes.length; i++) {
        boolean take = p.takeQ(envelopes[i]);
        if (take && envelopes[i] >= cutoff) {
          winnings += envelopes[i];
          cutoff = envelopes[i];
        }
      }
      results.add(winnings);
    }
    Collections.sort(results);
    System.out.println(
        rounds + " rounds, median is " + results.get(results.size() / 2));
  }

  // stol... I mean borrowed from
  // http://stackoverflow.com/questions/1519736/random-shuffling-of-an-array
  static Random rnd = new Random();

  static void shuffleArray(int[] ar) {
    for (int i = ar.length - 1; i > 0; i--) {
      int index = rnd.nextInt(i + 1);
      // Simple swap
      int a = ar[index];
      ar[index] = ar[i];
      ar[i] = a;
    }
  }
}

A more detailed explanation is coming soon, after I finish optimizations.

The core idea is to be able to estimate the reward from playing a game from a given set of envelopes. If the current set of envelopes is {2,4,5,7,8,9}, and the top envelope is the 5, then there are two possibilities:

  • Take the 5 and play a game with {7,8,9}
  • Pass the 5 and play a game of {2,4,7,8,9}

If we calculate the expected reward of {7,8,9} and compare it to the expected reward of {2,4,7,8,9}, we will be able to tell if taking the 5 is worth it.

Now the question is, given a set of envelopes like {2,4,7,8,9} what is the expected value? I found the the expected value seems to be proportional to the total amount of money in the set, but inversely proportional to the square root of the number of envelopes that the money is divided into. This came from "perfectly" playing several small games in which all of the envelopes have almost identical value.

The next problem is how to determine the "effective number of envelopes." In all cases, the number of envelopes is known exactly by keeping track of what you've seen and done. Something like {234,235,236} is definitely three envelopes, {231,232,233,234,235} is definitely 5, but {1,2,234,235,236} should really count as 3 and not 5 envelopes because the 1 and 2 are nearly worthless, and you would never PASS on a 234 so you could later pick up a 1 or 2. I had the idea of using Shannon entropy to determine the effective number of envelopes.

I targeted my calculations to situations where the envelope's values are uniformly distributed over some interval, which is what happens during the game. If I take {2,4,7,8,9} and treat that as a probability distribution, its entropy is 1.50242. Then I do exp() to get 4.49254 as the effective number of envelopes.

The estimated reward from {2,4,7,8,9} is 30 * 4.4925^-0.5 * 4/3 = 18.87

The exact number is 18.1167.

This isn't an exact estimate, but I am actually really proud of how well this fits the data when the envelopes are uniformly distributed over an interval. I'm not sure of the correct multiplier (I am using 4/3 for now) but here is a data table excluding the multiplier.

Set of Envelopes                    Total * (e^entropy)^-0.5      Actual Score

{1,2,3,4,5,6,7,8,9,10}              18.759                        25.473
{2,3,4,5,6,7,8,9,10,11}             21.657                        29.279
{3,4,5,6,7,8,9,10,11,12}            24.648                        33.125
{4,5,6,7,8,9,10,11,12,13}           27.687                        37.002
{5,6,7,8,9,10,11,12,13,14}          30.757                        40.945
{6,7,8,9,10,11,12,13,14,15}         33.846                        44.900
{7,8,9,10,11,12,13,14,15,16}        36.949                        48.871
{8,9,10,11,12,13,14,15,16,17}       40.062                        52.857
{9,10,11,12,13,14,15,16,17,18}      43.183                        56.848
{10,11,12,13,14,15,16,17,18,19}     46.311                        60.857

Linear regression between expected and actual gives an R^2 value of 0.999994.

My next step to improve this answer is to improve estimation when the number of envelopes starts to get small, which is when the envelopes are not approximately uniformly distributed and when the problem starts to get granular.


Edit: If this is deemed worthy of bitcoins, I just got an address at 1PZ65cXxUEEcGwd7E8i7g6qmvLDGqZ5JWg. Thanks! (This was here from when the challenge author was handing out prizes.)

PhiNotPi

Posted 2015-07-31T21:25:37.457

Reputation: 26 739

Accidentally sent you 20k satoshi over 805,479. For reference, the amount was supposed to be your score. Enjoy my mistake :) – LivingInformation – 2015-08-03T02:24:55.600

Will you be running numbers with more rounds? Based on what I'm seeing, there is quite a bit of variation, and 500 is not enough to get a stable median. My score is very close to yours if I run only 500 rounds, but it all depends on how the random numbers happen to fall. If I used a variable seed, and did 500 runs a few times, I could probably get a higher score. – Reto Koradi – 2015-08-03T02:52:23.853

@RetoKoradi I'm definitely going to do more rounds. – PhiNotPi – 2015-08-03T13:19:55.357

3

C++, ~$815,000

Based on Reto Koradi's solution, but switches to a more sophisticated algorithm once there's 100 (valid) envelopes left, shuffling random permutations and computing the heaviest increasing subsequence of them. It will compare the results of taking and not taking the envelope, and will greedily select the best choice.

#include <algorithm>
#include <iostream>
#include <vector>
#include <set>


void setmax(std::vector<int>& h, int i, int v) {
    while (i < h.size()) { h[i] = std::max(v, h[i]); i |= i + 1; }
}

int getmax(std::vector<int>& h, int n) {
    int m = 0;
    while (n > 0) { m = std::max(m, h[n-1]); n &= n - 1; }
    return m;
}

int his(const std::vector<int>& l, const std::vector<int>& rank) {
    std::vector<int> h(l.size());
    for (int i = 0; i < l.size(); ++i) {
        int r = rank[i];
        setmax(h, r, l[i] + getmax(h, r));
    }

    return getmax(h, l.size());
}

template<class RNG>
void shuffle(std::vector<int>& l, std::vector<int>& rank, RNG& rng) {
    for (int i = l.size() - 1; i > 0; --i) {
        int j = std::uniform_int_distribution<int>(0, i)(rng);
        std::swap(l[i], l[j]);
        std::swap(rank[i], rank[j]);
    }
}

std::random_device rnd;
std::mt19937_64 rng(rnd());

struct Algo {
    Algo(int N) {
        for (int i = 1; i < N + 1; ++i) left.insert(i);
        ival = maxval = 0;
    }

    static double get_p(int n) { return 1.2 / std::sqrt(8 + n) + 0.71; }

    bool should_take(int val) {
        ival++;
        auto it = left.find(val);
        if (it == left.end()) return false;

        if (left.size() > 100) {
            if (val > maxval && val < 466.7f + 0.9352f * maxval + 0.0275f * (ival - 1)) {
                maxval = val;
                left.erase(left.begin(), std::next(it));
                return true;
            }

            left.erase(it);
            return false;
        }

        take.assign(std::next(it), left.end());
        no_take.assign(left.begin(), it);
        no_take.insert(no_take.end(), std::next(it), left.end());
        take_rank.resize(take.size());
        no_take_rank.resize(no_take.size());
        for (int i = 0; i < take.size(); ++i) take_rank[i] = i;
        for (int i = 0; i < no_take.size(); ++i) no_take_rank[i] = i;

        double take_score, no_take_score;
        take_score = no_take_score = 0;
        for (int i = 0; i < 1000; ++i) {
            shuffle(take, take_rank, rng);
            shuffle(no_take, no_take_rank, rng);
            take_score += val + his(take, take_rank) * get_p(take.size());
            no_take_score += his(no_take, no_take_rank) * get_p(no_take.size());
        }

        if (take_score > no_take_score) {
            left.erase(left.begin(), std::next(it));
            return true;
        }

        left.erase(it);
        return false;
    }

    std::set<int> left;
    int ival, maxval;
    std::vector<int> take, no_take, take_rank, no_take_rank;
};


struct Game {
    Game(int N) : score_(0), max_taken(0) {
        for (int i = 1; i < N + 1; ++i) envelopes.push_back(i);
        std::shuffle(envelopes.begin(), envelopes.end(), rng);
    }

    int read() { return envelopes.back(); }
    bool done() { return envelopes.empty(); }
    int score() { return score_; }
    void pass() { envelopes.pop_back(); }

    void take() {
        if (read() > max_taken) {
            score_ += read();
            max_taken = read();
        }
        envelopes.pop_back();
    }

    int score_;
    int max_taken;
    std::vector<int> envelopes;
};


int main(int argc, char** argv) {
    std::vector<int> results;
    std::vector<int> max_results;
    int N = 10000;
    for (int i = 0; i < 1000; ++i) {
        std::cout << "Simulating game " << (i+1) << ".\n";
        Game game(N);
        Algo algo(N);

        while (!game.done()) {
            if (algo.should_take(game.read())) game.take();
            else game.pass();
        }
        results.push_back(game.score());
    }

    std::sort(results.begin(), results.end());
    std::cout << results[results.size()/2] << "\n";

    return 0;
}

orlp

Posted 2015-07-31T21:25:37.457

Reputation: 37 067

Interesting. It had crossed my mind that it should be possible to improve by looking at the values left for the last few envelopes. I figure you played with the cutoff point where you switch strategies? Is it just getting too slow if you switch earlier? Or are the results actually getting worse? – Reto Koradi – 2015-08-03T13:54:18.983

@RetoKoradi I did play with the cutoff point, and earlier cutoffs both got too slow and worse. Not too surprising honestly, at 100 envelopes we're already sampling a mere 1000 permutations out of a possible 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000. – orlp – 2015-08-03T14:15:54.420