Take It or Leave It II: A Game Show for Computers

20

5

This is the second in a series of puzzles that I will be posting every Monday at Midnight PST. The first puzzle is located Here.

Context:

A reclusive billionaire has created a game show to attract the world's best and brightest programmers. On Mondays at the stroke of midnight, he chooses one person from a pool of applicants to be the contestant of the week, and provides them with a game. You are this week's lucky contestant!

This week's game:

The host provides you with API access to a stack of 10,000 digital envelopes. These envelopes are randomly sorted, and contain within them a dollar value, between $1 and $10,000 (no two envelopes contain the same dollar value).

You have 4 commands at your disposal:

  1. Read(): Read the dollar figure in the envelope at the top of the stack.

  2. Take(): Add the dollar figure in the envelope to your game show wallet, and pop the envelope off the stack.

  3. Pass(): Pop off the envelope on the top of the stack.

  4. Oracle(M): Returns the mean value of the next M envelopes in the stack, not including the one you can currently Read().

The Rules:

  1. If you use Pass() on an envelope, the money within is lost forever.

  2. If you use Take() on an envelope containing $X, from that point forward, you may never use Take() on an envelope containing < $X. Take() on one of these envelopes will add $0 to your wallet.

  3. If you use Oracle(M) on turn T, envelopes T+1 through T+M's mean will be returned. Oracle() is disabled until turn T+M.

Write an algorithm that finishes the game with the maximal amount of money.

If you're writing your algorithm in Python, feel free to use this controller provided by @Maltysen: https://gist.github.com/livinginformation/70ae3f2a57ecba4387b5

Notes 1: "Maximal" in this case means the median value in your wallet after N >= 1000 runs. I expect, though I would love to be proven wrong, that the median value for a given algorithm will converge as N increases to infinity. Feel free to try to maximize the mean instead, but I have a feeling that the mean is more likely to be thrown off by a small N than the median is.

Note 2: as all solutions to the previous part of this puzzle are valid here, reposting them has little value. Only algorithmic improvements of previous puzzles will be considered for part II.

Edit: The prize condition has been removed, in light of this post on meta.

LivingInformation

Posted 2015-08-03T07:00:04.587

Reputation: 761

Wow, I can't believe I overslept :O – Beta Decay – 2015-08-03T07:18:13.417

@Beta Decay clock is ticking! :) – LivingInformation – 2015-08-03T07:19:12.467

What's the sense of the rracle? You could build your own free oracle by just keeping a tally of all previously read envelopes. What am I getting wrong? – Luis Mendo – 2015-08-03T11:26:00.570

1@LuisMendo With your own tally, you can only know the mean of all remaining values. With the oracle, you can get the mean of the next M values, where you get to choose M. – Reto Koradi – 2015-08-03T13:34:52.403

1Since all solutions to your previous challenge are also valid solutions to this challenge, can we consider them implicitly submitted? – Reto Koradi – 2015-08-03T13:36:14.873

@Reto updated the OP with a second note to that effect. – LivingInformation – 2015-08-03T14:40:50.467

@RetoKoradi Got it. Thanks – Luis Mendo – 2015-08-03T17:20:46.370

Answers

9

Groovy $713337 $817829 $818227

Bootstrap code:

class Instance {
    List values = new ArrayList(1..10000); {
        Collections.shuffle(values)
    }
    int i = 0
    int value = 0
    int max = 0
    int nextOracle = 0

    def pass() {
        if (i >= 10000)
            throw new NoSuchElementException()
        i++
    }

    def take() {
        if (i >= 10000)
            throw new NoSuchElementException()
        int v = values[i]
        if (v > max) {
            max = v
            value += v
        }
        i++
    }

    double oracle(int m) {
        if (m <= 0 || i < nextOracle || i + m >= 10000)
            throw new NoSuchElementException()

        nextOracle = i + m
        values.subList(i + 1, i + m + 1).stream().reduce { l, r -> r+l }.get() / m
    }

    int read() {
        if (i >= 10000)
            throw new NoSuchElementException()
        values[i]
    }
}

Algorithm

double square(double v) { v * v }
final double factor = Math.pow(1.5, 1.1)
int attempts = 5000
(1..attempts).stream().parallel().mapToLong {
    def puzzle = new Instance()

    int[] memory = 1..10000 // We will remember every envelope
    int memStart = 0

    while (memStart < 10000 - 3) {
        int value = puzzle.read()
        int i = Arrays.binarySearch(memory, memStart, 10000, value) - memStart
        if (i < 0) { // We can't use the money
            puzzle.pass()
            continue
        }
        if (i == 0) { // Of course we take the lowest
            puzzle.take()
            memStart++
            continue
        }
        int remaining = Arrays.stream(memory, i + 1 + memStart, 10000).sum() // Money we could win if taken
        int losing = Arrays.stream(memory, memStart, memStart + i).sum() // Money we cna't win if taken
        if (value > losing) { // If we pass, we lose money automatically
            puzzle.take()
            memStart += i + 1
        } else if ((losing - value * 16 / 7) * square(Math.log(i)) > remaining / factor) {
            System.arraycopy(memory, memStart, memory, ++memStart, i)
            puzzle.pass()
        } else {
            puzzle.take()
            memStart += i + 1
        }
    }

    // It's broken down to last three elements
    List values = Arrays.copyOfRange(memory, 10000 - 3, 10000)
    while (!values.contains(puzzle.read())) // Skip values we can't use
        puzzle.pass()
    int value1 = puzzle.read()
    int value2 = puzzle.oracle(1)
    if (value1 == values.max() && (
            values.contains(value2)
            ? (value1 * 2 < values.sum() && values.min() == value2)
            : (value1 < values.min() / 2 + (values - [value1]).max())
            )) {
        puzzle.pass()
    }

    // Finish it
    while (puzzle.i < puzzle.values.size()) {
        puzzle.take()
    }

    puzzle.value as Long
}.sum() / attempts // Sum runs and average

I compare remaining values with possible values. This script is not fast (takes 1-minute per 1000x simulations)... but it will perform the simulations simultaneously.

I have no idea why my algorithm works, but it was just trial and error: lumping mathematical operations together and manipulating the constants. I ran it 5000x for the current score, in an attempt to reduce the score's fluctuations (it's +/- $4000 depending on iteration count).

Even without the oracle at the end, it should still be (barely) beating @orlp's solution for the prior puzzle.

Wesley Wolfe

Posted 2015-08-03T07:00:04.587

Reputation: 101

7

C# - $803.603 now -> $804.760 (with oracle)

Bootstrap Code

public static class ShuffleExtension
{
    private static Random rng = new Random();  

    public static void Shuffle<T>(this IList<T> list)  
    {  
        int n = list.Count;
        while (n > 1) {  
            n--;  
            int k = rng.Next(n + 1);  
            T value = list[k];  
            list[k] = list[n];  
            list[n] = value;  
        }  
    }
}

public class Puzzle
{
    public List<int> Values = new List<int>(10000);

    public Puzzle()
    {
        for ( int i = 1; i <= 10000; i++ )
        {
            Values.Add(i);
        }
        Values.Shuffle();
    }

    public int i = 0;
    public int value = 0;
    public int max = 0;
    public int nextOracle = 0;

    public void Pass() {
        if ( i >= Values.Count )
            throw new IndexOutOfRangeException();
        i++;
    }

    public void Take() {
        if (i >= Values.Count )
            throw new IndexOutOfRangeException();
        int v = Values[i];
        if (v > max) {
            max = v;
            value += v;
        }
        i++;
    }

    public double oracle(int m) {
    if (m <= 0) { 
        throw new IndexOutOfRangeException();
    }
    if ( i < nextOracle ) {
        throw new IndexOutOfRangeException();
    }
    if ( i + 1 + m > Values.Count ) {
        throw new IndexOutOfRangeException();
    }

    nextOracle = i + m;
    var oracleValues = new List<int>();
    for ( int l = 0; l < m; l++ )
    {
        oracleValues.Add(Values[i + 1 + l]);
    }
    return oracleValues.Average (v => v);
}

    public int Read() {
        if (i >= Values.Count )
            throw new IndexOutOfRangeException();
        return Values[i];
    }
}

Game Code:

    void Main()
{
    var m = 0;
    for ( int l = 0; l < 1000; l++ )
    {
        var game = new Puzzle();
        var maxVal = 0;
        var lastOracle = 0;
        var lastOracleValue = 0.0m;
        var oracleValueForIOf = 0;

        for ( int i = 0; i < 10000; i++ )
        {
            var val = game.Read();
            var oracleStep = 1;
            var canUseOracle = (i - lastOracle >= oracleStep) && i + oracleStep + 1 <= 10000;
            if ( canUseOracle )
            {
                var oracle = game.oracle(oracleStep);
                lastOracle = i;
                lastOracleValue = (decimal)oracle;
                oracleValueForIOf = i + 1;
            }
            if ( TakeTheMoney(val, maxVal, oracleValueForIOf, lastOracleValue, i) )
            {
                maxVal = val;
                game.Take();
            }
            else
            {
                game.Pass();
            }
        }
        m += game.value;
    }
    ((int)(m / 1000)).Dump();
}

private bool TakeTheMoney(int val, int maxVal, int oracleValueForIOf, decimal lastOracleValue, int i)
{
    if ( val > maxVal )
    {
        if ( oracleValueForIOf != i + 1
            &&
            (val < 466.7m + (0.9352m * maxVal) + (0.0275m * i))
            )
        {
            return true;
        }

        if (oracleValueForIOf == i + 1)
        {
            if ( val < 466.7m + (0.9352m * maxVal) + (0.0275m * i) )
            {
                return true;
            }
            if ( lastOracleValue > 466.7m + (0.9352m * val) + (0.0275m * i + 1) )
            {
                if ( val < 466.7m + (0.9352m * maxVal) + (0.0275m * i + 1) )
                {
                    return true;
                }
            }
        }
    }
    return false;
}

Credit belongs to Reto Koradi (https://codegolf.stackexchange.com/a/54181/30910)

Edit: Basic Use of Oracle implemented. If the next oracle is above the threshold to use expand the current envelope to the index of the Oracle Index. This doesn't hit often but IT IS an Improvement ;-)

Stephan Schinkel

Posted 2015-08-03T07:00:04.587

Reputation: 596

4I don't think it's very productive to repost solutions from the previous challenge. We all recognized that those solutions can be used as a baseline to this challenge, and I had already left a comment for the OP asking how we should handle that. The idea is that you come up with your own solution, which is ideally better than the solutions to the previous challenge. – Reto Koradi – 2015-08-03T14:26:38.800

please stop downvoting :) note number 2 was added after my submission. and as it is more effective than the other solutions - i have posted it here. no need to use oracle to beat the existing solutions. – Stephan Schinkel – 2015-08-03T15:06:19.480

@StephanSchinkel You have my upvote if you manage to include the Oracle to improve the current score. Even by just $1. – Dorus – 2015-08-03T15:24:08.863

@BetaDecay what it is exactly that is frowned upon by the community again? I just followed the question from the op. Once again Note Number 2 was added AFTER my submission. – Stephan Schinkel – 2015-08-03T15:47:03.063

To Not use a solution from part I of the quiz. – Stephan Schinkel – 2015-08-03T15:55:21.180

@Dorus: done ;) – Stephan Schinkel – 2015-08-03T16:08:23.483

4

Python - $74112

Only take, if the current value is lower than the next value (i.e. you can take both).

def algo():
  try:
    o=oracle(1)
  except ValueError:
    take()
  r=read()
  if r>o:
    passe()
  else:
    take()

Python - (still calculating the mean)

This answer takes VERY LONG to calculate. It reaches around 670.000$. I remember each envelope I saw. Every time I have to make a decision, I generate two lists of remaining envelopes that I could potentially add to my wallet if I take the current envelope or leave it respectively.

I did not optimize the code.

def algo_2():
  global max_taken, past
  weight=0.92 #Empirically chosen.
  r=read()
  if len(past)==0:
    past.append(r)
    passe()
    return
  if r<max_taken:
    past.append(r)
    take() #the same as passe
    return
  coming=[x for x in range(1,10001) if x not in past and x>max_taken and x!=r ]
  comingIfTake=[x for x in range(1,10001) if x not in past and x>r ]
  if sum(coming)*weight<=sum(comingIfTake)+r:
    past.append(r)
    take()
  else:
    past.append(r)
    passe()

And init_game starts like this:

def init_game():
    global stack, wallet, max_taken, oracle_turns, past
    past=[]

TheEspinosa

Posted 2015-08-03T07:00:04.587

Reputation: 173

3If you use sets to represent past, coming, and comingIfTake, and use intersections, your code would be much faster. – Nathan Merrill – 2015-08-03T14:37:59.157

4

C# - $780.176

Check if the next value is within the lower 5% of all remaining values. Get more relaxed as we get to the end.

public class Taker
{
    private List<int> remaining;
    private Game game;

    public Taker(Game game)
    {
        this.game = game;
        remaining = Enumerable.Range(1, game.Size + 100).ToList();
    }

    int score = 0;

    public int PlayGame()
    {
        for (int i = 0; i < game.Size; i++)
        {
            if (game.Read() < game.Max ||
                game.Read() > selectThreshold() ||
                doOracle()
                )
            {
                remaining.Remove(game.Read());
                game.Pass();
                continue;
            }
            remaining = remaining.SkipWhile(j => j < game.Read()).ToList();
            score += game.Take();
        }
        return score;
    }

    private bool doOracle()
    {
        return game.Oracle(1) < game.Read() &&
            game.Oracle(1) > game.Max;
    }

    private int selectThreshold()
    {
        int selector = (int)(remaining.Count * 0.05);
        return remaining.ElementAt(selector);
    }
}

And my game class, very ugly, game class doesn't even validate if oracle is allowed, but since i only use Oracle(1) that shouldn't be a problem.

public class Game
{
    private int[] list;
    private int position = 0;
    private int max = 0;
    public int Max { get { return max; } }
    public int Size { get { return list.Length; } }

    public Game(int[] list)
    {
        this.list = list;
    }

    public int Read()
    {
        return list[position];
    }

    public int Take()
    {
        if (list[position] < max)
        {
            position++;
            return 0;
        }
        max = list[position];
        return list[position++];
    }

    public void Pass()
    {
        position++;
    }

    public int Oracle(int M)
    {
        int next = position + 1;
        M = Math.Max(0, Math.Min(M, list.Length - next));
        return new ArraySegment<int>(list, next, M).Sum();
    }
}

Dorus

Posted 2015-08-03T07:00:04.587

Reputation: 141

4

Java, $804,991

Score is from 1001 rounds. It is probably too close to call between this answer and Stephan Schinkel's.

This is based on my answer in the previous challenge, in that it uses the same entropy-based calculation to estimate payoffs. The main difference is that it simply now takes envelopes in pairs (1 & 2, then 3 & 4, so on) and looks at the possible combinations of take-take, take-pass, pass-take, etc. It also calculates the exact estimated score when the number of valid envelopes is really small.

The "wrapper" I wrote isn't really a true wrapper, it just gives envelopes in pairs instead of calling an Oracle(1) function every other round.

Overall, I would say that, despite the increased complexity, this bot really isn't better than my previous one.

Player

import java.lang.Math;
public class Player2
{
    public int[] V;

    public Player2(int s)
    {
        V = new int[s];
        for(int i = 0; i<V.length; i++)
        {
            V[i] = i+1;
        }
        ////System.out.println();
    }

    public boolean [] takeQ(int x, int y)
    {
        //System.out.println("Look: " + x + " " + y);
        boolean [] move = new boolean[]{false,false};
        double max = 0;
        double val = 0;
        int[] nextV = V;

        ////System.out.println("look " + x);
        int i = find(V,x);
        if(i >= 0)  //if found
        {
            //try taking first envelope
            int[] newVt = takeSlice(V,i);
            //System.out.println("  T: " + ats(newVt));
            int j = find(newVt,y);
            if(j >= 0)
            {
                //try taking first and second
                int[] newVtt = takeSlice(newVt,j);
                val = x + y + calcVal(newVtt);
                //System.out.println("  TT: " + ats(newVtt) + " " + val);
                if(val > max)
                {
                    move = new boolean[]{true,true};
                    max = val;
                    nextV = newVtt;
                }
            }
            //try taking first and passing second
            int[] newVtp = passSlice(newVt,j);

            val = x + calcVal(newVtp);
            //System.out.println("  TP: " + ats(newVtp) + " " + val);
            if(val > max)
            {
                move = new boolean[]{true,false};
                max = val;
                nextV = newVtp;
            }
        }
        int[] newVp = passSlice(V,i);
        //System.out.println("  V: " + ats(V));
        //System.out.println("  P: " + ats(newVp));
        int j = find(newVp,y);
        if(j >= 0)
        {
            //try passing first and taking second
            int[] newVpt = takeSlice(newVp,j);
            val = y + calcVal(newVpt);
            //System.out.println("  PT: " + ats(newVpt) + " " + val);
            if(val > max)
            {
                move = new boolean[]{false,true};
                max = val;
                nextV = newVpt;
            }
        }
        //try taking first and passing second
        int[] newVpp = passSlice(newVp,j);

        val = calcVal(newVpp);
        //System.out.println("  PP: " + ats(newVpp) + " " + val);
        if(val > max)
        {
            move = new boolean[]{false,false};
            max = val;
            nextV = newVpp;
        }
        V = nextV;
        //System.out.println("  NEW: " + ats(V));
        return move;
    }

    public static String ats(int [] a)
    {
        String s = "";
        for(int i = 0; i < a.length; i++)
        {
            s += a[i] + ",";
        }
        return s;
    }

    public static int[] takeSlice (int[] list, int loc)
    {
        int [] newlist = new int[list.length - loc - 1];
        for(int j = loc + 1; j < list.length; j++)
        {
            newlist[j - loc - 1] = list[j];
        }
        return newlist;
    }

    public static int[] passSlice (int[] list, int loc)
    {
        int [] newlist = list;
        if(loc >= 0)
        {
            newlist = new int[list.length-1];
            for(int k = 0; k < loc; k++)
            {
                newlist[k] = list[k];
            }
            for(int k = loc + 1; k < list.length; k++)
            {
                newlist[k-1] = list[k];
            }
        }
        return newlist;
    }

    public static double calcVal(int [] list)
    {
        if(list.length < 8)
        {
            for(int i : list)
            {
                ////System.out.print(i + ",");
            }

                ////System.out.println();
            return computeMean(list);

        }
        return smoothEstimate(list);
    }

    public static double computeMean(int[] V)
    {
        if(V.length == 1)
        {
            return V[0];
        }
        else if(V.length > 1)
        {
            double[] Es = new double[V.length];
            for(int i = 0; i < V.length; i++)
            {
                int[] newVp = new int[V.length - 1];
                for(int j = 0; j < i; j++)
                {
                    newVp[j] = V[j];
                }
                for(int j = i + 1; j < V.length; j++)
                {
                    newVp[j-1] = V[j];
                }
                double pass = computeMean(newVp);
                int[] newVt = new int[V.length - i - 1];
                for(int j = i + 1; j < V.length; j++)
                {
                    newVt[j - i - 1] = V[j];
                }
                double take = V[i] + computeMean(newVt);
                if(take > pass)
                {
                    Es[i] = take;
                }
                else
                {
                    Es[i] = pass;
                }
            }
            double sum = 0;
            for(double d : Es)
            {
                sum += d;
            }
            return sum/V.length;
        }
        else
        {
            return 0;
        }
    }

    public static double smoothEstimate(int [] list)
    {
        double total = 0;
        for(int i : list)
        {
            total+=i;
        }
        double ent = 0;
        for(int i : list)
        {
            if(i > 0)
            {
                ent -= i/total * Math.log(i/total);
            }
        }
        ////System.out.println("      total " + total);
        ////System.out.println("      entro " + Math.exp(ent));
        ////System.out.println("      count " + list.length);
        return total * Math.pow(Math.exp(ent),-0.5) * 4.0/3;// * 1.1287 + 0.05284);
    }

    public static int find(int[] list, int search)
    {
        int first  = 0;
        int last   = list.length - 1;
        int middle = (first + last)/2;

        while( first <= last )
        {
            if ( list[middle] < search )
                first = middle + 1;    
            else if ( list[middle] == search )
                break;
            else
                last = middle - 1;

            middle = (first + last)/2;
        }

        if(first > last)
        {
            return -1;
        }
        return middle;
    }
}

Controller

import java.lang.Math;
import java.util.Random;
import java.util.ArrayList;
import java.util.Collections;
public class Controller2
{
    public static void main(String [] args)
    {
        int size = 10000;
        int rounds = 1001;
        ArrayList<Integer> results = new ArrayList<Integer>();
        for(int round = 0; round < rounds; round++)
        {
            int[] envelopes = new int[size];
            for(int i = 0; i<envelopes.length; i++)
            {
                envelopes[i] = i+1;
            }
            shuffleArray(envelopes);
            Player2 p = new Player2(size);
            int cutoff = 0;
            int winnings = 0;
            for(int i = 0; i<envelopes.length; i+=2)
            {
                boolean [] take = p.takeQ(envelopes[i],envelopes[i+1]);
                if(take[0] && envelopes[i] >= cutoff)
                {
                    winnings += envelopes[i];
                    cutoff = envelopes[i];
                }
                if(take[1] && envelopes[i+1] >= cutoff)
                {
                    winnings += envelopes[i+1];
                    cutoff = envelopes[i+1];
                }
            }
            results.add(winnings);
        }
        Collections.sort(results);
        System.out.println(rounds + " rounds, median is " + results.get(results.size()/2));

    }

    //stol... I mean borrowed from http://stackoverflow.com/questions/1519736/random-shuffling-of-an-array
    static void shuffleArray(int[] ar)
    {
        Random rnd = new Random();
        for (int i = ar.length - 1; i > 0; i--)
        {
            int index = rnd.nextInt(i + 1);
            // Simple swap
            int a = ar[index];
            ar[index] = ar[i];
            ar[i] = a;
        }
    }
}

Bitcoin address: 1BVBs9ZEP8YY4EpV868nxi2R23YfL7hdMq

PhiNotPi

Posted 2015-08-03T07:00:04.587

Reputation: 26 739

3

Python 3 - $615570

Doesn't actually use the oracle... Eh :)

def algo():
    global prevs

    try:
        prevs.append(read())
    except NameError:
        prevs = [read()]

    if len(prevs) > 10000:
        prevs = [prevs[-1]]

    if read() < round(len(prevs),-1):
        take()
    else:
        passe()

Builds up a list of all previous envelopes and check if the current envelope is less than the number of previous envelopes in 10 envelope increments.

Beta Decay

Posted 2015-08-03T07:00:04.587

Reputation: 21 478

0

Python, 87,424

Here's a plain and easy algorithm, the lucky seven.

def LuckyNumber7():
Test = read()
if "7" in str(Test):
    take()
else:
    passe()

test(LuckyNumber7)

Basically what it does is it converts read() to a string and checks if there's a seven in it. If there is, it takes the envelope. If not, it passes it.

It averages around 81,000, I haven't been keeping track.

The_Basset_Hound

Posted 2015-08-03T07:00:04.587

Reputation: 1 566

So this shows that relying on luck is not a successful strategy? ;) – Reto Koradi – 2015-08-07T01:10:57.877

@RetoKoradi Yep :D – The_Basset_Hound – 2015-08-07T01:11:21.147