35
13
This task is part of the First Periodic Premier Programming Puzzle Push and is intended as demonstration of the new king-of-the-hill challenge-type proposal.
The task is the write a program to play the iterated prisoner's dilemma better than other entrants.
Look, Vinny. We know your cellmate---what's his name? Yeah McWongski, the Nippo-Irish-Ukranian mobster--is up to something and you know what it is.
We're trying to be nice here, Vinnie. Givin' you a chance.
If you tells us what he's plannin' we'll see you get a good work assignment.
And if you don't...
The Rules of the Game
- The contest consists of a full round-robin (all possible pairing) of two contestants at a time (including self plays).
- There are 100 rounds played between each pair
- In each round each player is asked to choose between cooperating with the other player or betraying them, without knowing the other players intentions in the matter, but with a memory of the outcomes of previous rounds played against this opponent.
- Points are awarded for each round based on the combined choice. If both players cooperate they each get 2 points. Mutual betrayal yields 1 point each. In the mixed case, the betraying player is awarded 4 points and the cooperator is penalized by 1.
- An "official" match will be run not sooner than 10 days after posting with all the submissions I can get to work and be used to select the "accepted" winner. I have a Mac OS 10.5 box, so POSIX solutions should work, but there are linuxisms that don't. Likewise, I have no support for the win32 API. I'm willing to make a basic effort to install things, but there is a limit. The limits of my system in no way represent the limits of acceptable responses, simply those that will be included in the "offical" match.
The Programmer's interface
- Entries should be in the form of programs that can be run from the command line; the decision must the (sole!) output of the program on the standard output. The history of previous rounds with this opponent will be presented as a command-line argument.
- Output is either "c" (for clam up) or "t" (for tell all).
- The history is a single string of characters representing previous rounds with the most recent rounds coming earliest in the string. The characters are
- "K" (for kept the faith meaning mutual cooperation)
- "R" (for rat b@st@rd sold me out!)
- "S" (for sucker! meaning you benefited from a betrayal)
- "E" (for everyone is looking out for number one on mutual betrayal)
The bracket
Four players will be provided by the author
- Angel -- always cooperates
- Devil -- always talks
- TitForTat -- Cooperates on the first round then always does as he was done by in the last round
- Random -- 50/50
to which I will add all the entries that I can get to run.
The total score will be the sum score against all opponents (including self-plays only once and using the average score).
Entrants
(current as of 2 May 2011 7:00)
The Secret Handshake | Anti-T42T Missile | Mistrust (variant) | Anti-Handshake | The Little Lisper | Convergence | Shark | Probabimatic | Pavlov - Win Stay, Lose Switch | Honor Among Thieves | Help Vampire | Druid | Little Schemer | Bygones | Tit for Two Tats | Simpleton |
Scorer
#! /usr/bin/python
#
# Iterated prisoner's dilemma King of Hill Script Argument is a
# directory. We find all the executables therein, and run all possible
# binary combinations (including self-plays (which only count once!)).
#
# Author: dmckee (https://codegolf.stackexchange.com/users/78/dmckee)
#
import subprocess
import os
import sys
import random
import py_compile
###
# config
PYTHON_PATH = '/usr/bin/python' #path to python executable
RESULTS = {"cc":(2,"K"), "ct":(-1,"R"), "tc":(4,"S"), "tt":(1,"E")}
def runOne(p,h):
"""Run process p with history h and return the standard output"""
#print "Run '"+p+"' with history '"+h+"'."
process = subprocess.Popen(p+" "+h,stdout=subprocess.PIPE,shell=True)
return process.communicate()[0]
def scoreRound(r1,r2):
return RESULTS.get(r1[0]+r2[0],0)
def runRound(p1,p2,h1,h2):
"""Run both processes, and score the results"""
r1 = runOne(p1,h1)
r2 = runOne(p2,h2)
(s1, L1), (s2, L2) = scoreRound(r1,r2), scoreRound(r2,r1)
return (s1, L1+h1), (s2, L2+h2)
def runGame(rounds,p1,p2):
sa, sd = 0, 0
ha, hd = '', ''
for a in range(0,rounds):
(na, ha), (nd, hd) = runRound(p1,p2,ha,hd)
sa += na
sd += nd
return sa, sd
def processPlayers(players):
for i,p in enumerate(players):
base,ext = os.path.splitext(p)
if ext == '.py':
py_compile.compile(p)
players[i] = '%s %sc' %( PYTHON_PATH, p)
return players
print "Finding warriors in " + sys.argv[1]
players=[sys.argv[1]+exe for exe in os.listdir(sys.argv[1]) if os.access(sys.argv[1]+exe,os.X_OK)]
players=processPlayers(players)
num_iters = 1
if len(sys.argv) == 3:
num_iters = int(sys.argv[2])
print "Running %s tournament iterations" % (num_iters)
total_scores={}
for p in players:
total_scores[p] = 0
for i in range(1,num_iters+1):
print "Tournament %s" % (i)
scores={}
for p in players:
scores[p] = 0
for i1 in range(0,len(players)):
p1=players[i1];
for i2 in range(i1,len(players)):
p2=players[i2];
# rounds = random.randint(50,200)
rounds = 100
#print "Running %s against %s (%s rounds)." %(p1,p2,rounds)
s1,s2 = runGame(rounds,p1,p2)
#print (s1, s2)
if (p1 == p2):
scores[p1] += (s1 + s2)/2
else:
scores[p1] += s1
scores[p2] += s2
players_sorted = sorted(scores,key=scores.get)
for p in players_sorted:
print (p, scores[p])
winner = max(scores, key=scores.get)
print "\tWinner is %s" %(winner)
total_scores[p] += 1
print '-'*10
print "Final Results:"
players_sorted = sorted(total_scores,key=total_scores.get)
for p in players_sorted:
print (p, total_scores[p])
winner = max(total_scores, key=total_scores.get)
print "Final Winner is " + winner
- Complaints about my horrible python are welcome, as I am sure this sucks more than one way
- Bug fixes welcome
Scorer Changelog:
- Print sorted players and scores, and declare a winner (4/29, Casey)
- Optionally run multiple tournaments (
./score warriors/ num_tournaments)
) default=1 , detect & compile python sources (4/29, Casey) - Fix particularly dumb bug in which the second player was being passed a incorrect history. (4/30, dmckee; thanks Josh)
Initial warriors
By way of example, and so that the results can be verified
Angel
#include <stdio.h>
int main(int argc, char**argv){
printf("c\n");
return 0;
}
or
#!/bin/sh
echo c
or
#!/usr/bin/python
print 'c'
Devil
#include <stdio.h>
int main(int argc, char**argv){
printf("t\n");
return 0;
}
Random
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
int main(int argc, char**argv){
srandom(time(0)+getpid());
printf("%c\n",(random()%2)?'c':'t');
return 0;
}
Note that the scorer may re-invoke the warrior many times in one second, so a serious effort must be made to insure randomness of the results if time is being used to seed the PRNG.
TitForTat
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char**argv){
char c='c';
if (argv[1] && (
(argv[1][0] == 'R') || (argv[1][0] == 'E')
) ) c='t';
printf("%c\n",c);
return 0;
}
The first one that actually does something with the history.
Running the scorer on only the provided warriors yields
Finding warriors in warriors/
Running warriors/angel against warriors/angel.
Running warriors/angel against warriors/devil.
Running warriors/angel against warriors/random.
Running warriors/angel against warriors/titfortat.
Running warriors/devil against warriors/devil.
Running warriors/devil against warriors/random.
Running warriors/devil against warriors/titfortat.
Running warriors/random against warriors/random.
Running warriors/random against warriors/titfortat.
Running warriors/titfortat against warriors/titfortat.
('warriors/angel', 365)
('warriors/devil', 832)
('warriors/random', 612)
('warriors/titfortat', 652)
That devil, he's a craft one, and nice guys apparently come in last.
Results
of the "official" run
('angel', 2068)
('helpvamp', 2295)
('pavlov', 2542)
('random', 2544)
('littleschemer', 2954)
('devil', 3356)
('simpleton', 3468)
('secrethandshake', 3488)
('antit42t', 3557)
('softmajo', 3747)
('titfor2tats', 3756)
('convergence', 3772)
('probabimatic', 3774)
('mistrust', 3788)
('hyperrationalwasp', 3828)
('bygones', 3831)
('honoramongthieves', 3851)
('titfortat', 3881)
('druid', 3921)
('littlelisper', 3984)
('shark', 4021)
('randomSucker', 4156)
('gradual', 4167)
Winner is ./gradual
2If my cellmate is Nippo-Irish-Ukrainian, why does his name look Hiberno-Sino-Russian? – Peter Taylor – 2011-04-29T22:32:56.443
Is it allowed to submit a strategy similar or identical to one of your "default warriors"? I.e., can I submit Tit-for-tat myself? (Cf. Rapaport in Axelrod's IPD tourney.) Or is this just boring? – jscs – 2011-04-29T22:45:51.517
2@Peter: LOL. The truth? Well, (1) the genealogies aren't clear, but I probably come by my mic'edness by way of the Scotch-Irish; (2) after I'd written "Nippo" I tried out various bits of the names of my friends from the land of the rising sun and didn't like the way they scanned, so I went ahead and used a Chinese surname that sounded good instead, and (3) I wouldn't know the difference if they took turns beating me with tire irons. Which seems likely under the circumstances. – dmckee --- ex-moderator kitten – 2011-04-29T22:51:57.503
@Josh: ::shrug:: I haven't thought about it. I included it in the defaults because it is a well known simple-but-strong player. You'll note that Joey has submitted a nearly as simple variant, and that suggests one way to go forward. – dmckee --- ex-moderator kitten – 2011-04-29T22:54:35.820
@dmckee What's the policy on multiple submissions? – Casey – 2011-04-29T23:05:03.397
@Casey: Make sure they are distinct, but go ahead. – dmckee --- ex-moderator kitten – 2011-04-29T23:06:43.957
Hey guys, I've been hacking a lot on the scoring script. In particular I've added python and lisp compilation to speed things up, as well as parallelized the matches according to your cpu count. I've added some of the changes to the score script in @dmckee's post, but I don't want to intrude to much as not everyone may want these changes. Here is a link to the latest version.
– Casey – 2011-04-30T03:40:27.703Okay, I really did find a bug this time. I'm not sure what causes it, possibly something weird about tuple assignment. I was running all-C against all-D and displaying the history. All-D's history v. all-C was wrong; it always had 'S' in the most recent position, and 'R' for the rest. The fix against the version in question body as of this comment (wish I didn't have to do this in a comment): Rewrite the following lines: 35:
return (s1, L1), (s2, L2)
41:(na, nha), (nd, nhd) = runRound(p1,p2,ha,hd)
Add two lines after 43:ha = nha + ha
andhd = nha + hd
– jscs – 2011-04-30T07:19:09.460Sorry, for "all-C", read "Angel" and for "all-D" read "Devil". Using the old terminology. – jscs – 2011-04-30T07:42:57.547
@Josh Hm, there is definitely something screwy there, but I don't think your changes fix it.. shouldn't you expect to see all 'S' for the devil when he is playing the angel? – Casey – 2011-04-30T18:24:48.013
@Casey: That's what I'm seeing here with my change. Should I make a suggested edit to the post to make it more clear? – jscs – 2011-04-30T19:06:18.810
1@Josh: Would it be simple to change
return (s1, L1+h1), (s2, L2+h1)
toreturn (s1, L1+h1), (s2, L2+h2)
[NoteL2+h2
instead ofL2+h1
at the end]? //Cut-n-paste mistake or something equally idiotic. Sheesh! – dmckee --- ex-moderator kitten – 2011-05-01T00:01:21.330@dmckee: I was hopping you would set a random number of rounds :) – Eelvex – 2011-05-01T01:46:45.740
@Eelvex: That's just the kind of input I could have used when I was tinkering with this in the Sand Box.
– dmckee --- ex-moderator kitten – 2011-05-01T02:19:01.840@dmckee: I know but, unfortunately, wasn't available till now :( – Eelvex – 2011-05-01T02:44:39.630
@dmckee: That looks like the bug! Sheesh, I haven't gotten enough sleep lately or something. – jscs – 2011-05-01T03:03:13.457
2
I've spent some time on the test script, and I'm pleased to announce an update here. This update adds a simple shell to the test script, which allows the user to manually run this bot vs. that bot, run tournaments with restricted fields and some other cool stuff. Feel free to make suggestions! Oh. And I owe @josh for the bot-vs-bot idea. It's really just a fancier implementation of his "trainer" script.
– arrdem – 2011-05-01T04:22:07.837I've just posted a half-dozen new warriors. If anyone's concerned that I am "flooding the field" or something like that, I'm perfectly okay with only my first two or three entries going in to the actual tournament. I'm just having fun coming up with ideas. – jscs – 2011-05-01T08:46:18.190
@rmckenzie, I was thinking of doing that, so thanks for doing it first. – Peter Taylor – 2011-05-01T11:25:29.597
@mckenzie, if I run your score.py with one argument I get a warning about NUM_ROUNDS being assigned to before global declaration and an NameError because num_games is not defined. – Peter Taylor – 2011-05-01T13:54:17.663
1@Josh: My only concern is that running the whole round-robin goes by O(N^2). I'm prepared to let the "official" math run for a while, but at some point it could get to be ridiculous. – dmckee --- ex-moderator kitten – 2011-05-01T13:56:47.153
@peter - thanks for pointing that out, I got it fixed. Updated on github. – arrdem – 2011-05-01T14:25:07.360
@rmckenzie Nice changes! Wanna make a pull request on github? – Casey – 2011-05-01T16:17:02.460
@mckenzie, I tried implementing the -v option of match by making a copy of runGame which adds
print "\n", ha, "\n", hd
after the line which prints the score. If I run angel vs devil for a 5-round match I getRRRRR
for the angel (correct) andSRRRR
for the devil (should beSSSSS
). The last line ofrunRound
has a bug. Should bereturn (s1, L1+h1), (s2, L2+h2)
– Peter Taylor – 2011-05-01T16:46:33.5931
@Peter It is fixed in the most recent commit
– Casey – 2011-05-01T17:03:35.727@dmckee: I certainly understand that. At this point, I just walk away from the computer every time I run a tournament! – jscs – 2011-05-01T17:24:17.143
@dmckee: What was your reason for only counting the mirror match once? To reduce benefits of collusion? Not complaining, just curious. Every other match gets run twice. E.g., in a tourney of Angel and Devil, we have A v. D and D v. A. – jscs – 2011-05-01T17:28:44.580
@peter - Thanks. I just saw that... turns out my Lisper was doing so well in part because of that bug sigh but yeah, casey's repo and mine both have that fixed. @josh - I think that's a bug in the iteration generator. A v. D == D v. A, so there's no reason to run that round. Fixing that would halve the runtime.... – arrdem – 2011-05-01T17:39:04.013
@Josh, my scorer only runs one of
A vs. D
orD vs. A
. Look at how the loops index. The reason for that was to keep the time down as rmckenzie says. If I was running both versions of the non-identical pairing then you are right: I would need to count both sides of the self-plays. – dmckee --- ex-moderator kitten – 2011-05-01T20:12:06.233@dmckee: I must've knocked something out of whack when I made the ProvingGrounds script. My bug. Is your version still official? Are you planning to use either Casey or rmckenzie's changes? – jscs – 2011-05-01T20:27:29.520
@Josh some of Casey's changes have been applied to the version in the post here. I have not been watching the other versions very closely. I was going to grab the pre-compiling bits for the "official" version. Note that I keep putting "official" in quotes, 'cause I figure anyone can set a bounty, run their own version, and declare their own winner. Maybe a multi-round elimination tournament to support multiple matches and random numbers of rounds (since the "official" version is very much at he mercy of chance due to the randomness employed by some warriors and the short length). – dmckee --- ex-moderator kitten – 2011-05-01T20:32:14.743
That said, I'm sticking by the posted rules for the purposes of the green check-mark. – dmckee --- ex-moderator kitten – 2011-05-01T20:33:07.193
I'm voting for every entry as soon as I get it working on my machine. Gotta do Josh's new one soon. – dmckee --- ex-moderator kitten – 2011-05-01T20:33:39.033
Yet another update to the scoring code if you care... added some population manipulation controls. same github
– arrdem – 2011-05-02T00:38:50.750I am seeing extremely incorrect round-by-round score results using the @Casey/@rmckenzie scorer. It's most noticeable using Angel or Devil and a bot that won't defect first. Examples: Devil v. Shark (200, 200), should not be a C-tie; Handshake v. Pavlov (200, 200), likewise; Angel v. Pavlov (103, 98), should be a C-tie; Angel v. Probabimatic (400, -100), just plain impossible. I would guess it's the threading screwing up just the printing, except that the final results of the tournaments are also off somewhat compared to what I'm seeing with the "official" scorer. Anyone else seeing this? – jscs – 2011-05-02T06:41:43.543
@Josh, I'm not seeing that precisely, but some results aren't what I'd expect. E.g. handshake beats anti-handshake 109-94 (although if either has been updated recently I'm out of date). – Peter Taylor – 2011-05-02T10:22:05.887
@josh - I thought I was seeing funny results but wrote it off as just me. thanks for the heads up. will investigate those cases as soon as I have time. – arrdem – 2011-05-02T10:42:36.347
Okay. for a 1v1 with debugging on, shark vs. devil shows this:devil EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEES shark EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEER. Looks fine. Angel-pavlov is also a c-tie as it should be. no unexpected results whatsoever. – arrdem – 2011-05-02T11:50:48.830
Okay. Last post in defense of the new scoring code, I swear. Running the same round-robin tests on the "official" code above and on the casey/rmckenzie scorer, the average difference in the performance of individual strategies is 18 points in 2300 point average scores. The population was constant between tests. – arrdem – 2011-05-03T20:11:16.337
I think there must be some system difference. The official results seem to agree with @rmckenzie, but I see about the same results with the official scoring code as with his scorer - Random Sucker wins, gradual comes second, then Wasp, Druid and Shark fight it out for third. Anyone got any ideas? The best that I can speculate is a difference in random number quality between python implementations. – Peter Taylor – 2011-05-15T22:09:01.897
@Peter: The full set includes several warriors that use randomness. To test for consistency of scorers we should test with the set of deterministic warriors. I make that to be: angle antiT4T bygones devil druid gradual helpvamp honor.. hyperrat.. littlelisper mistrust pavlov proba.. secret.. shark softmajo T42T T4T. I'll run mine now. – dmckee --- ex-moderator kitten – 2011-05-15T22:14:22.203
@dmckee, ok. I'll leave it running 5 tournament iterations overnight and post a summary of results tomorrow. – Peter Taylor – 2011-05-15T22:22:44.037
@dmckee - good point. I'll do my own test just for grins. – arrdem – 2011-05-16T01:28:29.377
In 5 runs, Wasp is always first, Gradual is always second, T42T and Mistrust are 3rd and 4th in some order, T4T is 5th. There is some slight fluctuation, so unless one of them is in fact partially random there's a bug somewhere. Edit: of course, handshake now generates a random secret. – Peter Taylor – 2011-05-16T06:08:37.717
1Peter: There are plenty of strategies in the field that don't behave deterministic. And they can easily affect the score of the top scorers. – Joey – 2011-05-22T18:13:32.470
I am relieved to see that Help Vampire came in second to last, only above Angel! Thanks for running this tournament, dmckee; it was fun! – jscs – 2011-06-01T22:04:50.523
@Josh: It was fun, and most of the credit goes to the many and enthusiastic participants. – dmckee --- ex-moderator kitten – 2011-06-01T22:43:06.963
2Interesting: There were 23 contestents, so each played 22 rounds. If everyone had played "Angel" every score would have been 4400, but even the best score of 4167 did not match that. If only we lived in a perfect world... :) – Briguy37 – 2011-09-09T18:19:09.880