15
6
In this question, a game was devised in which players would face each other off pair by pair in the Prisoner's Dilemma, to determine which iterative strategy scored the highest against others.
In this question, I devised a way for multiple people to play the Prisoners' Dilemma against each other all at the same time. In this variation, the payoff matrix is unnecessary, with each outcome between each pair of two players being the sum of two functionally independent decisions.
Your task is to build an AI to play this symmetric, generalized version of the multiplayer Prisoner's Dilemma that will achieve the highest score possible.
Rules of the Game
In each round of this multiplayer, multi-round Prisoner's Dilemma, a player A
can decide to "take 1" from some other player B
. In this circumstance, A
's score increases by 1, while B
's score decreases by 2. This decision is allowed to happen between each ordered pair of players.
This is the only decision made for each player – either to "take 1" or not to "take 1" from each other player, which are homologous to defection and cooperation respectively. The effective payoff matrix between two players P1
and P2
looks as follows:
P1/P2 P1 Take1 P1 Don't
P2 Take1 -1/-1 -2/+1
P2 Don't +1/-2 0/ 0
Tournament Procedure
The game will consist of P * 25
rounds, where P
is the number of participating players. All players start with a score of 0
. Each round will consist of the following procedure:
At the beginning of a round, each program will be given a history of the previous rounds from standard input, in the following format:
One line containing 3 numbers,
P
,D
, andN
.P
is the total number of players in the game. Each player is randomly assigned an ID number from1
toP
at the beginning of the game.D
is the ID of the current player.N
is the number of rounds that have been played.
N
lines, each line representing the outcomes of a round. On linek
ofN
, there will be some numbern_k
of ordered pairs(a, b)
, separated by spaces, which represent that the player with IDa
"took 1" from the player with IDb
in that round.A uniformly random number
R
from0
to18446744073709551615
(264 - 1), to act as a pseudorandom seed. These numbers will be read from a pre-generated file, which will be released at the end of the tournament so that people can verify the results for themselves.One extra line that represents some form of state to be read into your program, if your program produced such an output in the previous round. At the beginning of the game, this line will always be empty. This line will not be modified by either the scoring code or other programs.
Each program will then use its strategy to produce the following to standard output:
A list of
K
numbers, which are the IDs of the programs it will "take 1" from this round. An empty output means it will do nothing.Optionally, one extra line representing some form of state to pass on to later rounds. This exact line will be fed back to the program in the next round.
Below is an example input for the beginning of the game for a player of ID 3
in a 4-player game:
4 3 0
4696634734863777023
Below is an example input for the same game with a few rounds already played:
4 3 2
(1, 2) (1, 3) (1, 4) (4, 2)
(1, 3) (2, 1) (2, 4) (3, 1) (4, 1)
4675881156406346380
Each program will be fed exactly the same input for a round except for the ID number D
which is unique to each program.
Below is an example output in which player 3
takes 1 from everybody else:
1 2 4
At the end of all the required rounds, the player with the highest final score will be the winner.
Timeline
The coding for this tournament will last for a total of 7 days. The deadline for submissions is 2014-05-09 00:00 UTC
.
Do not post actual programs before this date – post the SHA256 hash of the source code of your program as a commitment. You may change this hash any time before the deadline, but commitments posted after the deadline will not be accepted for judgment. (Please use base 64 notation for your hashes, as my verification program spits out base 64 and it's a more compact notation.)
After the deadline is over, you will have 1 day (until 2014-05-10 00:00 UTC
) to post the actual source code of your program for your submission. If the SHA256 hash of your posted source code does not match any hash that you posted before the deadline, your code will not be accepted into the tournament.
After this, I will download all the submissions onto my own computer, and run all the tournament entries in this battle royale, hopefully posting the results within 2 days from then, by 2014-05-12 00:00 UTC
.
I will accept the answer with the highest score, and award a bounty of +100 to that answer if its final score is greater than 0
.
After the tournament is over, I will post the random seed file used to run the competition, and people may start posting other solutions trying to top the ones used in the tournament. However, they will not count for acceptance or the bounty.
The Host Machine
I will be running these solutions on a virtual machine on my computer. This virtual machine will run Ubuntu Linux 14.04, with 2 gigabytes of RAM. My base machine has an Intel i7-2600K processor running at 3.40 GHz.
Requirements
Your program must be written in a language for which a compiler or interpreter that will compile your program exists and is readily available for the latest version of Ubuntu Linux, so that I can run all the submissions and judge them in a virtual machine.
Your program must not take more than 2.000 seconds
to run each round. If your program runs out of time or produces an error, its output will be considered empty for that round.
Your program must be deterministic; that is, it must always return the same output for the same input. Pseudorandom solutions are allowed; however, their randomness must depend on the random seed given to it as input and nothing else. The seed file was generated using Python's os.urandom
. It contains a total of 500 lines (more will be generated if necessary), and its SHA256 hash is K+ics+sFq82lgiLanEnL/PABQKnn7rDAGmO48oiYxZk=
. It will be uploaded here once the tournament is over.
Plants
To kick things off, there will be four "plants", representing initial naïve strategies. These will be playing in the tournament along with your submissions. However, in the unlikely case that one of them wins, the highest score obtained by a player other than a plant will be considered the winner.
To calculate the hash of each plant's file, replace every group of 4 spaces with a tab, since the formatter here doesn't seem to like tab characters.
The Lazy — never does anything.
n1bnYdeb/bNDBKASWGywTRa0Ne9hMAkal3AuVZJgovI=
pass
The Greedy — always takes 1 from everybody else.
+k0L8NF27b8+Xf50quRaZFFuflZhZuTCQOR5t5b0nMI=
import sys
line1 = sys.stdin.readline()
n = [int(i) for i in line1.split()]
for i in range(n[0]):
if i+1 != n[1]:
print i+1,
print
The Wrathful — takes 1 from everybody else on the first round, and takes 1 from everybody who took 1 from it the previous round afterwards.
Ya2dIv8TCh0zWzRfzUIdFKWj1DF9GXWhbq/uN7+CzrY=
import sys
import re
line1 = [int(i) for i in sys.stdin.readline().split()]
players = line1[0]
pid = line1[1]
rounds = line1[2]
lines = []
if rounds == 0:
for i in range(players):
if i+1 != pid:
print i+1,
print
else:
for i in range(rounds):
lines.append(sys.stdin.readline())
lastline = lines[-1]
takes = re.findall(r'\([0-9]+, [0-9]+\)', lastline)
for take in takes:
sides = [int(i) for i in re.findall(r'[0-9]+', take)]
if sides[1] == pid:
print sides[0],
print
The Envious — takes 1 from the 50% of players with the current highest score excluding itself, rounding down.
YhLgqrz1Cm2pEcFlsiIL4b4MX9QiTxuIOBJF+wvukNk=
import sys
import re
line1 = [int(i) for i in sys.stdin.readline().split()]
players = line1[0]
pid = line1[1]
rounds = line1[2]
lines = []
scores = [0] * players
if rounds == 0:
for i in range(players):
if i+1 != pid:
print i+1,
print
else:
for i in range(rounds):
takes = re.findall(r'\([0-9]+, [0-9]+\)', sys.stdin.readline())
for take in takes:
sides = [int(i) for i in re.findall(r'[0-9]+', take)]
scores[sides[0] - 1] += 1
scores[sides[1] - 1] -= 2
score_pairs = [(i+1, scores[i]) for i in range(players)]
score_pairs.sort(key=lambda x:(x[1], x[0]))
score_pairs.reverse()
taken = 0
j = 0
while taken < (players) / 2:
if score_pairs[j][0] != pid:
print score_pairs[j][0],
taken += 1
j += 1
In a tournament of 100 rounds just amongst these four, they receive scores of:
Lazy: -204
Greedy: -100
Wrathful: -199
Envious: -199
Judging Program
I've posted the judge program I'll be using at Github. Download it and test it out. (And maybe fix a bug or two if you find one. :P)
It doesn't have compilation options for anything other than Python at the moment. I'll be including those later - if people could contribute compilation or interpretation scripts for other languages, I'd be much obliged.
Phase 2: Source Code Submission
I've posted a new branch tournament
to the Github repository for the contest, containing the pd_rand file and other plant entries. You can either post your source code here or submit it to that branch as a pull request.
The order of the contestants will be as follows:
'begrudger'
'regular'
'patient'
'lazy'
'backstab'
'bully'
'lunatic'
'envious'
'titfortat'
'greedy'
'wrathful'
'judge'
'onepercent'
Final Scores
The output of my testing program:
Final scores:
begrudger -2862
regular -204
patient -994
lazy -2886
backstab -1311
bully -1393
lunatic -1539
envious -2448
titfortat -985
greedy -724
wrathful -1478
judge -365
onepercent -1921
Rankings:
1. regular -204
2. judge -365
3. greedy -724
4. titfortat -985
5. patient -994
6. backstab -1311
7. bully -1393
8. wrathful -1478
9. lunatic -1539
10. onepercent -1921
11. envious -2448
12. begrudger -2862
13. lazy -2886
So it turns out that the winner is indeed a player - it's The Regular, with -204 points!
Unfortunately, its score wasn't positive, but we can hardly expect that in a simulation of the Iterated Prisoner's Dilemma where everybody's playing to win.
Some surprising results (at least that I thought were surprising):
The Greedy scored more than Tit for Tat, and in fact, generally higher than most scorers at all.
The Judge, which was meant to be a sort of "morality enforcer" character (it basically took 1 from whoever had taken 1 from anybody an above-average number of times) ended up scoring rather high, while in simulation testing, it would actually get a rather low score.
And others that (I thought) weren't so surprising:
The Patient scored a full 484 points more than The Wrathful. It really pays to cooperate that first time.
One Percent very quickly had almost nobody to kick while they were down. Seems that the 1% is only able to stay that way because they have more players in the game.
Anyway, now that the tournament is over, feel free to post as many extra players as you'd like, and test around with them using the judge program.
I've named all the plants after vices, since I expect that this competition will be more vicious than the previous one, and I thought I'd theme the plants' names to reflect that. – Joe Z. – 2014-05-01T18:59:24.903
3Would posting the source to the control program and/or plants hurt anything? We know what they do anyway, and I'd prefer to be able to test against something without writing five extra programs. – Geobits – 2014-05-01T19:12:38.900
I suppose not. Lemme edit them in right now. – Joe Z. – 2014-05-01T19:13:28.473
Can we have a little more time to submit the actual code? If I'll be busy for those twenty-four hours, am I disqualified? – Ypnypn – 2014-05-01T19:35:51.920
Should the output be terminated with
\n
,\0
, something else, nothing, or does it matter? – Geobits – 2014-05-01T19:53:47.030@ypnypn: When you submit your code, email your source code to me and I'll edit it into your answer for you during the code-submission period. – Joe Z. – 2014-05-01T21:09:44.687
@Geobits: The terminated output doesn't matter, as long as it's a list of numbers separated by spaces. – Joe Z. – 2014-05-01T21:11:18.843
>
@PleaseStand 1) To prevent collusion between solutions by the same user, each contestant may only enter one program for the actual tournament. However, in the after-tournament stage, feel free to post as many alternate solutions as you want, as long as they are distinct strategies. 2) Your program will get a different random value of
R
each round, but each program will get the same value ofR
for that round. 3) I haven't written the official scorer's code yet, so you'll have to write your own until I get that done. – Joe Z. – 2014-05-01T22:44:13.440start="4">
So because R changes each round, if my program were to choose one of several strategies at random, it would have to "forget" which one it picked, instead having to infer that from the history? – PleaseStand – 2014-05-01T22:57:37.520
I've edited in something that allows you to pass one line of data onto the next round. Does that solve your problem? – Joe Z. – 2014-05-01T23:49:04.553
There are going to be a lot of upvotes for answers once the deadline passes, aren't there? – Kevin – 2014-05-02T01:17:18.790
Devil's advocate question: suppose I write a solution which seeds it's PRNG from R during the first round but passes the state of the PRNG forward from round to round (rather than re-seeding with the new R). Still "deterministic" under the rules? – dmckee --- ex-moderator kitten – 2014-05-02T01:21:20.973
@dmckee: Yes, that is valid, as it will still always return the same output for the same input. The person who suggested that rule suggested it for exactly the concern you mentioned, so it is definitely allowed. – Joe Z. – 2014-05-02T01:23:37.177
@Kevin: Yeah, most likely. – Joe Z. – 2014-05-02T01:25:47.093
2I don't understand. Is there some sort of penalty for everyone taking 1 all the time? Wouldn't it be the most advantageous to always take 1? – DankMemes – 2014-05-02T03:42:02.443
@ZoveGames It would be, but then everybody would use that strategy and no points would ever be gained. – cjfaure – 2014-05-02T10:02:37.157
How would we test our programs (say, against the plants)? – cjfaure – 2014-05-02T10:07:43.580
I'm writing a judge program for that right now. Sit tight. – Joe Z. – 2014-05-02T13:17:14.850
@JoeZ. Kk. Working on my answer now. – cjfaure – 2014-05-02T13:48:26.283
When will player ID assignments (which could potentially affect program behavior) be posted? At 2014-05-09 00:00 UTC (the beginning of the source code posting window)? – PleaseStand – 2014-05-02T15:19:20.100
@PleaseStand Yes. If I do it before then, the number of entrants might change. (And if I do it after then, I might accidentally run the tests more than once and bias myself toward a certain result.) – Joe Z. – 2014-05-02T16:18:16.533
My concern is then with "If the SHA256 hash of your posted source code does not match any hash that you posted before the deadline [...]". The hash should match the last one posted before the deadline so the player who posts his or her source code last cannot choose which program to post based on the player ID assignments (e.g. one that assumes Tit-for-tat has ID 1). – PleaseStand – 2014-05-02T16:30:26.193
I did the "any hash" thing so that people who discover errors in their latest revision could revert to an earlier one. – Joe Z. – 2014-05-02T16:35:06.650
Also, remember that all edits are logged, and they need to be in order to count. If a person posts 30 edits within 5 minutes of each other in order to have a choice in IDs, 1) they won't all show up in the edit history because of the grace period, and 2) it would definitely be very suspicious and I'd probably ask them about what those 30 edits in a row were all about. – Joe Z. – 2014-05-02T16:37:33.373
@JoeZ. Is it okay to have multiple entries here? – cjfaure – 2014-05-03T11:08:13.423
@Trimsty No.
– Joe Z. – 2014-05-03T12:55:45.433@JoeZ. Oh, okay, sorry, didn't see that comment for some reason. – cjfaure – 2014-05-03T13:01:01.777
It's alright. You can post all the alternate entries you want after the tournament is over, by the way. – Joe Z. – 2014-05-03T13:01:52.240
@JoeZ. Cool. Thanks. – cjfaure – 2014-05-03T13:06:17.177
I don't understand. How can The Greedy lose under the game rules? – justhalf – 2014-05-09T06:48:32.367
@justhalf The Greedy doesn't maximize damage to the opponent, or try to mess with other AIs. For instance, a game with only Greedy bots can never gain points. – cjfaure – 2014-05-09T14:57:16.610
1How can greedy not maximizing damage? If we take from other player, the other player can only get -1 or -2, while if we don't take, the other player can get 1 or 0. Obviously taking 1 from other player will maximize the damage. And so Greedy will never lose. And it will almost always win, unless all the opponent are also Greedy, like you said. – justhalf – 2014-05-09T15:27:58.907
@justhalf A critical part of this competition is that you're not actually allowed to submit a Greedy. You have to find something else that works. – cjfaure – 2014-05-09T17:43:36.997
@Trimsty It doesn't look like anyone did submit one, but I don't see that in the rules. I thought about doing it myself, but went with something else out of curiosity's sake. I agree that being Greedy is probably the best overall strategy if your goal is to look out for yourself. We don't care about the overall point total for this challenge. – Geobits – 2014-05-09T20:03:22.040
@Geobits I don't see how people submitting duplicates would be allowed, the goal is to make an ai, not choose one :P – cjfaure – 2014-05-09T20:07:31.943
1@Trimsty When the challenge first went up, the code for the plants was not shown. Through the entire coding phase, we could not see other answers. Dupes could have happened purely by accident by choosing the very obvious greedy strategy. – Geobits – 2014-05-09T20:09:41.543
Will my "Sort the set of victims and remove duplicates" change to the scorer be used in the tournament? – PleaseStand – 2014-05-09T23:50:51.593
Actually I don't understand this challenge as a challenge. It's more like a playground to me where people can submit answers to play. Because there is absolutely no dilemma here, according to the rules. Each time you "take 1", overall point will decrease by 1 and your point increase by 1. If you don't, overall point will stay the same. So if you want to maximize your points, always take 1. If you want to maximize overall points, never take 1. It's as simple as that. Unless you can come up with some interesting winning criteria, I don't see anything interesting in this challenge. – justhalf – 2014-05-10T01:24:56.383
2
@justhalf If you've actually read any resarch on strategies in the iterated prisoner's dilemma, you'd know what you're saying is false. The Wikipedia article is a good place to start.
– Joe Z. – 2014-05-10T14:30:21.9701" If the game is played exactly N times and both players know this, then it is always game theoretically optimal to defect in all rounds.". So it proofs my assertion? – justhalf – 2014-05-11T02:43:32.157
1That is the Nash equilibrium. The optimal strategy in practice is different. – Joe Z. – 2014-05-11T02:51:39.053