35
7
In this challenge, you will play the noisy iterated prisoner's dilemma.
The Prisoner's dilemma is a scenario in game theory where there are two players, each with two options: cooperate, or defect. Each player does better for themself if they defect than if they cooperate, but both players would prefer the outcome where both players cooperate to the one where both players defect.
The iterated prisoner's dilemma is the same game, except you play against the same opponent repeatedly, and you know what your opponent has played in the past. Your objective is always to accumulate the highest score for yourself, regardless of how your opponent does.
The noisy iterated prisoner's dilemma introduces some noise into the communication. Your knowledge of what your opponent has played in the past will have some noise introduced. You will also know what moves you made in the past. The noise rate is constant over a round against the same opponent, but different between different rounds.
Challenge
In this challenge, you will write a Python 3 program to play the noisy iterated prisoner's dilemma.
Your program will receive three inputs:
Your own moves, without random flips applied.
Your opponent's moves, with random flips applied.
A state variable, which starts as an empty list each round, and which you can modify if you want. You can ignore this if you don't want to use it.
Your program should output 'c'
to cooperate or 'd'
to defect.
For instance, here's a program that cooperates if the opponent has cooperated at least 60% of the time in the past, after random flips were applied, and for the first 10 flips:
def threshold(my_plays, their_flipped_plays, state):
if len(their_flipped_plays) < 10:
return 'c'
opp_c_freq = their_flipped_plays.count('c')/len(their_flipped_plays)
if opp_c_freq > 0.6:
return 'c'
else:
return 'd'
If you don't know Python, write your submission in pseudocode, and someone (me or another member of the site) can make the corresponding Python program.
Gameplay
The tournament runner can be found here: noisy-game. Run noisy-game.py
to run the tournament. I'll keep that repository updated with new submissions. Example programs can be found in basic.py
.
A program's overall score is the total of its score over 100 plays of the game.
A game consists of round-robin matchups of each player against each player, including itself. A matchup consists of 100 rounds. A round consists of 300 moves, each of which involves outputting 'c'
or 'd'
.
Your submission will play a matchup against every submission, including your own. Each matchup will consist of 100 rounds. During each round, a flip probability will be chosen uniformly randomly from [0, 0.5]
.
Each round will consist of 300 moves. On each move, both programs will receive all previous plays they have attempted, and all previous plays the other program has made, after flips are applied, and a state variable, which is a mutable list which the program can modify if it wants to. The programs will output their moves.
Moves are scored as follows: If a program plays a 'c'
, the opposing program gets 2 points. If a program plays a 'd'
, that program gets 1 point.
Then, each move is flipped independently with probability equal to the flip probability, and stored for showing to the opponent.
After all of the rounds have been played, we sum the number of points each player got in each matchup. Then, we use the following scoring system to calculate each player's score for the game. This scoring is performed after all of the matchups are complete.
Scoring
We will use evolutionary scoring. Each program starts with equal weight. Then, weights are updated as follows, for 100 iterations, using the point totals from the game:
Each program's new weight is proportional to the product of its previous weight and its average point total, weighted by the weights of its opponents.
100 such updates are applied, and the final weights are each program's score for that run of the game.
The overall scores will be the sum over 100 runs of the game.
The players will be all valid answers to this challenge, plus six basic programs to get us started.
Caveats
Do not modify the inputs. Do not attempt to affect the execution of any other program, except via cooperating or defecting. Do not make a sacrificial submission that attempts to recognize another submission and benefit that opponent at its own expense. Standard loopholes are banned.
EDIT: Submissions may not exactly duplicate any of the basic programs or any earlier submission.
If you have any questions, feel free to ask.
Current results
nicht_genug: 40.6311
stealer: 37.1416
enough: 14.4443
wait_for_50: 6.947
threshold: 0.406784
buckets: 0.202875
change_of_heart: 0.0996783
exploit_threshold: 0.0670485
kickback: 0.0313357
tit_for_stat: 0.0141368
decaying_memory: 0.00907645
tit_for_whoops: 0.00211803
slider: 0.00167053
trickster: 0.000654875
sounder: 0.000427348
tit_for_tat: 9.12471e-05
stubborn_stumbler: 6.92879e-05
tit_for_time: 2.82541e-05
jedi2sith: 2.0768e-05
cooperate: 1.86291e-05
everyThree: 1.04843e-05
somewhat_naive: 4.46701e-06
just_noise: 1.41564e-06
growing_distrust: 5.32521e-08
goldfish: 4.28982e-09
vengeful: 2.74267e-09
defect: 3.71295e-10
alternate: 2.09372e-20
random_player: 6.74361e-21
Results with only answers to this question and basic programs that ignore the opponent's play:
nicht_genug: 39.3907
stealer: 33.7864
enough: 20.9032
wait_for_50: 5.60007
buckets: 0.174457
kickback: 0.0686975
change_of_heart: 0.027396
tit_for_stat: 0.024522
decaying_memory: 0.0193272
tit_for_whoops: 0.00284842
slider: 0.00153227
sounder: 0.000472289
trickster: 0.000297515
stubborn_stumbler: 3.76073e-05
cooperate: 3.46865e-05
tit_for_time: 2.42263e-05
everyThree: 2.06095e-05
jedi2sith: 1.62591e-05
somewhat_naive: 4.20785e-06
just_noise: 1.18372e-06
growing_distrust: 6.17619e-08
vengeful: 3.61213e-09
goldfish: 3.5746e-09
defect: 4.92581e-10
alternate: 6.96497e-20
random_player: 1.49879e-20
Winning
The competition will stay open indefinitely, as new submissions are posted. However, I will declare a winner (accept an answer) based on the results 1 month after this question was posted.
How does tit_for_whoops ignore the opponent's play? – LyricLy – 2018-05-05T01:07:38.913
@LyricLy I assume the category refers to the basic programs provided by Isaac which ignore their opponents. – FryAmTheEggman – 2018-05-05T01:11:12.593
tit_for_whoops
is not a basic program by Isaac, nor does it ignore the opponent. – LyricLy – 2018-05-05T01:13:54.953@LyricLy Sorry, that category is for any answers to this question, plus the opponent ignoring basic programs – isaacg – 2018-05-05T01:14:38.700
I've added a state variable. Submissions should be updated accordingly. – isaacg – 2018-05-05T01:22:37.150
Do not make a sacrificial submission that attempts to recognize another submission and benefit that opponent at its own expense. is already a standard loophole. ;) – Erik the Outgolfer – 2018-05-05T14:53:17.983
1Do I understand right that you can use the state variable to record all your moves as you submit them, and therefore know both your true moves and flipped moves, and estimate the flip probability? – xnor – 2018-05-05T16:52:19.060
1@xnor You always get told your true moves. It's only the opponents moves that may get flipped. – None – 2018-05-05T16:56:53.907
@Mnemonic Man, I read "Your own moves, without random flips applied" as "with random flips" like 3 times. Let me italicize that. – xnor – 2018-05-05T17:22:08.557
1@isaacg I tried copying
exploit_threshold()
several times asexploit_threshold1()
, etc and added them to theplayers
list. Why do I get vastly different results for identical strategies? – ngn – 2018-05-05T18:35:13.053@isaacg I updated my strategy (stubborn_stumbler) again, can the original code I submitted be replaced? – JMigst – 2018-05-05T18:58:14.023
The different results might be from having more submissions that use
random()
. – miles – 2018-05-05T19:26:42.817@ngn I can't seem to reproduce your results currently. How many copies of exploit_threshold did you make, and what were the results exactly? Note that differing results that are all under 1 are mostly result of random noise – isaacg – 2018-05-05T20:25:56.900
@EriktheOutgolfer I think the standard loophole you linked is mostly talking about joke sacrificial submissions, whereas I'm talking about sacrificial submissions that are trying to make another submission win. – isaacg – 2018-05-05T20:36:05.027
I realised my code had a bug, where I was only running the game once. I fixed it to 100 times, but then it took too long. I'm running the game 10 times now, but keeping the scale the same – isaacg – 2018-05-05T21:28:32.723
@isaacg for instance using 4 copies of
stubborn_stumbler
instead of 1, with the latest sources I get results 49, 14, 12, 1.8 (rounded) – ngn – 2018-05-05T21:56:59.400@ngn I think this was happening because I wasn't running the enough games. I said I'd do 100, I was actually doing 1, which leads to a lot of luck taking over. Now I'm doing 10, which is better. – isaacg – 2018-05-05T22:21:34.757