
Repeated Games

Motivating Example: Construction Contractors

Consider two local construction firms, Firm A and Firm B, who regularly bid on municipal infrastructure projects.

Each quarter, the city may announce a new contract for firms to bid on: a school, a road, or a public facility. In each bidding round the firms can choose to:

  1. bid high, maintaining healthy margins, or
  2. bid low, undercutting the competitor.

If both firms bid high, they enjoy good margins and may take turns winning contracts. If one undercuts while the other bids high, the undercutter wins the project and earns a higher profit, at the cost of undermining trust. If both bid low, a price war ensues, shrinking profits for both.

However, the number of projects is not fixed in advance. After each bidding round, there is a probability $\delta \in (0, 1)$ that another project becomes available. Thus, the game is repeated probabilistically, with continuation probability $\delta$. The expected number of repetitions is $\frac{1}{1 - \delta}$.
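
This follows because the number of rounds is geometrically distributed: the game lasts exactly $t$ rounds with probability $(1 - \delta)\delta^{t - 1}$, so the expected length is

$$\sum_{t=1}^{\infty} t (1 - \delta) \delta^{t - 1} = \frac{1 - \delta}{(1 - \delta)^2} = \frac{1}{1 - \delta}$$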

Each firm chooses an action in each round:

  1. $H$: bid high.
  2. $L$: bid low.

The one-shot payoff matrix is given by:

$$M_r = \begin{pmatrix} 3 & 0\\ 5 & 1 \end{pmatrix} \qquad M_c = \begin{pmatrix} 3 & 5\\ 0 & 1 \end{pmatrix}$$

This is a classic example of a Prisoner’s Dilemma: short-term incentives tempt each player to defect (bid low), but long-term cooperation (bid high) could yield better payoffs.

The remainder of this chapter will explore:

  1. the theory of repeated games and their equilibria;
  2. how reputation and trust can emerge from repeated interaction;
  3. infinitely repeated games with discounting and the Folk Theorem;
  4. the Prisoner’s Dilemma and Axelrod’s tournaments.

Theory

Definition: repeated game

Given a two player game $(A, B) \in \mathbb{R}^{(m \times n)^2}$, referred to as a stage game, a $T$-stage repeated game is a game in which players play that stage game for $T > 0$ repetitions. Players make decisions based on the full history of play over all the repetitions.

Example: Counting leaves in repeated games

Consider the following stage games and values of $T$. How many leaves would the extensive form representation of the repeated game have?

$$M_r = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} \qquad M_c = \begin{pmatrix} 2 & 3 \\ 1 & -1 \end{pmatrix} \qquad T = 2$$

The initial play of the game has 4 outcomes (2 actions each). Each of these leads to 4 more in the second stage. Total: $4 \times 4 = 16$ leaves.

$$M_r = \begin{pmatrix} 0 & 1 \\ -1 & 3 \end{pmatrix} \qquad M_c = -M_r \qquad T = 2$$

Same as (1): $4 \times 4 = 16$ leaves.

$$M_r = \begin{pmatrix} 0 & 1 \\ -1 & 3 \end{pmatrix} \qquad M_c = -M_r \qquad T = 3$$

Three stages, each with $2 \times 2 = 4$ outcomes: $4 \times 4 \times 4 = 64$ leaves.

$$M_r = \begin{pmatrix} 0 & 1 & 4\\ 1 & -1 & 3 \end{pmatrix} \qquad M_c = -M_r \qquad T = 2$$

First stage: $2 \times 3 = 6$ outcomes. Each of those leads to 6 in the second stage. Total: $6 \times 6 = 36$ leaves.
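
In general, a $T$-stage repeated game whose stage game has $m$ and $n$ actions for the two players has $(mn)^T$ leaves. A minimal sketch confirming the counts above (the function name is our own):

def number_of_leaves(m, n, repetitions):
    """Leaves in the extensive form of a repeated game whose stage
    game has m row actions and n column actions."""
    return (m * n) ** repetitions

assert number_of_leaves(2, 2, 2) == 16
assert number_of_leaves(2, 2, 3) == 64
assert number_of_leaves(2, 3, 2) == 36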

Definition: Strategies in a repeated game

A strategy for a player in a repeated game is a mapping from all possible histories of play to a probability distribution over the action set of the stage game.
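
For example, with a $2 \times 2$ stage game and $T = 2$, a player must specify an action at the empty initial history and at each of the 4 possible first-stage outcomes, giving $2^5 = 32$ pure strategies.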

Example: Validity of repeated game strategies

For the Coordination Game with $T = 2$ determine whether the following strategy pairs are valid, and if so, what outcome they lead to.

  1. Row player:

$$\begin{align*}
(\emptyset, \emptyset) &\to C\\
(S, S) &\to C\\
(S, C) &\to C\\
(C, S) &\to S\\
(C, C) &\to S
\end{align*}$$

Column player:

$$\begin{align*}
(\emptyset, \emptyset) &\to S\\
(S, S) &\to C\\
(S, C) &\to C\\
(C, S) &\to S\\
(C, C) &\to S
\end{align*}$$

Valid strategy pair. Outcome: $(3, 2)$ (corresponds to $O_9$ in the extensive form).

  2. Row player:

$$\begin{align*}
(\emptyset, \emptyset) &\to C\\
(S, S) &\to C\\
(C, S) &\to S\\
(C, C) &\to S
\end{align*}$$

Column player:

$$\begin{align*}
(\emptyset, \emptyset) &\to S\\
(S, S) &\to C\\
(S, C) &\to C\\
(C, S) &\to S\\
(C, C) &\to S
\end{align*}$$

Invalid: the row player does not define an action for the history $(S, C)$.

  3. Row player:

$$\begin{align*}
(\emptyset, \emptyset) &\to C\\
(S, S) &\to C\\
(C, S) &\to S\\
(S, C) &\to S\\
(C, C) &\to S
\end{align*}$$

Column player:

$$\begin{align*}
(\emptyset, \emptyset) &\to S\\
(S, S) &\to C\\
(S, C) &\to C\\
(C, S) &\to \alpha\\
(C, C) &\to S
\end{align*}$$

Invalid: the column player uses an invalid action $\alpha$.

  4. Row player:

$$\begin{align*}
(\emptyset, \emptyset) &\to S\\
(S, S) &\to C\\
(C, S) &\to S\\
(S, C) &\to C\\
(C, C) &\to S
\end{align*}$$

Column player:

$$\begin{align*}
(\emptyset, \emptyset) &\to S\\
(S, S) &\to C\\
(S, C) &\to C\\
(C, S) &\to S\\
(C, C) &\to S
\end{align*}$$

Valid strategy pair. Outcome: $(5, 5)$ (corresponds to $O_4$ in the extensive form).

Theorem: Subgame perfection of sequence of stage Nash profiles


For any repeated game, any sequence of stage Nash profiles gives the outcome of a subgame perfect Nash equilibrium.


where by a stage Nash profile we mean a strategy profile that is a Nash equilibrium in the stage game.

Proof


If we consider the strategy given by:

“Player $i$ should play strategy $\tilde s^{(k)}_i$ regardless of the play of any previous strategy profiles.”

where $\tilde s^{(k)}_i$ is the strategy played by player $i$ in a given stage Nash profile. The superscript $k$ indicates that all players play strategies from the same stage Nash profile.

Using backwards induction we see that this strategy profile is a subgame perfect Nash equilibrium. It prescribes a stage Nash profile in the last stage game, which is the last subgame, so it is a Nash equilibrium there. Considering (inductively) each preceding subgame, the result holds.


Example

Consider the following stage game:

$$M_r = \begin{pmatrix} 2 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix} \qquad M_c = \begin{pmatrix} 3 & 2 & 4\\ 1 & 2 & 0 \end{pmatrix}$$

There are two Nash equilibria in action space for this stage game:

$$(r_1, c_3) \qquad (r_2, c_2)$$

For $T = 2$ we have 4 possible outcomes, each of which corresponds to the outcome of a subgame perfect Nash equilibrium:

$$\begin{align*}
(r_1r_1, c_3c_3) &\text{ giving utility vector: } (2, 8)\\
(r_1r_2, c_3c_2) &\text{ giving utility vector: } (2, 6)\\
(r_2r_1, c_2c_3) &\text{ giving utility vector: } (2, 6)\\
(r_2r_2, c_2c_2) &\text{ giving utility vector: } (2, 4)
\end{align*}$$
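
The two stage Nash profiles can be found computationally. A short sketch using nashpy's support enumeration (any mixed equilibria would also be reported):

import nashpy as nash
import numpy as np

# The stage game from the example above
M_r = np.array([[2, 0, 1], [0, 1, 0]])
M_c = np.array([[3, 2, 4], [1, 2, 0]])
stage_game = nash.Game(M_r, M_c)

# Expect the pure profiles (r1, c3) and (r2, c2) to appear
for sigma_r, sigma_c in stage_game.support_enumeration():
    print(sigma_r, sigma_c)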

Importantly, not all subgame perfect Nash equilibrium outcomes are of the above form.

Reputation

In a repeated game it is possible for players to encode reputation and trust in their strategies.

Example: Reputation in a repeated game

Consider the following stage game with $T = 2$:

$$A = \begin{pmatrix} 0 & 6 & 1\\ 1 & 7 & 5 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & 3 & 1\\ 1 & 0 & 1 \end{pmatrix}$$

Through inspection it is possible to verify that the following strategy pair is a Nash equilibrium.

For the row player:

$$\begin{align*}
(\emptyset, \emptyset) &\to r_1\\
(r_1, c_1) &\to r_2\\
(r_1, c_2) &\to r_2\\
(r_1, c_3) &\to r_2\\
(r_2, c_1) &\to r_2\\
(r_2, c_2) &\to r_2\\
(r_2, c_3) &\to r_2
\end{align*}$$

For the column player:

$$\begin{align*}
(\emptyset, \emptyset) &\to c_2\\
(r_1, c_1) &\to c_3\\
(r_2, c_1) &\to c_1\\
(r_1, c_2) &\to c_3\\
(r_2, c_2) &\to c_1\\
(r_1, c_3) &\to c_3\\
(r_2, c_3) &\to c_1
\end{align*}$$

This pair of strategies corresponds to the following scenario:

The row player plays $r_1$ and the column player plays $c_2$ in the first stage. The row player plays $r_2$ and the column player plays $c_3$ in the second stage.

Note that if the row player deviates and plays $r_2$ in the first stage, then the column player will play $c_1$ in the second stage.

If both players play these strategies, their utilities are $(11, 4)$, which is better for both players than the utilities at any sequence of stage Nash equilibria in action space.

But is this a Nash equilibrium? To find out, we investigate whether either player has an incentive to deviate.

  1. If the row player deviates, they would only be rational to do so in the first stage. If they did, they would gain 1 in that stage (receiving 7 rather than 6) but lose 4 in the second stage (receiving 1 rather than 5). Thus they have no incentive to deviate.
  2. If the column player deviates, they would only consider doing so in the first stage, where $c_2$ is already a best response to $r_1$, so they cannot gain.

Thus, this strategy pair is a Nash equilibrium and demonstrates how reputation can be built and cooperation can emerge from complex dynamics.
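
A minimal numeric check of the deviation argument above, reading the stage payoffs directly from $A$:

import numpy as np

A = np.array([[0, 6, 1], [1, 7, 5]])

# Prescribed play: (r_1, c_2) then (r_2, c_3)
prescribed = A[0, 1] + A[1, 2]  # 6 + 5 = 11
# Row deviation: r_2 in the first stage is punished by c_1 in the second
deviation = A[1, 1] + A[1, 0]   # 7 + 1 = 8
assert prescribed > deviation   # no incentive for the row player to deviate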

Definition: Infinitely Repeated Game with Discounting


Given a two player game $(A, B) \in \mathbb{R}^{(m \times n)^2}$, referred to as a stage game, an infinitely repeated game with discounting factor $\delta$ is a game in which players play that stage game an infinite number of times and gain the following utility:

$$U(s_r, s_c) = \sum_{i=0}^{\infty} \delta^i u(s_r(i), s_c(i))$$

where $u(s_r(i), s_c(i))$ denotes the utility obtained in the stage game for actions $s_r(i)$ and $s_c(i)$, which are the actions given by strategies $s_r$ and $s_c$ at stage $i$.


Example: Undercutting against a grudger

Consider Motivating Example: Construction Contractors and assume Firm A plans to undercut at every contract, bidding low ($L$). Firm B plans to start by cooperating (bidding high, $H$), but if Firm A ever undercuts then Firm B will undercut for all subsequent contracts.

$$\begin{align*}
U_A(s_r, s_c) &= \sum_{i=0}^{\infty} \delta^i u_A(s_r(i), s_c(i)) \\
&= u_A(L, H) + \sum_{i=1}^{\infty} \delta^i u_A(L, L) \\
&= u_A(L, H) + u_A(L, L) \sum_{i=1}^{\infty} \delta^i \\
&= u_A(L, H) + u_A(L, L) \frac{\delta}{1 - \delta}
\end{align*}$$

and

$$\begin{align*}
U_B(s_r, s_c) &= \sum_{i=0}^{\infty} \delta^i u_B(s_r(i), s_c(i)) \\
&= u_B(L, H) + \sum_{i=1}^{\infty} \delta^i u_B(L, L) \\
&= u_B(L, H) + u_B(L, L) \sum_{i=1}^{\infty} \delta^i \\
&= u_B(L, H) + u_B(L, L) \frac{\delta}{1 - \delta}
\end{align*}$$

Substituting the values from the stage game gives:

$$U_A(s_r, s_c) = 5 + \frac{\delta}{1 - \delta} \qquad U_B(s_r, s_c) = 0 + \frac{\delta}{1 - \delta}$$
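
A quick numeric check of these closed forms, truncating the infinite sum and assuming, for illustration, $\delta = 1/4$:

import numpy as np

delta = 1 / 4   # an assumed value for illustration
horizon = 1000  # truncation of the infinite sum

# Firm A bids low throughout; Firm B bids high once then low forever
stream_A = np.array([5] + [1] * (horizon - 1))
stream_B = np.array([0] + [1] * (horizon - 1))
discounts = delta ** np.arange(horizon)

print(stream_A @ discounts, 5 + delta / (1 - delta))  # both approx 5.3333
print(stream_B @ discounts, 0 + delta / (1 - delta))  # both approx 0.3333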

Definition: Average utility

If we interpret $\delta$ as the probability of the repeated game not ending, then the average length of the game is:

$$\bar T = \frac{1}{1 - \delta}$$

We can use this to define the average payoff per stage:

$$\bar U_i(r, c) = (1 - \delta) U_i(r, c)$$

Example: Average utilities in the construction contractors game

Consider 3 strategies for Motivating Example: Construction Contractors:

  1. $S_c$: always cooperate (bid high).
  2. $S_d$: always defect (bid low).
  3. $S_g$: a grudger strategy that starts by cooperating and, if the opponent ever defects, defects for all subsequent stages.

For $\delta \in \{1/4, 3/4\}$:

  1. Obtain the 3 by 3 Normal form game using average utility.
  2. Check if mutual cooperation is a Nash equilibrium for both games.

We construct the Normal Form game representation with the 3 strategies becoming the action space (as described in Mapping Extensive Form Games to Normal Form).

We see that if $S_c$ is played against $S_g$ then both players cooperate at each stage, giving:

$$\bar U_r(S_c, S_c) = \bar U_r(S_g, S_g) = \bar U_r(S_g, S_c) = \bar U_r(S_c, S_g) = (1 - \delta) \frac{3}{1 - \delta} = 3$$

For $S_d$ and $S_c$ we have:

$$\bar U_r(S_d, S_c) = 5 \qquad \bar U_r(S_d, S_d) = 1 \qquad \bar U_r(S_c, S_d) = 0$$

For $S_g$ and $S_d$ we repeat the calculations of the example above to obtain:

$$\bar U_r(S_g, S_d) = (1 - \delta)\left(0 + \sum_{i=1}^{\infty} \delta^i\right) = (1 - \delta) \frac{\delta}{1 - \delta} = \delta$$

and

$$\bar U_r(S_d, S_g) = (1 - \delta)\left(5 + \sum_{i=1}^{\infty} \delta^i\right) = 5(1 - \delta) + (1 - \delta) \frac{\delta}{1 - \delta} = 5 - 4\delta$$

Using the action space ordered as $\{S_c, S_d, S_g\}$ this gives:

$$M_r = \begin{pmatrix} 3 & 0 & 3\\ 5 & 1 & 5 - 4\delta\\ 3 & \delta & 3 \end{pmatrix}$$

The game is symmetric so:

$$M_c = M_r^T$$

For $\delta = 1/4$ this gives:

$$M_r = \begin{pmatrix} 3 & 0 & 3\\ \underline{5} & \underline{1} & \underline{4}\\ 3 & \frac{1}{4} & 3 \end{pmatrix}$$

The game is symmetric so:

$$M_c = \begin{pmatrix} 3 & \underline{5} & 3\\ 0 & \underline{1} & \frac{1}{4}\\ 3 & \underline{4} & 3 \end{pmatrix}$$

We see that $S_c$ and $S_g$ are both dominated by $S_d$, thus mutual cooperation is not a Nash equilibrium.

For $\delta = 3/4$ this gives:

$$M_r = \begin{pmatrix} 3 & 0 & \underline{3}\\ \underline{5} & \underline{1} & 2\\ 3 & \frac{3}{4} & \underline{3} \end{pmatrix}$$

The game is symmetric so:

$$M_c = \begin{pmatrix} 3 & \underline{5} & 3\\ 0 & \underline{1} & \frac{3}{4}\\ \underline{3} & 2 & \underline{3} \end{pmatrix}$$

We see that mutual cooperation is now a Nash equilibrium because $S_g$ is now a best response to $S_g$.
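
These calculations can be checked with nashpy by building the averaged game as a function of $\delta$. A sketch (both games are degenerate, so support enumeration may emit warnings alongside the pure equilibria):

import nashpy as nash
import numpy as np

def averaged_game(delta):
    """Average-utility normal form with action space (S_c, S_d, S_g)."""
    M_r = np.array(
        [
            [3, 0, 3],
            [5, 1, 5 - 4 * delta],
            [3, delta, 3],
        ]
    )
    return nash.Game(M_r, M_r.T)

for delta in (1 / 4, 3 / 4):
    print(delta, list(averaged_game(delta).support_enumeration()))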

The fact that cooperation can be stable for higher values of $\delta$ is in fact a theorem that holds in the general case.

We need one final definition to state this precisely:

Definition: Individually rational payoffs


Individually rational payoffs are average payoffs that exceed the stage game Nash equilibrium payoffs for both players.


As an example consider the plot corresponding to a repeated Prisoner’s Dilemma shown in Figure 1.

Figure 1: The convex hull of payoffs for the Prisoner’s Dilemma.

The feasible average payoffs correspond to the feasible payoffs in the stage game. The individually rational payoffs are those that are better for both players than the stage Nash equilibrium payoffs.

The following theorem states that we can choose a discount rate for which there exists a subgame perfect Nash equilibrium giving any individually rational payoff pair!

Theorem: Folk Theorem


Let $(u_1^*, u_2^*)$ be a pair of Nash equilibrium payoffs for a stage game. For every individually rational pair $(v_1, v_2)$ there exists $\bar \delta$ such that for all $1 > \delta > \bar \delta > 0$ there is a subgame perfect Nash equilibrium with payoffs $(v_1, v_2)$.


Proof


Let $(\sigma_1^*, \sigma_2^*)$ be the stage Nash profile that yields $(u_1^*, u_2^*)$. Now assume that playing $\bar\sigma_1 \in \Delta \mathcal{A}_1$ and $\bar\sigma_2 \in \Delta \mathcal{A}_2$ in every stage gives $(v_1, v_2)$ (an individually rational payoff pair).

Consider the following strategy:

“Begin by using $\bar \sigma_i$ and continue to use $\bar \sigma_i$ as long as both players use the agreed strategies. If any player deviates: use $\sigma_i^*$ for all future stages.”

We begin by proving that the above is a Nash equilibrium.

Without loss of generality, if player 1 deviates to $\sigma_1' \in \Delta \mathcal{A}_1$ such that $u_1(\sigma_1', \bar \sigma_2) > v_1$ in stage $k$ then:

$$U_1^{(k)} = \sum_{t=1}^{k-1} \delta^{t-1} v_1 + \delta^{k-1} u_1(\sigma_1', \bar \sigma_2) + u_1^* \left(\frac{1}{1 - \delta} - \sum_{t=1}^{k} \delta^{t-1}\right)$$

Recalling that player 1 would receive $v_1$ in every stage with no deviation, the biggest gain to be made from deviating is if player 1 deviates in the first stage (all future gains are more heavily discounted). Thus if we can find $\bar\delta$ such that $\delta > \bar\delta$ implies that $U_1^{(1)} \leq \frac{v_1}{1 - \delta}$ then player 1 has no incentive to deviate.

$$\begin{aligned}
U_1^{(1)} = u_1(\sigma_1', \bar\sigma_2) + u_1^* \frac{\delta}{1 - \delta} &\leq \frac{v_1}{1 - \delta}\\
(1 - \delta) u_1(\sigma_1', \bar\sigma_2) + u_1^* \delta &\leq v_1\\
u_1(\sigma_1', \bar\sigma_2) - v_1 &\leq \delta \left(u_1(\sigma_1', \bar\sigma_2) - u_1^*\right)
\end{aligned}$$

as $u_1(\sigma_1', \bar \sigma_2) > v_1 > u_1^*$, taking $\bar\delta = \frac{u_1(\sigma_1', \bar\sigma_2) - v_1}{u_1(\sigma_1', \bar\sigma_2) - u_1^*}$ gives the required result for player 1, and repeating the argument for player 2 completes the proof that the prescribed strategy pair is a Nash equilibrium.

By construction this strategy pair is also a subgame perfect Nash equilibrium: given any history both players will act in the same way, and no player will have an incentive to deviate.
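
For example, in the construction contractors game, mutual cooperation gives $v_1 = 3$, the stage Nash payoff is $u_1^* = 1$ and the most profitable deviation earns $u_1(\sigma_1', \bar\sigma_2) = 5$, so that:

$$\bar\delta = \frac{5 - 3}{5 - 1} = \frac{1}{2}$$

This is consistent with the average utility example above, where cooperation failed for $\delta = 1/4$ but held for $\delta = 3/4$.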

Definition: Prisoner’s Dilemma

The general form is:

$$A = \begin{pmatrix} R & S\\ T & P \end{pmatrix} \qquad B = \begin{pmatrix} R & T\\ S & P \end{pmatrix}$$

with the following constraints:

$$T > R > P > S \qquad 2R > T + S$$

This game is a good model of agent interaction (human or otherwise): a player can choose to take a slight loss of utility for the benefit of the other player and themselves.
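
These constraints are straightforward to check programmatically. A minimal sketch (the function name is our own) for the symmetric representation above, where $B = A^T$:

import numpy as np

def is_prisoners_dilemma(A, B):
    """Check T > R > P > S and 2R > T + S for A = [[R, S], [T, P]], B = A^T."""
    A, B = np.asarray(A), np.asarray(B)
    if A.shape != (2, 2) or not np.array_equal(B, A.T):
        return False
    R, S, T, P = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    return T > R > P > S and 2 * R > T + S

assert is_prisoners_dilemma([[3, 0], [5, 1]], [[3, 5], [0, 1]])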

Example: Simplified form of the Prisoner’s Dilemma

Under what conditions is the following game a Prisoner’s Dilemma?

$$A = \begin{pmatrix} 1 & -\mu \\ 1 + \mu & 0 \end{pmatrix} \qquad B = A^T$$

This is a Prisoner’s Dilemma when:

$$1 + \mu > 1 \text{ and } 0 > -\mu$$

and:

$$2 > 1$$

Both of these conditions hold for $\mu > 0$. This is a convenient form for the Prisoner’s Dilemma as it corresponds to a 1-dimensional parametrization.

As a single one-shot game there is not much more to say about the Prisoner’s Dilemma. It becomes fascinating when studied as a repeated game.

Axelrod’s tournaments


In 1980, political scientist Robert Axelrod organized a computer tournament for an iterated Prisoner’s Dilemma, as described in Axelrod, 1980. Fourteen submitted strategies, along with a random strategy, competed in a round robin, and the simplest submission, Tit For Tat (cooperate first, then copy the opponent’s previous action), won.

Following this, Axelrod hosted a second tournament with 62 strategy submissions Axelrod, 1980, again including a random strategy. Participants in this second round were aware of the original results, and many strategies were designed specifically to counter or improve upon Tit For Tat. Nonetheless, Tit For Tat prevailed once again.

These tournaments led to the wide acceptance of four principles for effective cooperation:

  1. Do not be envious.
  2. Do not be the first to defect.
  3. Reciprocate both cooperation and defection.
  4. Do not be too clever.

These early experiments with the repeated Prisoner’s Dilemma were not only influential in political science but also laid a foundation for the study of emergent cooperation in multi-agent systems.

Exercises

Exercise: Constructing non-stage-equilibrium repeated outcomes


Recalling that a subgame perfect equilibrium for a finitely repeated game must involve a stage Nash equilibrium in the final stage, attempt to identify a subgame perfect equilibrium for the repeated game that is not a simple repetition of stage Nash profiles for each of the following games.

Game 1

Let

$$M_r = \begin{pmatrix} 4 & 7\\ 1 & 4 \end{pmatrix}, \qquad M_c = \begin{pmatrix} 3 & 6\\ 1 & 3 \end{pmatrix}$$

Game 2

Let

$$M_r = \begin{pmatrix} 5 & 0\\ 0 & 1\\ 3 & 0 \end{pmatrix}, \qquad M_c = \begin{pmatrix} 4 & 3\\ 3 & 4\\ 6 & 3 \end{pmatrix}$$

Game 3

Let

$$M_r = \begin{pmatrix} 1 & 0 & -1\\ -1 & -1 & 0 \end{pmatrix}, \qquad M_c = \begin{pmatrix} 2 & 3 & 1\\ 0 & -1 & 1 \end{pmatrix}$$

Exercise: Size of action space

For a general stage game with $(M_r, M_c) \in \mathbb{R}^{(m \times n)^2}$ identify the size of the action space for the repeated game for each player when:

Exercise: Identifying Prisoner’s Dilemmas


Justify whether or not the following games are instances of the Prisoner’s Dilemma.

Game 1

$$A = \begin{pmatrix} 3 & 0\\ 5 & 1 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & 5\\ 0 & 1 \end{pmatrix}$$

Game 2

$$A = \begin{pmatrix} 1 & -1\\ 2 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 2\\ -1 & 0 \end{pmatrix}$$

Game 3

$$A = \begin{pmatrix} 1 & -1\\ 2 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & 5\\ 0 & 1 \end{pmatrix}$$

Game 4

$$A = \begin{pmatrix} 6 & 0\\ 12 & 1 \end{pmatrix} \qquad B = \begin{pmatrix} 6 & 12\\ 0 & 0 \end{pmatrix}$$

Exercise: Repeated strategies in the Prisoner’s Dilemma


Consider the standard Prisoner’s Dilemma:

$$M_r = \begin{pmatrix} 3 & 0\\ 5 & 1 \end{pmatrix} \qquad M_c = \begin{pmatrix} 3 & 5\\ 0 & 1 \end{pmatrix}$$

Suppose players repeatedly play this game using one of the following strategies:

  1. Tit For Tat: starts by cooperating, then repeats the opponent’s previous action.
  2. Alternator: starts by cooperating, then alternates between cooperation and defection.
  3. 75% cooperator: a random strategy that cooperates 75% of the time.

Obtain the normal form representation of the repeated game for each of the following scenarios:

For each case, determine the Nash equilibria in action space.


Exercise: Payoffs and equilibrium in the infinite game


Consider the following stage game:

$$M_r = \begin{pmatrix} -1 & 3\\ -2 & 2 \end{pmatrix} \qquad M_c = \begin{pmatrix} 1 & -7\\ 6 & 2 \end{pmatrix}$$

  1. For $\delta = \frac{1}{3}$, compute the average payoffs for the strategies $S_D$: “always play the first row” and $S_C$: “always play the second row”.

  2. Plot the feasible average payoff space and the individually rational payoff space.

  3. Using the Folk Theorem, determine whether there exists a value of δ\delta that supports a subgame perfect equilibrium with average payoffs:

    • $(3/2, 3/2)$
    • $(0, 3)$
    • $(2, 6)$
    • $(2, 0)$

Programming

Using Nashpy to generate repeated games

The Nashpy library has support for generating repeated games. Let us generate the repeated game obtained from the Prisoner’s Dilemma with $T = 2$:

import nashpy as nash
import nashpy.repeated_games  # the repeated games module is imported explicitly
import numpy as np

M_r = np.array([[3, 0], [5, 1]])
M_c = M_r.T
prisoners_dilemma = nash.Game(M_r, M_c)
repeated_pd = nash.repeated_games.obtain_repeated_game(game=prisoners_dilemma, repetitions=2)
repeated_pd

We can directly obtain the action space of the row player for a given repeated game as a python generator:

strategies = nash.repeated_games.obtain_strategy_space(A=M_r, repetitions=2)
len(list(strategies))

To obtain the action space for the column player use A=M_c.T:

strategies = nash.repeated_games.obtain_strategy_space(A=M_c.T, repetitions=2)
len(list(strategies))

Using the Axelrod library to study the repeated Prisoner’s Dilemma

There is a Python library called axelrod which has functionality to run tournaments akin to Robert Axelrod’s original tournaments; it includes more than 240 strategies as well as further functionality.

To generate a tournament with $\delta = 1/4$ and the following 4 strategies:

  1. Tit For Tat: starts by cooperating, then repeats the opponent’s previous action.
  2. Alternator: starts by cooperating, then alternates between cooperation and defection.
  3. 75% cooperator: a random strategy that cooperates 75% of the time.
  4. Grudger: starts by cooperating, then defects forever if the opponent ever defects against it.

First let us set up some parameters such as the players as well as the seed to ensure reproducibility of the random results.

import axelrod as axl
import numpy as np

players = [axl.TitForTat(), axl.Alternator(), axl.Random(0.75), axl.Grudger()]
delta = 1 / 4
seed = 0  # fixed seed for reproducibility of the random strategy
repetitions = 500  # number of times the round robin is repeated
prob_end = 1 - delta  # probability that any given match ends after each turn

Now we create and run the tournament obtaining the payoff matrix from the results:

tournament = axl.Tournament(players=players, prob_end=prob_end, seed=seed, repetitions=repetitions)
results = tournament.play(progress_bar=False)
M_r = np.array(results.payoff_matrix)
M_r
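
The result set contains further summaries; for example, the names of the players ranked by performance:

results.ranked_names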

Notable research

The most influential early research on repeated games comes from the two seminal papers by Robert Axelrod (Axelrod, 1980; Axelrod, 1980), which culminated in his widely cited book Axelrod, 1984. As discussed in Section 6, Axelrod’s findings included a set of heuristics for successful behavior in the repeated Prisoner’s Dilemma.

Subsequent work has challenged these principles. The development of the Axelrod Python library Knight et al., 2016 enabled systematic and reproducible analysis across a far greater population of strategies. In particular, Glynatsi et al., 2024 studied 45,600 tournaments across diverse strategic populations and parameter regimes, finding that the original heuristics do not generalize. The revised empirical findings suggest the following properties are associated with successful play:

  1. Be nice.
  2. Be provocable and generous.
  3. Be a little envious.
  4. Be clever.
  5. Adapt to the environment.

The observation that being envious can be beneficial echoes a more recent development that was famously described by MIT Technology Review as having set “the world of game theory on fire”. In their landmark paper, Press & Dyson, 2012 introduced a novel class of memory-one strategies called zero-determinant strategies, which can unilaterally enforce linear payoff relationships, including extortionate dynamics.

However, the excitement around these strategies has been tempered by further theoretical and computational research. Studies such as Hilbe et al., 2013 and Knight et al., 2018 demonstrate that extortionate strategies, while elegant, do not tend to survive under evolutionary dynamics. These results reinforce the importance of adaptability and contextual behavior in long-run strategic success.

The Folk Theorem presented in this chapter can be traced to foundational work by Friedman, 1971 (with a correction in Friedman, 1973) and the general formulation in Fudenberg & Maskin, 1986. These theoretical insights laid the groundwork for subsequent developments such as Young, 1993, which shows how social conventions and norms can emerge from adaptive play in repeated settings.

Conclusion

Repeated games introduce a rich strategic landscape where history matters and reputation can sustain cooperation. Unlike one-shot interactions, repeated games allow for long-term incentives to shape behavior, often leading to outcomes that dominate those of stage game equilibria.

This chapter explored key ideas including the formal definitions of repeated and infinitely repeated games, strategy spaces based on full histories, subgame perfection through sequences of Nash profiles, and the pivotal role of discounting. Through examples and the Folk Theorem, we saw how cooperation can emerge—even in environments like the Prisoner’s Dilemma—when players value the future sufficiently.

In addition to theoretical foundations, we examined computational tools such as the Nashpy and Axelrod libraries, which allow for large-scale simulations and reproducible research into emergent cooperative behavior. We also revisited classical findings, including Axelrod’s tournaments, and contrasted them with modern insights that challenge early heuristics. A summary of key concepts is given in Table 1.

Table 1: A summary of the key concepts of this chapter.

| Concept | Description |
| --- | --- |
| Repeated game | A game consisting of multiple repetitions of a fixed stage game |
| Strategy in a repeated game | A mapping from histories of play to actions |
| Subgame perfect equilibrium | An equilibrium in which players play a Nash profile in every subgame |
| Reputation | Players may cooperate to sustain credibility and trust over time |
| Discount factor ($\delta$) | Models time preference or the probability of continuation |
| Average utility | Utility scaled by $(1 - \delta)$ to account for the average game length |
| Folk Theorem | Any individually rational payoff pair can be sustained provided $\delta$ is high enough |
| Prisoner’s Dilemma | A canonical stage game often used to model cooperation and defection |
| Zero-determinant strategies | Memory-one strategies capable of extorting opponents (not evolutionarily stable) |
| Axelrod’s tournaments | Early empirical work showing Tit For Tat’s success in repeated dilemmas |

References
  1. Axelrod, R. (1980). Effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24(1), 3–25.
  2. Axelrod, R. (1980). More effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24(3), 379–403.
  3. Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.
  4. Knight, V. A., Campbell, O., Harper, M., Langner, K. M., Campbell, J., Campbell, T., Carney, A., Chorley, M., Davidson-Pilon, C., Glass, K., & others. (2016). An open framework for the reproducible study of the iterated prisoner’s dilemma. Journal of Open Research Software, 4(1).
  5. Glynatsi, N. E., Knight, V., & Harper, M. (2024). Properties of winning Iterated Prisoner’s Dilemma strategies. PLOS Computational Biology, 20(12), e1012644.
  6. Press, W. H., & Dyson, F. J. (2012). Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences, 109(26), 10409–10413.
  7. Hilbe, C., Nowak, M. A., & Sigmund, K. (2013). Evolution of extortion in iterated prisoner’s dilemma games. Proceedings of the National Academy of Sciences, 110(17), 6913–6918.
  8. Knight, V., Harper, M., Glynatsi, N. E., & Campbell, O. (2018). Evolution reinforces cooperation with the emergence of self-recognition mechanisms: An empirical study of strategies in the Moran process for the iterated prisoner’s dilemma. PloS One, 13(10), e0204981.
  9. Friedman, J. W. (1971). A non-cooperative equilibrium for supergames. The Review of Economic Studies, 38(1), 1–12.
  10. Friedman, J. W. (1973). A non-cooperative equilibrium for supergames: A correction. The Review of Economic Studies, 40(3), 435–435.
  11. Fudenberg, D., & Maskin, E. (1986). The folk theorem in repeated games with discounting or with incomplete information. Econometrica, 54(3), 533–554.
  12. Young, H. P. (1993). The evolution of conventions. Econometrica: Journal of the Econometric Society, 57–84.