Given a two player game $(A,B)\in{\mathbb{R}^{m\times n}}^2$, referred to as a **stage** game, a $T$-stage repeated game is a game in which players play that stage game for $T > 0$ periods. Players make decisions based on the full history of play over all the periods.

For example consider the game:

$$ A = \begin{pmatrix} 0 & 6 & 1\\ 1 & 7 & 5 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & 3 & 1\\ 1 & 0 & 1 \end{pmatrix} $$by identifying the best responses:

$$ A = \begin{pmatrix} 0 & 6 & 1\\ \underline{1} & \underline{7} & \underline{5} \end{pmatrix} \qquad B = \begin{pmatrix} 0 & \underline{3} & 1\\ \underline{1} & 0 & \underline{1} \end{pmatrix} $$it is immediate to find two Nash equilibria:

$$((0,1), (1,0,0))\qquad((0,1), (0,0,1))$$In [1]:

```
import nashpy as nash
import numpy as np
A = np.array([[0, 6, 1],
[1, 7, 5]])
B = np.array([[0, 3, 1],
[1, 0, 1]])
game = nash.Game(A, B)
list(game.support_enumeration())
```

Out[1]:

If we were to repeat this game twice ($T=2$) we obtain a new game. However to be able to think of this we need to define what a strategy in a repeated game is.

Given a two player stage game $(A,B)\in{\mathbb{R}^{m\times n}}^2$, repeated to give a $T$-stage repeated game. A strategy is a mapping from the entire history of play to an action of the stage game:

$$ \bigcup_{t=0}^{T-1}H(t)\to a $$where:

- $H(t)$ is the history of play of
**both**players up until stage $t$ ($H(0)=(\emptyset, \emptyset)$) - $a$ is an action (for either player) of the
**stage**game

To help avoid confusion, whilst we have referred to pure strategies as choices made in stage games, here we will call those **actions**.

The actions for our example:

- for the row player: $\{r_1, r_2\}$ (corresponding to the rows)
- for the column player: $\{c_1, c_2, c_3\}$ (corresponding to the columns)

A strategy for the row/column player thus needs to map an element of the following set to an element of $\{r_1, r_2\}$/$\{c_1, c_2, c_3\}$:

$$\bigcup_{t=0}^{1}H(t)=\{(\emptyset, \emptyset), (r_1, c_1), (r_1, c_2), (r_1, c_3), (r_2, c_1), (r_2, c_2), (r_3, c_3)\}$$In other words, in our example, a strategy answers both of the following questions:

- What should the player do in the first period?
- What should the player do in the second period
**given**knowledge of what both players did in the first period?

The following theorem allows us to find **a** Nash equilibrium:

For any repeated game, any sequence of stage Nash profiles gives a Nash equilibrium.

Consider the following strategy:

The row/column player should play action $a_{r/c}$ regardless of the play of any previous strategy profiles.

where $(a_{r}, a_{c})$ is a given stage Nash equilibrium.

Using backwards induction, this is a Nash equilibrium for the last stage game. Thus, at the last stage, no player has a reason to deviate. Similarly at the $T-1$th stage. The proof follows.

Thus, for our example we have the four Nash equilibria:

- $(r_2r_2, c_1c_1)$ with utility: (2, 2).
- $(r_2r_2, c_1c_3)$ with utility: (6, 2).
- $(r_2r_2, c_3c_1)$ with utility: (6, 2).
- $(r_2r_2, c_3c_3)$ with utility: (10, 2).

Note however that it is not the only equilibria for our repeated game.

In a repeated game it is possible for players to encode reputation and trust in their strategies.

Consider the following two strategies:

For the row player:

$$(\emptyset, \emptyset) \to r_1$$ $$(r_1, c_1) \to r_2$$ $$(r_1, c_2) \to r_2$$ $$(r_1, c_3) \to r_2$$

For the column player:

$$(\emptyset, \emptyset) \to c_2$$ $$(r_1, c_2) \to c_3$$ $$(r_2, c_2) \to c_1$$

Note that here we omit some of the histories which are not possible based on the first play by each player.

This strategy corresponds to the following scenario:

Play $(r_1,c_2)$ in first stage and $(r_2,c_3)$ in second stage unless the row player does not cooperate in which case play $(r_2, c_1)$.

If both players play these strategies their utilities are: $(11, 4)$ which is better **for both players** then the utilities at any sequence of pure stage Nash equilibria. **But** is this a Nash equilibrium? To find out we investigate if either player has an incentive to deviate.

- If the row player deviates, they would only be rational to do so in the first stage, if they did they would gain 1 in that stage but lose 4 in the second stage. Thus they have no incentive to deviate.
- If the column player deviates, they would only do so in the first stage and gain no utility.

Thus this strategy pair **is a Nash equilibrium** and evidences how a reputation can be built and cooperation can emerge from complex dynamics.

Source code: @drvinceknight Powered by: Python Mathjax Github pages Skeleton css