foundations of computational agents
Consider the game of Tic-Tac-Toe (also called Noughts and Crosses), which is played by two players, an “X” player and an “O” player, who alternate putting their symbol in a blank space on a $3\times 3$ game board. A player wins by placing three of their symbols in a row, column, or diagonal; the game ends when a player wins or the board is filled. In the game shown below, player O has just made its third move, and it is X’s turn to make its fourth. The playing agent needs to decide intelligently which of the three available moves X should choose next: X1, X2, or X3. We have started the search tree, with three branches for the three possible moves for X:
Draw the rest of the game tree. Assume the value of the tree is +1 for an X win, $-1$ for an O win, and 0 for a draw. Show how the values are backed up to give a value for each of the nodes. What should $X$ do? Is it likely to win, lose, or draw?
Could $\alpha $-$\beta $ pruning prune any of the tree?
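The position from the figure is not reproduced here, but the backing-up of values and the pruning can be sketched generically. The board encoding and function names below are illustrative, not from the text: the board is a 9-tuple of `'X'`, `'O'`, or `' '`, and `value` returns the backed-up minimax value while cutting branches whenever $\alpha \geq \beta$.

```python
# Minimax with alpha-beta pruning for Tic-Tac-Toe (illustrative sketch).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """+1 if X has three in a line, -1 if O does, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return 1 if board[a] == 'X' else -1
    return None

def value(board, player, alpha=-2, beta=2):
    """Backed-up value of a position: +1 X win, -1 O win, 0 draw.
    board is a 9-tuple of 'X', 'O', or ' '; player moves next.
    Remaining moves are pruned as soon as alpha >= beta."""
    w = winner(board)
    if w is not None:
        return w
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0                      # full board, no winner: a draw
    for i in moves:
        child = board[:i] + (player,) + board[i + 1:]
        v = value(child, 'O' if player == 'X' else 'X', alpha, beta)
        if player == 'X':
            alpha = max(alpha, v)     # X maximizes
        else:
            beta = min(beta, v)       # O minimizes
        if alpha >= beta:
            break                     # prune the remaining moves
    return alpha if player == 'X' else beta

print(value((' ',) * 9, 'X'))  # perfect play from the empty board: 0 (draw)
```

Running the search from any position of the exercise’s tree gives the backed-up value for that node; counting how often the `break` fires shows how much $\alpha$-$\beta$ pruning saves.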
For the hawk–dove game of Example 11.11, where $D>0$ and $R>0$, each agent is trying to maximize its utility. Is there a Nash equilibrium with a randomized strategy? What are the probabilities? What is the expected payoff to each agent? (These should be expressed as functions of $R$ and $D$). Show your calculations.
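As a numeric check on the algebra, the indifference condition can be coded directly. The payoff matrix assumed below is the standard hawk–dove form (doves split $R$, a hawk takes all of $R$ from a dove, and two hawks each pay $-D$); confirm it matches Example 11.11 before relying on the numbers.

```python
def hawk_prob(R, D):
    """Equilibrium probability of playing hawk in the randomized Nash
    equilibrium, from the indifference condition EU(hawk) = EU(dove).
    Assumed payoffs (check against Example 11.11):
    (dove,dove) = (R/2, R/2), (hawk,dove) = (R, 0), (hawk,hawk) = (-D, -D).
    """
    return R / (R + 2 * D)

def expected_utils(p, R, D):
    """Expected utility of playing hawk and of playing dove against an
    opponent who plays hawk with probability p."""
    eu_hawk = p * (-D) + (1 - p) * R
    eu_dove = p * 0 + (1 - p) * (R / 2)
    return eu_hawk, eu_dove

p = hawk_prob(2.0, 3.0)               # sample values R = 2, D = 3
print(p, expected_utils(p, 2.0, 3.0))  # the two utilities should be equal
```

At the equilibrium probability the two expected utilities coincide, which is exactly the indifference condition the exercise asks you to solve symbolically.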
Which of the following games in normal form have a Nash equilibrium made up of pure strategies? For those that do, specify the pure strategy Nash equilibria. For those that do not, explain how you know there is no pure strategy Nash equilibrium.
| Player 1 \ Player 2 | a2 | b2 |
|---|---|---|
| a1 | 10,10 | 110,0 |
| b1 | 0,110 | 100,100 |
| Player 1 \ Player 2 | a2 | b2 |
|---|---|---|
| a1 | 10,10 | 11,20 |
| b1 | 0,11 | 20,1 |
| Player 1 \ Player 2 | a2 | b2 |
|---|---|---|
| a1 | 10,20 | 5,10 |
| b1 | 7,11 | 20,12 |
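A mechanical way to check for pure-strategy equilibria is to test, for every cell, whether each player’s action is a best response to the other’s. The function and encoding below are illustrative (strategies are indexed 0, 1 for a1/b1 and a2/b2):

```python
def pure_nash(row_pay, col_pay):
    """All pure-strategy Nash equilibria of a two-player game in normal
    form. row_pay[i][j] and col_pay[i][j] are the payoffs to the row and
    column player when row plays i and column plays j."""
    n, m = len(row_pay), len(row_pay[0])
    return [(i, j)
            for i in range(n) for j in range(m)
            # i must be a best response to j, and j to i
            if row_pay[i][j] == max(row_pay[k][j] for k in range(n))
            and col_pay[i][j] == max(col_pay[i][l] for l in range(m))]

# First game above: rows a1/b1, columns a2/b2
row = [[10, 110], [0, 100]]
col = [[10, 0], [110, 100]]
print(pure_nash(row, col))  # [(0, 0)], i.e., (a1, a2)
```

An empty result means no pure-strategy equilibrium exists, which is the situation where the exercise asks for an explanation (typically a best-response cycle through the four cells).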
In Example 11.12, what is the Nash equilibrium with randomized strategies? What is the expected value for each agent in this equilibrium?
Consider the following normal-form game, where the row player can choose action $A$, $B$, or $C$ and the column player can choose action $D$, $E$, or $F$:
$\begin{array}{cccc}& \hfill D\hfill & \hfill E\hfill & \hfill F\hfill \\ A\hfill & \hfill 40,40\hfill & \hfill 120,10\hfill & \hfill 60,30\hfill \\ B\hfill & \hfill 30,60\hfill & \hfill 110,60\hfill & \hfill 90,90\hfill \\ C\hfill & \hfill 30,110\hfill & \hfill 100,100\hfill & \hfill 70,120\hfill \end{array}$
where the pairs give the value of the outcome for the row player followed by the value for the column player.
When iteratively eliminating dominated strategies, which strategies (if any) can be eliminated? Explain what is eliminated, and what cannot be eliminated.
Specify a Nash equilibrium for this game. (For a randomized strategy, give just the actions that are randomized; you do not need to give the probabilities). Explain why it is a Nash equilibrium.
Is there more than one Nash equilibrium? If so, give another one. If not, explain why there is no other one.
If the agents could coordinate, could they get a better outcome than in a Nash equilibrium? Explain why or why not.
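Dominance claims are easy to verify mechanically. The sketch below (function names are illustrative) repeatedly removes strictly dominated pure strategies until none remain; note that it uses strict dominance only, so it will never remove a weakly dominated strategy:

```python
def iesds(row_pay, col_pay, row_names, col_names):
    """Iterated elimination of strictly dominated strategies for a
    two-player normal-form game. row_pay[i][j] / col_pay[i][j] are the
    payoffs to the row / column player; returns surviving strategy names."""
    R = list(range(len(row_names)))
    C = list(range(len(col_names)))
    changed = True
    while changed:
        changed = False
        # row strategy i is strictly dominated by i2 if i2 does strictly
        # better against every surviving column strategy
        for i in R[:]:
            if any(all(row_pay[i2][j] > row_pay[i][j] for j in C)
                   for i2 in R if i2 != i):
                R.remove(i); changed = True
        for j in C[:]:
            if any(all(col_pay[i][j2] > col_pay[i][j] for i in R)
                   for j2 in C if j2 != j):
                C.remove(j); changed = True
    return [row_names[i] for i in R], [col_names[j] for j in C]

# Game above: rows A, B, C; columns D, E, F; pairs are (row, column) payoffs
row_pay = [[40, 120, 60], [30, 110, 90], [30, 100, 70]]
col_pay = [[40, 10, 30], [60, 60, 90], [110, 100, 120]]
print(iesds(row_pay, col_pay, ['A', 'B', 'C'], ['D', 'E', 'F']))
```

For strict dominance the order of elimination does not affect the surviving set, so the loop structure above is safe; with weak dominance that guarantee would be lost.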
Answer the same questions as in the previous exercise for the following games:
$\begin{array}{cccc}& \hfill D\hfill & \hfill E\hfill & \hfill F\hfill \\ A\hfill & \hfill 2,11\hfill & \hfill 10,10\hfill & \hfill 3,12\hfill \\ B\hfill & \hfill 5,7\hfill & \hfill 12,1\hfill & \hfill 6,5\hfill \\ C\hfill & \hfill 6,5\hfill & \hfill 13,2\hfill & \hfill 4,6\hfill \end{array}$
$\begin{array}{cccc}& \hfill D\hfill & \hfill E\hfill & \hfill F\hfill \\ A\hfill & \hfill 80,130\hfill & \hfill 20,10\hfill & \hfill 130,80\hfill \\ B\hfill & \hfill 130,80\hfill & \hfill 30,20\hfill & \hfill 80,130\hfill \\ C\hfill & \hfill 20,10\hfill & \hfill 100,100\hfill & \hfill 30,20\hfill \end{array}$
Consider the sequential prisoner’s dilemma.
Suppose the agents play for a fixed number of times (say three times). Give two equilibria if there are two or more; otherwise, give the unique equilibrium and explain why there is only one. Hint: Consider the last time first.
Suppose there is a discount factor of $\gamma $, which means there is a probability $\gamma $ of stopping at each stage. Is tit-for-tat a Nash equilibrium for all values of $\gamma $? If so, prove it. If not, for which values of $\gamma $ is it a Nash equilibrium?
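One way to organize the analysis is to compute, for given prisoner’s dilemma payoffs, the smallest $\gamma$ at which tit-for-tat deters the two classic deviations (defect forever, and alternate defect/cooperate). The payoff labels $T > R > P > S$ and the sample values below are generic textbook conventions, not values from the text, and checking only these two deviations is a sketch of the argument, not a full proof:

```python
def tft_threshold(T, R, P, S):
    """Smallest continuation probability gamma at which tit-for-tat
    (played against tit-for-tat) deters two classic deviations in the
    repeated prisoner's dilemma. T, R, P, S are the usual temptation,
    reward, punishment, and sucker payoffs with T > R > P > S
    (generic labels, not the text's values).

    Deviation 1, defect forever:      R/(1-g) >= T + g*P/(1-g)
    Deviation 2, alternate D, C, ...: R/(1-g) >= (T + g*S)/(1 - g**2)
    """
    g_defect = (T - R) / (T - P)     # solves deviation 1 with equality
    g_alternate = (T - R) / (R - S)  # solves deviation 2 with equality
    return max(g_defect, g_alternate)

print(tft_threshold(5, 3, 1, 0))  # with these sample payoffs: 2/3
```

Below the returned threshold at least one deviation is profitable, so tit-for-tat is not a Nash equilibrium there; this is the shape of answer the exercise is after, expressed in terms of the game’s actual payoffs.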