11.4 Reasoning with Imperfect Information

The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).

11.4.1 Computing Nash Equilibria

To compute a Nash equilibrium for a game in normal form, there are three steps:

  1. 1.

    Eliminate dominated strategies.

  2. 2.

    Determine support set, the set of actions which have non-zero probabilities.

  3. 3.

    Determine the probability for the actions in the support set.

It turns out that the second of these is the most difficult.

Eliminating Dominated Strategies

A strategy s1 for a agent A dominates strategy s2 for A if, for every action of the other agents, the utility of s1 for agent A is higher than the utility of s2 for agent A. This is formalized below. Any pure strategy dominated by another strategy can be eliminated from consideration. The dominating strategy could be a randomized strategy. Removing dominated strategies can be done repeatedly.

Example 11.15.

Consider the following payoff matrix, where the first agent chooses the row and the second agent chooses the column. In each cell is a pair of payoffs: the payoff for Agent 1 and the payoff for Agent 2. Agent 1 has actions {a1,b1,c1}. Agent 2 has actions {d2,e2,f2}.

Agent 2
d2 e2 f2
a1 3,5 5,1 1,2
Agent 1 b1 1,1 2,9 6,4
c1 2,6 4,7 0,8

(Before looking at the solution try to work out what each agent should do.)

Action c1 can be removed because it is dominated by action a1: Agent 1 will never do c1 if action a1 is available to it. Notice how the payoff for Agent 1 is greater doing a1 than doing c1, no matter what the other agent does.

Once action c1 is eliminated, action f2 can be eliminated because it is dominated for Agent 2 by the randomized strategy 0.5*d2+0.5*e2.

Once c1 and f2 have been eliminated, b1 is dominated by a1, and so Agent 1 will do action a1. Given that Agent 1 will do a1, Agent 2 will do d2. Thus, the payoff in this game will be 3 for Agent 1 and 5 for Agent 2.

Strategy s1 strictly dominates strategy s2 for Agent i if, for all action profiles σ-i of the other agents,

utility(s1σ-i,i)>utility(s2σ-i,i)

in which case s2 is strictly dominated by s1. If s2 is a pure strategy that is strictly dominated by some strategy s1, then s2 can never be in the support set of any Nash equilibrium. This holds even if s1 is a stochastic strategy. Repeated elimination of strictly dominated strategies gives the same result, regardless of the order in which the strictly dominated strategies are removed.

There are also weaker notions of domination, where the greater than symbol in the preceding formula is replaced by greater than or equal. If the weaker notion of domination is used, there is always a Nash equilibrium with support of the non-dominated strategies. However, some Nash equilibria may be lost. Moreover, which equilibria are lost can depend on the order in which the dominated strategies are removed.

Computing Randomized Strategies

An agent will only randomize between actions if the actions all have the same utility to the agent, given the strategies of the other agents. This idea leads to a set of constraints that can be solved to compute a Nash equilibrium. If these constraints can be solved with numbers in the range (0,1), and the mixed strategies computed for each agent are not dominated by another strategy for the agent, then this strategy profile is a Nash equilibrium.

Recall that a support set is a set of pure strategies that each have non-zero probability in a Nash equilibrium.

Once dominated strategies have been eliminated, an agent can search over support sets to determine whether the support sets form a Nash equilibrium. Note that, if there are n actions available to an agent, there are 2n-1 non-empty subsets, and we have to search over combinations of support sets for the various agents. This is not feasible unless there are only a few non-dominated actions or there are Nash equilibria with small support sets. To find simple (in terms of the number of actions in the support set) equilibria, an agent can search from smaller support sets to larger sets.

Suppose agent i is randomizing between actions ai1,,aiki in a Nash equilibrium. Let pij be the probability that agent i does action aij. Let σ-i be the strategies for the other agents, which is a function of their probabilities. The fact that this is a Nash equilibrium gives the following constraints: pij>0, j=1kipij=1, and, for all j,j

utility(aijσ-i,i)=utility(aijσ-i,i).

We also require that the utility of doing aij is not less than the utility of doing an action outside of the support set. Thus, for all a{ai1,,aiki},

utility(aijσ-i,i)utility(aσ-i,i).
Example 11.16.

In Example 11.9, suppose the goalkeeper jumps right with probability pg and the kicker kicks right with probability pk.

If the goalkeeper jumps right, the probability of a goal is

0.9pk+0.2(1-pk).

If the goalkeeper jumps left, the probability of a goal is

0.3pk+0.6(1-pk).

The only time the goalkeeper would randomize is if these are equal; that is, if

0.9pk+0.2(1-pk)=0.3pk+0.6(1-pk).

Solving for pk gives pk=0.4.

Similarly, for the kicker to randomize, the probability of a goal must be the same whether the kicker kicks left or right:

0.2pg+0.6(1-pg)=0.9pg+0.3(1-pg).

Solving for pg gives pg=0.3.

Thus, the only Nash equilibrium has pk=0.4 and pg=0.3.