Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

10.4.1.2 Computing Randomized Strategies

We can use the fact that an agent will only randomize between actions if the actions all have the same utility to the agent (given the strategies of the other agent). This forms a set of constraints that can be solved to give a Nash equilibrium. If these constraints can be solved with numbers in the range (0,1), and the mixed strategies computed for each agent are not dominated by another strategy for the agent, then this strategy profile is a Nash equilibrium.

Recall that a support set is a set of pure strategies that each have non-zero probability in a Nash equilibrium.

Once dominated strategies have been eliminated, we can search over support sets to determine whether the support sets form a Nash equilibrium. Note that, if there are n actions available to an agent, there are 2n-1 non-empty subsets, and we have to search over combinations of support sets for the various agents. So this is not very feasible unless there are few non-dominated actions or there are Nash equilibria with small support sets. To find simple (in terms of the number of actions in the support set) equilibria, we can search from smaller support sets to larger sets.

Suppose agent i is randomizing between actions ai1,...,aiki in a Nash equilibrium. Let pij be the probability that agent i does action aij. Let σ-i(p-i) be the strategies for the other agents as a function of their probabilities. The fact that this is a Nash equilibrium gives the following constraints: pij>0, j=1kipij=1, and, for all j, j'

utility(aijσ-i(p-i),i) = utility(aij'σ-i(p-i),i).

We also require that the utility of doing aij is not less than the utility of doing an action outside of the support set. Thus, for all a' ∉{ai1,...,aiki},

utility(aijσ-i(p-i),i) ≥ utility(a'σ-i(p-i),i).
Example 10.16: In Example 10.9, suppose the goalkeeper jumps right with probability pj and the kicker kicks right with probability pk.

If the goalkeeper jumps right, the probability of a goal is

0.9pk+0.2(1-pk).

If the goalkeeper jumps left, the probability of a goal is

0.3pk+0.6(1-pk).

The only time the goalkeeper would randomize is if these are equal; that is, if

0.9pk+0.2(1-pk)=0.3pk+0.6(1-pk).

Solving for pk gives pk=0.4. Similarly, for the kicker to randomize, the probability of a goal must be the same whether the kicker kicks left or right:

0.2pj + 0.6(1-pj) = 0.9pj+0.3(1-pj).

Solving for pj gives pj= 0.3. Thus, the only Nash equilibrium is where pk=0.4 and pj= 0.3.