Artificial Intelligence - foundations of computational agents -- 10.4.1.2 Computing Randomized Strategies

Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

10.4.1.2 Computing Randomized Strategies

An agent will only randomize between actions if those actions all have the same utility to the agent, given the strategies of the other agents. This indifference condition yields a set of constraints that can be solved to find a Nash equilibrium. If the constraints can be solved with probabilities in the open interval (0,1), and the mixed strategy computed for each agent is not dominated by another strategy for that agent, then the resulting strategy profile is a Nash equilibrium.

Recall that a support set is a set of pure strategies that each have non-zero probability in a Nash equilibrium.

Once dominated strategies have been eliminated, we can search over support sets to determine whether a given choice of support sets yields a Nash equilibrium. Note that, if there are n actions available to an agent, there are 2ⁿ-1 non-empty subsets, and we have to search over combinations of support sets for the various agents. This search is therefore feasible only when there are few non-dominated actions or when there are Nash equilibria with small support sets. To find simple equilibria (those with few actions in the support sets), we can search from smaller support sets to larger ones.
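The ordering of this search can be sketched in Python. This is a minimal illustration, not the book's algorithm: the function names support_sets and support_profiles are hypothetical, and the sketch only enumerates candidate support sets, smallest first. Deciding whether a candidate actually gives an equilibrium still requires solving the constraints for it.

```python
from itertools import combinations, product

def support_sets(actions):
    """Yield every non-empty subset of `actions`, smallest subsets first."""
    for size in range(1, len(actions) + 1):
        for subset in combinations(actions, size):
            yield subset

def support_profiles(actions_per_agent):
    """Return all combinations of one support set per agent,
    ordered by the total number of actions in the supports."""
    per_agent = [list(support_sets(actions)) for actions in actions_per_agent]
    return sorted(product(*per_agent),
                  key=lambda profile: sum(len(s) for s in profile))

# Two agents with two actions each: (2^2 - 1) * (2^2 - 1) = 9 candidates.
profiles = support_profiles([["left", "right"], ["left", "right"]])
for profile in profiles:
    print(profile)
```

With n actions per agent, the list has (2ⁿ-1) entries per agent multiplied together, which is why the search is only practical for small games or small supports.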

Suppose agent i is randomizing between actions a_i^1,...,a_i^{k_i} in a Nash equilibrium. Let p_i^j be the probability that agent i does action a_i^j. Let σ_{-i}(p_{-i}) be the strategies for the other agents as a function of their probabilities. The fact that this is a Nash equilibrium gives the following constraints: p_i^j > 0, ∑_{j=1}^{k_i} p_i^j = 1, and, for all j, j',

utility(a_i^j σ_{-i}(p_{-i}), i) = utility(a_i^{j'} σ_{-i}(p_{-i}), i).

We also require that the utility of doing a_i^j is not less than the utility of doing an action outside of the support set. Thus, for all a' ∉ {a_i^1,...,a_i^{k_i}},

utility(a_i^j σ_{-i}(p_{-i}), i) ≥ utility(a' σ_{-i}(p_{-i}), i).
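These constraints can be checked mechanically for a candidate strategy profile. The following is a minimal sketch for one agent in a two-agent game, under several assumptions that are not from the text: payoffs are given as a nested list payoff[a][b], the strategy is a full probability vector in which actions outside the support have probability zero, and equality is tested up to a small tolerance tol. The function names and the matching-pennies payoffs are illustrative only.

```python
def expected_utilities(payoff, other_probs):
    """Expected utility of each of this agent's actions against the other
    agent's mixed strategy. payoff[a][b] is this agent's utility when it
    plays a and the other agent plays b."""
    return [sum(p * u for p, u in zip(other_probs, row)) for row in payoff]

def satisfies_constraints(payoff, own_probs, other_probs, tol=1e-9):
    """Check the support-set constraints for one agent."""
    # Probabilities must be non-negative and sum to 1.
    if any(p < -tol for p in own_probs) or abs(sum(own_probs) - 1) > tol:
        return False
    utils = expected_utilities(payoff, other_probs)
    support = [u for p, u in zip(own_probs, utils) if p > tol]
    outside = [u for p, u in zip(own_probs, utils) if p <= tol]
    # All actions in the support must have the same utility ...
    equal_in_support = max(support) - min(support) <= tol
    # ... and no action outside the support may do better.
    no_better_outside = all(u <= min(support) + tol for u in outside)
    return equal_in_support and no_better_outside

# Illustrative game (matching pennies, not from the text).
row_payoff = [[1, -1], [-1, 1]]   # row agent's utility, indexed [row][col]
col_payoff = [[-1, 1], [1, -1]]   # column agent's utility, indexed [col][row]
half = [0.5, 0.5]
print(satisfies_constraints(row_payoff, half, half))        # row agent
print(satisfies_constraints(col_payoff, half, half))        # column agent
print(satisfies_constraints(col_payoff, half, [1.0, 0.0]))  # row plays pure
```

A strategy profile passes as a Nash equilibrium only if the check succeeds for every agent, each evaluated against the other's strategy.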

Example 10.16: In Example 10.9, suppose the goalkeeper jumps right with probability p_j and the kicker kicks right with probability p_k.

If the goalkeeper jumps right, the probability of a goal is

0.9p_k+0.2(1-p_k).

If the goalkeeper jumps left, the probability of a goal is

0.3p_k+0.6(1-p_k).

The only time the goalkeeper would randomize is if these are equal; that is, if

0.9p_k+0.2(1-p_k)=0.3p_k+0.6(1-p_k).

Solving for p_k gives p_k=0.4. Similarly, for the kicker to randomize, the probability of a goal must be the same whether the kicker kicks left or right:

0.2p_j + 0.6(1-p_j) = 0.9p_j+0.3(1-p_j).

Solving for p_j gives p_j=0.3. Thus, the only Nash equilibrium has p_k=0.4 and p_j=0.3.
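The arithmetic in this example can be reproduced with exact fractions. The sketch below reads the four goal probabilities off the expressions above and solves the two indifference equations in closed form. The dictionary layout and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Probability of a goal, keyed by (kicker's action, goalkeeper's action),
# taken from the coefficients in the expressions of the example.
goal = {
    ("right", "right"): Fraction(9, 10),
    ("left", "right"): Fraction(2, 10),
    ("right", "left"): Fraction(3, 10),
    ("left", "left"): Fraction(6, 10),
}

# Both indifference equations are linear; collecting terms gives the same
# denominator for p_k and p_j.
denom = (goal["right", "right"] - goal["left", "right"]
         - goal["right", "left"] + goal["left", "left"])

# Goalkeeper indifferent between jumping right and left: solve for p_k.
p_k = (goal["left", "left"] - goal["left", "right"]) / denom

# Kicker indifferent between kicking left and right: solve for p_j.
p_j = (goal["left", "left"] - goal["right", "left"]) / denom

# Probability of a goal at the equilibrium (goalkeeper jumps right).
p_goal = goal["right", "right"] * p_k + goal["left", "right"] * (1 - p_k)

print(p_k, p_j, p_goal)  # 2/5 3/10 12/25
```

At these probabilities a goal is scored with probability 12/25 = 0.48, whichever action either agent ends up taking, which is what makes each agent willing to randomize.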