foundations of computational agents
The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).
To compute a Nash equilibrium for a game in normal form, there are three steps:
Eliminate dominated strategies.
Determine support set, the set of actions which have non-zero probabilities.
Determine the probability for the actions in the support set.
It turns out that the second of these is the most difficult.
A strategy ${s}_{1}$ for a agent $A$ dominates strategy ${s}_{2}$ for $A$ if, for every action of the other agents, the utility of ${s}_{1}$ for agent $A$ is higher than the utility of ${s}_{2}$ for agent $A$. This is formalized below. Any pure strategy dominated by another strategy can be eliminated from consideration. The dominating strategy could be a randomized strategy. Removing dominated strategies can be done repeatedly.
Consider the following payoff matrix, where the first agent chooses the row and the second agent chooses the column. In each cell is a pair of payoffs: the payoff for Agent 1 and the payoff for Agent 2. Agent 1 has actions ${\mathrm{\{}}{{a}}_{{\mathrm{1}}}{\mathrm{,}}{{b}}_{{\mathrm{1}}}{\mathrm{,}}{{c}}_{{\mathrm{1}}}{\mathrm{\}}}$. Agent 2 has actions ${\mathrm{\{}}{{d}}_{{\mathrm{2}}}{\mathrm{,}}{{e}}_{{\mathrm{2}}}{\mathrm{,}}{{f}}_{{\mathrm{2}}}{\mathrm{\}}}$.
Agent 2 | ||||
---|---|---|---|---|
${{d}}_{{2}}$ | ${{e}}_{{2}}$ | ${{f}}_{{2}}$ | ||
${{a}}_{{1}}$ | ${3}{,}{5}$ | ${5}{,}{1}$ | ${1}{,}{2}$ | |
Agent 1 | ${{b}}_{{1}}$ | ${1}{,}{1}$ | ${2}{,}{9}$ | ${6}{,}{4}$ |
${{c}}_{{1}}$ | ${2}{,}{6}$ | ${4}{,}{7}$ | ${0}{,}{8}$ |
(Before looking at the solution try to work out what each agent should do.)
Action ${{c}}_{{\mathrm{1}}}$ can be removed because it is dominated by action ${{a}}_{{\mathrm{1}}}$: Agent 1 will never do ${{c}}_{{\mathrm{1}}}$ if action ${{a}}_{{\mathrm{1}}}$ is available to it. Notice how the payoff for Agent 1 is greater doing ${{a}}_{{\mathrm{1}}}$ than doing ${{c}}_{{\mathrm{1}}}$, no matter what the other agent does.
Once action ${{c}}_{{\mathrm{1}}}$ is eliminated, action ${{f}}_{{\mathrm{2}}}$ can be eliminated because it is dominated for Agent 2 by the randomized strategy ${\mathrm{0.5}}{\mathrm{*}}{{d}}_{{\mathrm{2}}}{\mathrm{+}}{\mathrm{0.5}}{\mathrm{*}}{{e}}_{{\mathrm{2}}}$.
Once ${{c}}_{{\mathrm{1}}}$ and ${{f}}_{{\mathrm{2}}}$ have been eliminated, ${{b}}_{{\mathrm{1}}}$ is dominated by ${{a}}_{{\mathrm{1}}}$, and so Agent 1 will do action ${{a}}_{{\mathrm{1}}}$. Given that Agent 1 will do ${{a}}_{{\mathrm{1}}}$, Agent 2 will do ${{d}}_{{\mathrm{2}}}$. Thus, the payoff in this game will be 3 for Agent 1 and 5 for Agent 2.
Strategy ${s}_{1}$ strictly dominates strategy ${s}_{2}$ for Agent $i$ if, for all action profiles ${\sigma}_{-i}$ of the other agents,
$$utility({s}_{1}{\sigma}_{-i},i)>utility({s}_{2}{\sigma}_{-i},i)$$ |
in which case ${s}_{2}$ is strictly dominated by ${s}_{1}$. If ${s}_{2}$ is a pure strategy that is strictly dominated by some strategy ${s}_{1}$, then ${s}_{2}$ can never be in the support set of any Nash equilibrium. This holds even if ${s}_{1}$ is a stochastic strategy. Repeated elimination of strictly dominated strategies gives the same result, regardless of the order in which the strictly dominated strategies are removed.
There are also weaker notions of domination, where the greater than symbol in the preceding formula is replaced by greater than or equal. If the weaker notion of domination is used, there is always a Nash equilibrium with support of the non-dominated strategies. However, some Nash equilibria may be lost. Moreover, which equilibria are lost can depend on the order in which the dominated strategies are removed.
An agent will only randomize between actions if the actions all have the same utility to the agent, given the strategies of the other agents. This idea leads to a set of constraints that can be solved to compute a Nash equilibrium. If these constraints can be solved with numbers in the range $(0,1)$, and the mixed strategies computed for each agent are not dominated by another strategy for the agent, then this strategy profile is a Nash equilibrium.
Recall that a support set is a set of pure strategies that each have non-zero probability in a Nash equilibrium.
Once dominated strategies have been eliminated, an agent can search over support sets to determine whether the support sets form a Nash equilibrium. Note that, if there are $n$ actions available to an agent, there are ${2}^{n}-1$ non-empty subsets, and we have to search over combinations of support sets for the various agents. This is not feasible unless there are only a few non-dominated actions or there are Nash equilibria with small support sets. To find simple (in terms of the number of actions in the support set) equilibria, an agent can search from smaller support sets to larger sets.
Suppose agent $i$ is randomizing between actions ${a}_{i}^{1},\mathrm{\dots},{a}_{i}^{{k}_{i}}$ in a Nash equilibrium. Let ${p}_{i}^{j}$ be the probability that agent $i$ does action ${a}_{i}^{j}$. Let ${\sigma}_{-i}$ be the strategies for the other agents, which is a function of their probabilities. The fact that this is a Nash equilibrium gives the following constraints: ${p}_{i}^{j}>0$, ${\sum}_{j=1}^{{k}_{i}}{p}_{i}^{j}=1$, and, for all $j,{j}^{\prime}$
$$utility({a}_{i}^{j}{\sigma}_{-i},i)=utility({a}_{i}^{{j}^{\prime}}{\sigma}_{-i},i).$$ |
We also require that the utility of doing ${a}_{i}^{j}$ is not less than the utility of doing an action outside of the support set. Thus, for all ${a}^{\prime}\notin \{{a}_{i}^{1},\mathrm{\dots},{a}_{i}^{{k}_{i}}\}$,
$$utility({a}_{i}^{j}{\sigma}_{-i},i)\ge utility({a}^{\prime}{\sigma}_{-i},i).$$ |
In Example 11.9, suppose the goalkeeper jumps right with probability ${{p}}_{{g}}$ and the kicker kicks right with probability ${{p}}_{{k}}$.
If the goalkeeper jumps right, the probability of a goal is
$${0.9}{}{{p}}_{{k}}{+}{0.2}{}{(}{1}{-}{{p}}_{{k}}{)}{.}$$ |
If the goalkeeper jumps left, the probability of a goal is
$${0.3}{}{{p}}_{{k}}{+}{0.6}{}{(}{1}{-}{{p}}_{{k}}{)}{.}$$ |
The only time the goalkeeper would randomize is if these are equal; that is, if
$${0.9}{}{{p}}_{{k}}{+}{0.2}{}{(}{1}{-}{{p}}_{{k}}{)}{=}{0.3}{}{{p}}_{{k}}{+}{0.6}{}{(}{1}{-}{{p}}_{{k}}{)}{.}$$ |
Solving for ${{p}}_{{k}}$ gives ${{p}}_{{k}}{\mathrm{=}}{\mathrm{0.4}}$.
Similarly, for the kicker to randomize, the probability of a goal must be the same whether the kicker kicks left or right:
$${0.2}{}{{p}}_{{g}}{+}{0.6}{}{(}{1}{-}{{p}}_{{g}}{)}{=}{0.9}{}{{p}}_{{g}}{+}{0.3}{}{(}{1}{-}{{p}}_{{g}}{)}{.}$$ |
Solving for ${{p}}_{{g}}$ gives ${{p}}_{{g}}{\mathrm{=}}{\mathrm{0.3}}$.
Thus, the only Nash equilibrium has ${{p}}_{{k}}{\mathrm{=}}{\mathrm{0.4}}$ and ${{p}}_{{g}}{\mathrm{=}}{\mathrm{0.3}}$.