Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

### 9.5.2 Value of an Optimal Policy

Let *Q^{*}(s,a)*, where *s* is a state and *a* is an action, be the expected value of doing *a* in state *s* and then following the optimal policy. Let *V^{*}(s)*, where *s* is a state, be the expected value of following an optimal policy from state *s*.

*Q^{*}* can be defined analogously to *Q^{π}*:

Q^{*}(s,a) = ∑_{s'} P(s'|s,a) (R(s,a,s') + γV^{*}(s')).

*V ^{*}(s)* is obtained by performing the action that gives the best value in
each state:

V ^{*}(s)=max _{a}Q^{*}(s,a).

An optimal policy *π ^{*}* is one of the policies that gives the best
value for each state:

π ^{*}(s)= argmax _{a}Q^{*}(s,a).

Note that *argmax_{a} Q^{*}(s,a)* is a function of state *s*, and its value is one of the *a*'s that results in the maximum value of *Q^{*}(s,a)*.
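These definitions can be computed by repeatedly applying the equations above until *V* stops changing (value iteration). Below is a minimal sketch for a hypothetical two-state MDP; the states, actions, transition probabilities, discount factor, and rewards are illustrative assumptions, not values from the text, and the rewards are given here as the expected immediate reward R(s,a), i.e. the sum ∑_{s'} P(s'|s,a) R(s,a,s') already carried out.

```python
# Value iteration sketch: compute Q*, V*, and an optimal policy pi*
# for a made-up two-state MDP (illustrative numbers, not from the text).

states = ["healthy", "sick"]
actions = ["relax", "party"]

# P[s][a][s'] = P(s'|s,a): probability of next state s' given state s, action a
P = {
    "healthy": {"relax": {"healthy": 0.95, "sick": 0.05},
                "party": {"healthy": 0.7,  "sick": 0.3}},
    "sick":    {"relax": {"healthy": 0.5,  "sick": 0.5},
                "party": {"healthy": 0.1,  "sick": 0.9}},
}

# R[s][a] = expected immediate reward for doing a in s
R = {
    "healthy": {"relax": 7, "party": 10},
    "sick":    {"relax": 0, "party": 2},
}

gamma = 0.8          # discount factor
V = {s: 0.0 for s in states}

for _ in range(1000):  # enough sweeps for convergence at gamma = 0.8
    # Q*(s,a) = R(s,a) + gamma * sum_{s'} P(s'|s,a) V*(s')
    Q = {s: {a: R[s][a] + gamma * sum(P[s][a][sp] * V[sp] for sp in states)
             for a in actions}
         for s in states}
    # V*(s) = max_a Q*(s,a)
    V = {s: max(Q[s].values()) for s in states}

# pi*(s) = argmax_a Q*(s,a)
pi = {s: max(actions, key=lambda a: Q[s][a]) for s in states}
# For these numbers: pi = {'healthy': 'party', 'sick': 'relax'},
# V['healthy'] ≈ 35.71, V['sick'] ≈ 23.81
```

Note that the loop computes *Q* from the current estimate of *V* and then improves *V* from *Q*, directly mirroring the two equations above; the optimal policy is read off at the end by taking the argmax.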