Fortunately, an agent does not have to enumerate all of the policies; variable elimination (VE) can be adapted to find an optimal policy. The idea is first to consider the last decision, find an optimal decision for each value of its parents, and produce a factor of these maximum values. This results in a new decision network, with one less decision, that can be solved recursively.
Figure 9.13 shows how to use variable elimination for decision networks. Essentially, it computes the expected utility of an optimal decision. It eliminates the random variables that are not parents of a decision node by summing them out according to some elimination ordering. The ordering of the random variables being eliminated does not affect correctness and so it can be chosen for efficiency.
After eliminating all of the random variables that are not parents of a decision node, in a no-forgetting decision network there must be one decision variable D that is in a factor f where all of the variables, other than D, in f are parents of D. This decision D is the last decision in the ordering of decisions.
To eliminate that decision node, VE chooses the values for the decision that result in the maximum utility. This maximization creates a new factor on the remaining variables and a decision function for the decision variable being eliminated. The decision function created by maximizing is one of the decision functions in an optimal policy.
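As a rough illustration of these two operations (this is not the book's pseudocode or its AIPython implementation), the sketch below sums a random variable out of a factor and maximizes a decision variable out of a factor. The factor encoding, function names, and argument conventions are assumptions made here for the example.

```python
# A factor over variables `vars_` is encoded here as (vars_, table), where
# `table` maps an assignment tuple (one value per variable, in the order of
# vars_) to a number.

def sum_out(vars_, table, var):
    """Eliminate a random variable from a factor by summing it out."""
    i = vars_.index(var)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table = {}
    for asg, val in table.items():
        rest = asg[:i] + asg[i + 1:]
        new_table[rest] = new_table.get(rest, 0.0) + val
    return new_vars, new_table

def max_out(vars_, table, dec):
    """Eliminate a decision variable by maximizing.  Returns the reduced
    factor and a decision function giving an optimal value of `dec` for
    each assignment of the remaining variables."""
    i = vars_.index(dec)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table, decision_fn = {}, {}
    for asg, val in table.items():
        rest = asg[:i] + asg[i + 1:]
        if rest not in new_table or val > new_table[rest]:
            new_table[rest] = val
            decision_fn[rest] = asg[i]
    return (new_vars, new_table), decision_fn

# VE for decision networks: multiply the relevant factors, sum_out every
# random variable that is not a parent of a decision, then repeatedly
# max_out the last decision, collecting the decision functions as the policy.
```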
In Example 9.13, there are three initial factors representing P(Weather), P(Forecast | Weather), and u(Weather, Umbrella). First, VE eliminates Weather by multiplying all three factors and summing out Weather, giving a factor on Forecast and Umbrella:
Forecast | Umbrella | Value
---|---|---
sunny | take | 12.95
sunny | leave | 49.0
cloudy | take | 8.05
cloudy | leave | 14.0
rainy | take | 14.0
rainy | leave | 7.0
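To make the computation concrete, here is a small Python sketch that reproduces this factor, assuming the prior P(rain) = 0.3, the forecast probabilities, and the utility table encoded in the dictionaries below (numbers that reproduce the values above); the variable and value names are one possible encoding, not taken from the book's code.

```python
# Assumed numbers for Example 9.13 (they reproduce the table above).
p_weather = {"rain": 0.3, "norain": 0.7}        # P(Weather)
p_forecast = {                                  # P(Forecast | Weather)
    ("rain", "sunny"): 0.15, ("rain", "cloudy"): 0.25, ("rain", "rainy"): 0.6,
    ("norain", "sunny"): 0.7, ("norain", "cloudy"): 0.2, ("norain", "rainy"): 0.1,
}
utility = {                                     # u(Weather, Umbrella)
    ("rain", "take"): 70, ("rain", "leave"): 0,
    ("norain", "take"): 20, ("norain", "leave"): 100,
}

# Multiply the three factors and sum out Weather, giving a factor on
# Forecast and Umbrella.
factor_FU = {
    (f, u): sum(p_weather[w] * p_forecast[(w, f)] * utility[(w, u)]
                for w in p_weather)
    for f in ("sunny", "cloudy", "rainy") for u in ("take", "leave")
}

for (f, u), v in factor_FU.items():
    print(f"{f:6s} {u:5s} {v:6.2f}")    # e.g.  sunny  leave   49.00
```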
To maximize over Umbrella, for each value of Forecast, VE selects the value of Umbrella that maximizes the value of the factor. For example, when the forecast is sunny, the agent should leave the umbrella at home, for a value of 49.0.
VE constructs an optimal decision function for Umbrella by selecting a value of Umbrella that results in the maximum value for each value of Forecast:

Forecast | Umbrella
---|---
sunny | leave
cloudy | leave
rainy | take
It also creates a new factor on Forecast that contains the maximal value for each value of Forecast:
Forecast | Value
---|---
sunny | 49.0
cloudy | 14.0
rainy | 14.0
It now sums out Forecast from this factor, which gives the value 77.0. This is the expected utility of the optimal policy.
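Continuing the sketch, the maximization over Umbrella and the final summation can be done directly from the tabulated values (the names here are again illustrative, not the book's code):

```python
# The factor on Forecast and Umbrella computed above.
factor_FU = {
    ("sunny", "take"): 12.95, ("sunny", "leave"): 49.0,
    ("cloudy", "take"): 8.05, ("cloudy", "leave"): 14.0,
    ("rainy", "take"): 14.0,  ("rainy", "leave"): 7.0,
}

# Maximize over Umbrella: keep the best value for each forecast and record
# which choice achieved it (the decision function for Umbrella).
decision_fn, factor_F = {}, {}
for forecast in ("sunny", "cloudy", "rainy"):
    best = max(("take", "leave"), key=lambda u: factor_FU[(forecast, u)])
    decision_fn[forecast] = best
    factor_F[forecast] = factor_FU[(forecast, best)]

print(decision_fn)              # {'sunny': 'leave', 'cloudy': 'leave', 'rainy': 'take'}
print(sum(factor_F.values()))   # 77.0, the expected utility of the optimal policy
```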
Consider Example 9.15. Before summing out any variables, the factors represent P(Tampering), P(Fire), P(Alarm | Tampering, Fire), P(Smoke | Fire), P(Leaving | Alarm), P(Report | Leaving), P(SeeSmoke | CheckSmoke, Smoke), and the utility u(CheckSmoke, Fire, Call).
The expected utility is the product of the probability and the utility, as long as the appropriate actions are chosen.
VE sums out the random variables that are not parents of a decision node. Thus, it sums out Tampering, Fire, Alarm, Smoke, and Leaving. After these have been eliminated, there is a single factor on Report, CheckSmoke, SeeSmoke, and Call, part of which (to two decimal places) is:
Report | CheckSmoke | SeeSmoke | Call | Value
---|---|---|---|---
… | … | … | … | …
From this factor, an optimal decision function can be created for Call by selecting a value for Call that maximizes the value for each assignment to Report, CheckSmoke, and SeeSmoke.
Consider the case when Report = true, CheckSmoke = true, and SeeSmoke = true. The value for Call = true is greater than the value for Call = false, so for this case the optimal action is Call = true. This maximization is repeated for the other values of Report, CheckSmoke, and SeeSmoke.
An optimal decision function for Call is

Report | CheckSmoke | SeeSmoke | Call
---|---|---|---
true | true | true | true
… | … | … | …
The value for Call when Report = true, CheckSmoke = false, and SeeSmoke = true is arbitrary. It does not matter what the agent plans to do in this situation, because the situation never arises: the agent cannot see smoke if it does not check for smoke. The algorithm does not need to treat this as a special case.
The factor resulting from maximizing Call contains the maximum values for each combination of Report, CheckSmoke, and SeeSmoke:
Report | CheckSmoke | SeeSmoke | Value
---|---|---|---
… | … | … | …
Summing out SeeSmoke gives a factor on Report and CheckSmoke.
Maximizing over CheckSmoke for each value of Report gives the decision function

Report | CheckSmoke
---|---
true | true
false | false

and a factor on Report containing the maximal value for each value of Report.
Summing out Report gives the expected utility of the optimal policy.
Thus, the policy returned can be seen as the rules

check_smoke ← report.
call ← see_smoke.
call ← report ∧ ¬check_smoke ∧ ¬see_smoke.
The last of these rules is never used because the agent following the optimal policy does check for smoke if there is a report. It remains in the policy because VE has not determined an optimal policy for CheckSmoke when it is optimizing Call.
Note also that, in this case, even though checking for smoke has an immediate negative reward, checking for smoke is worthwhile because the information obtained is valuable.
The following example shows how the factor containing a decision variable can contain a subset of its parents when the VE algorithm optimizes the decision.
Consider Example 9.13, but with an extra arc from Weather to Umbrella. That is, the agent gets to observe both the weather and the forecast. In this case, there are no random variables to sum out, and the factor that contains the decision node and a subset of its parents is the original utility factor, on Weather and Umbrella. VE can then maximize over Umbrella, giving the decision function (take the umbrella if it is raining, leave it at home otherwise) and the factor:

Weather | Value
---|---
norain | 100
rain | 70
Note that the forecast is irrelevant to the decision. Knowing the forecast does not give the agent any useful information once the weather is observed. Summing out Forecast gives a factor on Weather where all of the values are 1.
Summing out Weather, where P(rain) = 0.3 and P(norain) = 0.7, gives the expected utility 0.7 × 100 + 0.3 × 70 = 91.0.
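A final sketch, using the same assumed numbers as before, maximizes the utility factor over Umbrella and then takes the expectation over Weather:

```python
p_weather = {"norain": 0.7, "rain": 0.3}     # P(Weather), as assumed above
utility = {                                  # u(Weather, Umbrella), as assumed above
    ("norain", "take"): 20, ("norain", "leave"): 100,
    ("rain", "take"): 70,   ("rain", "leave"): 0,
}

# Maximize over Umbrella for each value of Weather: the decision function
# and the maximized factor on Weather.
decision_fn, factor_W = {}, {}
for w in ("norain", "rain"):
    best = max(("take", "leave"), key=lambda u: utility[(w, u)])
    decision_fn[w] = best
    factor_W[w] = utility[(w, best)]

print(decision_fn)   # {'norain': 'leave', 'rain': 'take'}
# Summing out Forecast contributes only 1s; summing out Weather gives:
print(round(sum(p_weather[w] * factor_W[w] for w in p_weather), 2))   # 91.0
```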