9.3.3 Variable Elimination for Decision Networks
Fortunately, we do not have to enumerate all of the policies; we can use variable elimination (VE) to find an optimal policy. The idea is to first consider the last decision, find an optimal decision for each value of its parents, and produce a factor of these maximum values. It then has a new decision network, with one less decision, that can be solved recursively.
2: Inputs
3: DN a decision network
4: Output
5: An optimal policy and its expected utility
6: Local
7: DFs: a set of decision functions, initially empty
8: Fs: a set of factors
9: Remove all variables that are not ancestors of the utility node
10: Create a factor in Fs for each conditional probability
11: Create a factor in Fs for the utility
12: while (there are decision nodes remaining)
13: Sum out each random variable that is not a parent of a decision node
14: // at this stage there is one decision node D that is in a factor F with a subset of its parents
15: Add max_{D} F to Fs.
16: Add argmax_{D} F to DFs.
17: Sum out all remaining random variables
18: Return DFs and the product of remaining factors
Figure 9.11 shows how to use VE for decision networks. Essentially it computes the expected utility of an optimal decision. It eliminates the random variables that are not parents of a decision node by summing them out according to some elimination ordering. The elimination ordering does not affect correctness and so it can be chosen for efficiency.
After eliminating all of the random variables that are not parents of a decision node, there must be one decision variable that is in a factor that only contains it and some subset of its parents because of the no-forgetting condition. This is the last action in the ordering of actions.
To eliminate that decision node, VE_DN chooses the values for the decision that result in the maximum utility. This maximization creates a new factor on the remaining variables and a decision function for the decision variable being eliminated. This decision function created by maximizing is a component of an optimal policy.
Forecast | Umbrella | Value |
sunny | takeIt | 12.95 |
sunny | leaveIt | 49.0 |
cloudy | takeIt | 8.05 |
cloudy | leaveIt | 14.0 |
rainy | takeIt | 14.0 |
rainy | leaveIt | 7.0 |
To maximize over Umbrella, for each value of Forecast, VE_DN selects the value of Umbrella that maximizes the value of the factor. For example, when the forecast is sunny, the agent should leave the umbrella at home for a value of 49.0.
VE_DN constructs an optimal decision function for Umbrella by selecting a value of Umbrella that results in the maximum value for each value of Forecast:
Forecast | Umbrella |
sunny | leaveIt |
cloudy | leaveIt |
rainy | takeIt |
It also creates a new factor that contains the maximal value for each value of Forecast:
Forecast | Value |
sunny | 49.0 |
cloudy | 14.0 |
rainy | 14.0 |
It now sums out Forecast from this factor, which gives the value 77.0. This is the expected value of the optimal policy.
Meaning Factor P(Tampering) f_{0}(Tampering) P(Fire) f_{1}(Fire) P(Alarm | Tampering, Fire) f_{2}(Tampering, Fire, Alarm) P(Smoke | Fire) f_{3}(Fire, Smoke) P(Leaving | Alarm) f_{4}(Alarm,Leaving) P(Report | Leaving) f_{5}(Leaving, Report) P(SeeSmoke | CheckSmoke, Smoke) f_{6}(Smoke, SeeSmoke, CheckSmoke) utility(Fire, CheckSmoke, Call) f_{7}(Fire, CheckSmoke, Call)
The expected utility is the product of the probability and the utility, as long as the appropriate actions are chosen.
VE_DN sums out the random variables that are not parents of a decision node. Thus, it sums out Tampering, Fire, Alarm, Smoke, and Leaving. After these have been eliminated, there is a single factor, part of which (to two decimal places) is:
Report | SeeSmoke | CheckSmoke | Call | Value |
t | t | t | t | -1.33 |
t | t | t | f | -29.30 |
t | t | f | t | 0 |
t | t | f | f | 0 |
t | f | t | t | -4.86 |
t | f | t | f | -3.68 |
... | ... | ... | ... | ... |
From this factor, an optimal decision function can be created for Call by selecting a value for Call that maximizes Value for each assignment to Report, SeeSmoke, and CheckSmoke. The maximum of -1.33 and -29.3 is -1.33, so when Report=t, SeeSmoke=t, and CheckSmoke=t, the optimal action is Call=t with value -1.33. The method is the same for the other values of Report, SeeSmoke and CheckSmoke.
An optimal decision function for Call is
Report | SeeSmoke | CheckSmoke | Call |
t | t | t | t |
t | t | f | t |
t | f | t | f |
... | ... | ... | ... |
Note that the value for Call when SeeSmoke=t and CheckSmoke=f is arbitrary. It does not matter what the agent plans to do in this situation, because the situation never arises.
The factor resulting from maximizing Call contains the maximum values for each combination of Report, SeeSmoke, and CheckSmoke:
Report | SeeSmoke | CheckSmoke | Value |
t | t | t | -1.33 |
t | t | f | 0 |
t | f | t | -3.68 |
... | ... | ... | ... |
It can then sum out SeeSmoke, which gives the factor
Report | CheckSmoke | Value |
t | t | -5.01 |
t | f | -5.65 |
f | t | -23.77 |
f | f | -17.58 |
Maximizing CheckSmoke for each value of Report gives the decision function
Report | CheckSmoke |
t | t |
f | f |
and the factor
Report | Value |
t | -5.01 |
f | -17.58 |
Summing out Report gives the expected utility of -22.60 (taking into account rounding errors).
Thus, the policy returned can be seen as
call_fire_department ←see_smoke.
call_fire_department ←report ∧¬check_smoke ∧¬see_smoke.
The last of these rules is never used because the agent following the optimal policy does check for smoke if there is a report. However, when executing VE_DN, the agent does not know an optimal policy for CheckSmoke when it is optimizing Call. Only by considering what to do with the information on smoke does the agent determine whether to check for smoke.
Note also that, in this case, even though checking for smoke has a cost associated with it, checking for smoke is worthwhile because the information obtained is valuable.
The following example shows how the factor containing a decision variable can contain a subset of its parents when the VE algorithm optimizes the decision.
Weather | Umbrella |
norain | leaveIt |
rain | takeIt |
Weather | Value |
norain | 100 |
rain | 70 |
Note that the forecast is irrelevant to the decision. Knowing the forecast does not give the agent any useful information.
Summing out Forecast gives a factor that contains ones. Summing out Weather, where P(Weather=norain)=0.7, gives the expected utility 0.7×100+0.3×70=91.