9.3.3 Variable Elimination for Decision Networks

Fortunately, we do not have to enumerate all of the policies; we can use variable elimination (VE) to find an optimal policy. The idea is to first consider the last decision, find an optimal decision for each value of its parents, and produce a factor of these maximum values. It then has a new decision network, with one less decision, that can be solved recursively.

1: Procedure VE_DN(DN):
2:           Inputs
3:                     DN a decision network
4:           Output
5:                     An optimal policy and its expected utility
6:           Local
7:                     DFs: a set of decision functions, initially empty
8:                     Fs: a set of factors
9:           Remove all variables that are not ancestors of the utility node
10:           Create a factor in Fs for each conditional probability
11:           Create a factor in Fs for the utility
12:           while (there are decision nodes remaining)
13:                     Sum out each random variable that is not a parent of a decision node
14:                     // at this stage there is one decision node D that is in a factor F with a subset of its parents
15:                     Add maxD F to Fs.
16:                     Add argmaxD F to DFs.
17:           Sum out all remaining random variables
18:           Return DFs and the product of remaining factors
Figure 9.11: Variable elimination for decision networks

Figure 9.11 shows how to use VE for decision networks. Essentially it computes the expected utility of an optimal decision. It eliminates the random variables that are not parents of a decision node by summing them out according to some elimination ordering. The elimination ordering does not affect correctness and so it can be chosen for efficiency.

After eliminating all of the random variables that are not parents of a decision node, there must be one decision variable that is in a factor that only contains it and some subset of its parents because of the no-forgetting condition. This is the last action in the ordering of actions.

To eliminate that decision node, VE_DN chooses the values for the decision that result in the maximum utility. This maximization creates a new factor on the remaining variables and a decision function for the decision variable being eliminated. This decision function created by maximizing is a component of an optimal policy.

Example 9.17: In Example 9.11, there are three initial factors representing P(Weather), P(Forecast|Weather), and Utility(Weather,Umbrella). First, it eliminates Weather: by multiplying all three factors and summing out Weather, giving a factor on Forecast and Umbrella,
Forecast Umbrella Value
sunny takeIt 12.95
sunny leaveIt 49.0
cloudy takeIt 8.05
cloudy leaveIt 14.0
rainy takeIt 14.0
rainy leaveIt 7.0

To maximize over Umbrella, for each value of Forecast, VE_DN selects the value of Umbrella that maximizes the value of the factor. For example, when the forecast is sunny, the agent should leave the umbrella at home for a value of 49.0.

VE_DN constructs an optimal decision function for Umbrella by selecting a value of Umbrella that results in the maximum value for each value of Forecast:

Forecast Umbrella
sunny leaveIt
cloudy leaveIt
rainy takeIt

It also creates a new factor that contains the maximal value for each value of Forecast:

Forecast Value
sunny 49.0
cloudy 14.0
rainy 14.0

It now sums out Forecast from this factor, which gives the value 77.0. This is the expected value of the optimal policy.

Example 9.18: Consider Example 9.13. Before summing out any variables it has the following factors:
Meaning Factor
P(Tampering) f0(Tampering)
P(Fire) f1(Fire)
P(Alarm | Tampering, Fire) f2(Tampering, Fire, Alarm)
P(Smoke | Fire) f3(Fire, Smoke)
P(Leaving | Alarm) f4(Alarm,Leaving)
P(Report | Leaving) f5(Leaving, Report)
P(SeeSmoke | CheckSmoke, Smoke) f6(Smoke, SeeSmoke, CheckSmoke)
utility(Fire, CheckSmoke, Call) f7(Fire, CheckSmoke, Call)

The expected utility is the product of the probability and the utility, as long as the appropriate actions are chosen.

VE_DN sums out the random variables that are not parents of a decision node. Thus, it sums out Tampering, Fire, Alarm, Smoke, and Leaving. After these have been eliminated, there is a single factor, part of which (to two decimal places) is:

Report SeeSmoke CheckSmoke Call Value
t t t t -1.33
t t t f -29.30
t t f t 0
t t f f 0
t f t t -4.86
t f t f -3.68
... ... ... ... ...

From this factor, an optimal decision function can be created for Call by selecting a value for Call that maximizes Value for each assignment to Report, SeeSmoke, and CheckSmoke. The maximum of -1.33 and -29.3 is -1.33, so when Report=t, SeeSmoke=t, and CheckSmoke=t, the optimal action is Call=t with value -1.33. The method is the same for the other values of Report, SeeSmoke and CheckSmoke.

An optimal decision function for Call is

Report SeeSmoke CheckSmoke Call
t t t t
t t f t
t f t f
... ... ... ...

Note that the value for Call when SeeSmoke=t and CheckSmoke=f is arbitrary. It does not matter what the agent plans to do in this situation, because the situation never arises.

The factor resulting from maximizing Call contains the maximum values for each combination of Report, SeeSmoke, and CheckSmoke:

Report SeeSmoke CheckSmoke Value
t t t -1.33
t t f 0
t f t -3.68
... ... ... ...

It can then sum out SeeSmoke, which gives the factor

Report CheckSmoke Value
t t -5.01
t f -5.65
f t -23.77
f f -17.58

Maximizing CheckSmoke for each value of Report gives the decision function

Report CheckSmoke
t t
f f

and the factor

Report Value
t -5.01
f -17.58

Summing out Report gives the expected utility of -22.60 (taking into account rounding errors).

Thus, the policy returned can be seen as

checkSmoke ←report.
call_fire_department ←see_smoke.
call_fire_department ←report ∧¬check_smoke ∧¬see_smoke.

The last of these rules is never used because the agent following the optimal policy does check for smoke if there is a report. However, when executing VE_DN, the agent does not know an optimal policy for CheckSmoke when it is optimizing Call. Only by considering what to do with the information on smoke does the agent determine whether to check for smoke.

Note also that, in this case, even though checking for smoke has a cost associated with it, checking for smoke is worthwhile because the information obtained is valuable.

The following example shows how the factor containing a decision variable can contain a subset of its parents when the VE algorithm optimizes the decision.

Example 9.19: Consider Example 9.11, but with an extra arc from Weather to Umbrella. That is, the agent gets to observe both the weather and the forecast. In this case, there are no random variables to sum out, and the factor that contains the decision node and a subset of its parents is the original utility factor. It can then maximize Umbrella, giving the decision function and the factor:
Weather Umbrella
norain leaveIt
rain takeIt
Weather Value
norain 100
rain 70

Note that the forecast is irrelevant to the decision. Knowing the forecast does not give the agent any useful information.

Summing out Forecast gives a factor that contains ones. Summing out Weather, where P(Weather=norain)=0.7, gives the expected utility 0.7×100+0.3×70=91.