Fortunately, an agent does not have to enumerate all of the policies; variable elimination (VE) can be adapted to find an optimal policy. The idea is first to consider the last decision, find an optimal decision for each value of its parents, and produce a factor of these maximum values. This results in a new decision network, with one less decision, that can be solved recursively.
Figure 9.13 shows how to use variable elimination for decision networks. Essentially, it computes the expected utility of an optimal decision. It eliminates the random variables that are not parents of a decision node by summing them out according to some elimination ordering. The ordering of the random variables being eliminated does not affect correctness and so it can be chosen for efficiency.
After eliminating all of the random variables that are not parents of a decision node, in a no-forgetting decision network there must be one decision variable D that is in a factor f where all of the variables, other than D, in f are parents of D. This decision D is the last decision in the ordering of decisions.
To eliminate that decision node, VE chooses the values for the decision that result in the maximum utility. This maximization creates a new factor on the remaining variables and a decision function for the decision variable being eliminated. The decision function created by maximizing is one of the decision functions in an optimal policy.
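As a rough illustration of these two operations (this is not the book's pseudocode or its AIPython implementation), the sketch below sums a random variable out of a factor and maximizes a decision variable out of a factor. The factor encoding, function names, and argument conventions are assumptions made here for the example.

```python
# A factor over variables `vars_` is encoded here as (vars_, table), where
# `table` maps an assignment tuple (one value per variable, in the order of
# vars_) to a number.

def sum_out(vars_, table, var):
    """Eliminate a random variable from a factor by summing it out."""
    i = vars_.index(var)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table = {}
    for asg, val in table.items():
        rest = asg[:i] + asg[i + 1:]
        new_table[rest] = new_table.get(rest, 0.0) + val
    return new_vars, new_table

def max_out(vars_, table, dec):
    """Eliminate a decision variable by maximizing.  Returns the reduced
    factor and a decision function giving an optimal value of `dec` for
    each assignment of the remaining variables."""
    i = vars_.index(dec)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table, decision_fn = {}, {}
    for asg, val in table.items():
        rest = asg[:i] + asg[i + 1:]
        if rest not in new_table or val > new_table[rest]:
            new_table[rest] = val
            decision_fn[rest] = asg[i]
    return (new_vars, new_table), decision_fn

# VE for decision networks: multiply the relevant factors, sum_out every
# random variable that is not a parent of a decision, then repeatedly
# max_out the last decision, collecting the decision functions as the policy.
```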
In Example 9.13, there are three initial factors representing P(Weather), P(Forecast | Weather), and u(Weather, Umbrella). First, VE eliminates Weather by multiplying all three factors and summing out Weather, giving a factor on Forecast and Umbrella:
Forecast | Umbrella | Value
---|---|---
sunny | take | 12.95
sunny | leave | 49.0
cloudy | take | 8.05
cloudy | leave | 14.0
rainy | take | 14.0
rainy | leave | 7.0
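To make the computation concrete, here is a small Python sketch that reproduces this factor, assuming the prior P(rain) = 0.3, the forecast probabilities, and the utility table encoded in the dictionaries below (numbers that reproduce the values above); the variable and value names are one possible encoding, not taken from the book's code.

```python
# Assumed numbers for Example 9.13 (they reproduce the table above).
p_weather = {"rain": 0.3, "norain": 0.7}        # P(Weather)
p_forecast = {                                  # P(Forecast | Weather)
    ("rain", "sunny"): 0.15, ("rain", "cloudy"): 0.25, ("rain", "rainy"): 0.6,
    ("norain", "sunny"): 0.7, ("norain", "cloudy"): 0.2, ("norain", "rainy"): 0.1,
}
utility = {                                     # u(Weather, Umbrella)
    ("rain", "take"): 70, ("rain", "leave"): 0,
    ("norain", "take"): 20, ("norain", "leave"): 100,
}

# Multiply the three factors and sum out Weather, giving a factor on
# Forecast and Umbrella.
factor_FU = {
    (f, u): sum(p_weather[w] * p_forecast[(w, f)] * utility[(w, u)]
                for w in p_weather)
    for f in ("sunny", "cloudy", "rainy") for u in ("take", "leave")
}

for (f, u), v in factor_FU.items():
    print(f"{f:6s} {u:5s} {v:6.2f}")    # e.g.  sunny  leave   49.00
```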
To maximize over Umbrella, for each value of Forecast, VE selects the value of Umbrella that maximizes the value of the factor. For example, when the forecast is sunny, the agent should leave the umbrella at home, for a value of 49.0.
VE constructs an optimal decision function for Umbrella by selecting a value of Umbrella that results in the maximum value for each value of Forecast:

Forecast | Umbrella
---|---
sunny | leave
cloudy | leave
rainy | take
It also creates a new factor on Forecast that contains the maximal value for each value of Forecast:
Forecast | Value
---|---
sunny | 49.0
cloudy | 14.0
rainy | 14.0
It now sums out Forecast from this factor, which gives the value 77.0. This is the expected utility of the optimal policy.
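Continuing the sketch, the maximization over Umbrella and the final summation can be done directly from the tabulated values (the names here are again illustrative, not the book's code):

```python
# The factor on Forecast and Umbrella computed above.
factor_FU = {
    ("sunny", "take"): 12.95, ("sunny", "leave"): 49.0,
    ("cloudy", "take"): 8.05, ("cloudy", "leave"): 14.0,
    ("rainy", "take"): 14.0,  ("rainy", "leave"): 7.0,
}

# Maximize over Umbrella: keep the best value for each forecast and record
# which choice achieved it (the decision function for Umbrella).
decision_fn, factor_F = {}, {}
for forecast in ("sunny", "cloudy", "rainy"):
    best = max(("take", "leave"), key=lambda u: factor_FU[(forecast, u)])
    decision_fn[forecast] = best
    factor_F[forecast] = factor_FU[(forecast, best)]

print(decision_fn)              # {'sunny': 'leave', 'cloudy': 'leave', 'rainy': 'take'}
print(sum(factor_F.values()))   # 77.0, the expected utility of the optimal policy
```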
Consider Example 9.15. Before summing out any variables, the factors represent P(Tampering), P(Fire), P(Alarm | Tampering, Fire), P(Smoke | Fire), P(Leaving | Alarm), P(Report | Leaving), P(SeeSmoke | CheckSmoke, Smoke), and the utility u(CheckSmoke, Fire, Call).
The expected utility is the product of the probability and the utility, as long as the appropriate actions are chosen.
VE sums out the random variables that are not parents of a decision node. Thus, it sums out Tampering, Fire, Alarm, Smoke, and Leaving. After these have been eliminated, there is a single factor on Report, CheckSmoke, SeeSmoke, and Call, part of which (to two decimal places) is:
Report | CheckSmoke | SeeSmoke | Call | Value
---|---|---|---|---
… | … | … | … | …
From this factor, an optimal decision function can be created for Call by selecting a value for Call that maximizes the value for each assignment to Report, CheckSmoke, and SeeSmoke.
Consider the case when Report = true, CheckSmoke = true, and SeeSmoke = true. The value for Call = true is greater than the value for Call = false, so for this case the optimal action is Call = true. This maximization is repeated for the other values of Report, CheckSmoke, and SeeSmoke.
An optimal decision function for Call is

Report | CheckSmoke | SeeSmoke | Call
---|---|---|---
true | true | true | true
… | … | … | …
The value for Call when Report = true, CheckSmoke = false, and SeeSmoke = true is arbitrary. It does not matter what the agent plans to do in this situation, because the situation never arises: the agent cannot see smoke if it does not check for smoke. The algorithm does not need to treat this as a special case.
The factor resulting from maximizing Call contains the maximum values for each combination of Report, CheckSmoke, and SeeSmoke:
Report | CheckSmoke | SeeSmoke | Value
---|---|---|---
… | … | … | …
Summing out SeeSmoke gives a factor on Report and CheckSmoke.
Maximizing over CheckSmoke for each value of Report gives the decision function

Report | CheckSmoke
---|---
true | true
false | false

and a factor on Report containing the maximal value for each value of Report.
Summing out Report gives the expected utility of the optimal policy.
Thus, the policy returned can be seen as the rules

check_smoke ← report.
call ← see_smoke.
call ← report ∧ ¬check_smoke ∧ ¬see_smoke.
The last of these rules is never used because the agent following the optimal policy does check for smoke if there is a report. It remains in the policy because VE has not determined an optimal policy for CheckSmoke when it is optimizing Call.
Note also that, in this case, even though checking for smoke has an immediate negative reward, checking for smoke is worthwhile because the information obtained is valuable.
The following example shows how the factor containing a decision variable can contain a subset of its parents when the VE algorithm optimizes the decision.
Consider Example 9.13, but with an extra arc from Weather to Umbrella. That is, the agent gets to observe both the weather and the forecast. In this case, there are no random variables to sum out, and the factor that contains the decision node and a subset of its parents is the original utility factor, on Weather and Umbrella. VE can then maximize over Umbrella, giving the decision function (take the umbrella if it is raining, leave it at home otherwise) and the factor:

Weather | Value
---|---
norain | 100
rain | 70
Note that the forecast is irrelevant to the decision. Knowing the forecast does not give the agent any useful information once the weather is observed. Summing out Forecast gives a factor on Weather where all of the values are 1.
Summing out Weather, where P(rain) = 0.3 and P(norain) = 0.7, gives the expected utility 0.7 × 100 + 0.3 × 70 = 91.0.
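A final sketch, using the same assumed numbers as before, maximizes the utility factor over Umbrella and then takes the expectation over Weather:

```python
p_weather = {"norain": 0.7, "rain": 0.3}     # P(Weather), as assumed above
utility = {                                  # u(Weather, Umbrella), as assumed above
    ("norain", "take"): 20, ("norain", "leave"): 100,
    ("rain", "take"): 70,   ("rain", "leave"): 0,
}

# Maximize over Umbrella for each value of Weather: the decision function
# and the maximized factor on Weather.
decision_fn, factor_W = {}, {}
for w in ("norain", "rain"):
    best = max(("take", "leave"), key=lambda u: utility[(w, u)])
    decision_fn[w] = best
    factor_W[w] = utility[(w, best)]

print(decision_fn)   # {'norain': 'leave', 'rain': 'take'}
# Summing out Forecast contributes only 1s; summing out Weather gives:
print(round(sum(p_weather[w] * factor_W[w] for w in p_weather), 2))   # 91.0
```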