## 9.4 The Value of Information and Control

**Example 9.20:** In Example 9.18, the action *CheckSmoke* provides information about fire. Checking for smoke costs *20* units and does not provide any direct reward; however, in an optimal policy, it is worthwhile to check for smoke when there is a report because the agent can condition its further actions on the information obtained. Thus, the information about smoke is valuable to the agent. Even though smoke provides imperfect information about whether there is fire, that information is still very useful for making decisions.

One of the important lessons from this example is that an
information-seeking action, such as *check_for_smoke*, can be treated
in the same way as any other action, such as *call_fire_department*. An
optimal policy often includes actions whose only purpose is to
find information, as long as subsequent actions can condition on some
effect of the action. Most actions do not just provide information;
they also have a more direct effect on the world.
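To make the point concrete, the following Python sketch treats "check a sensor" as an ordinary decision and finds the best policy by brute-force enumeration: first choose whether to check (at a cost), then choose whether to call, conditioning on the reading only if one was taken. This is a hypothetical toy, not Example 9.18: the prior, sensor accuracy, utilities, and the cost `CHECK_COST` are all invented for illustration.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# All numbers below are hypothetical, chosen only for illustration.
P_FIRE = 0.3                                   # prior probability of the hidden fire
P_SMOKE_GIVEN_FIRE = {True: 0.9, False: 0.1}   # noisy sensor: P(smoke | fire)
CHECK_COST = 1.0                               # cost of the information-seeking action


def utility(fire: bool, call: bool) -> float:
    """Calling always costs 20; failing to call during a fire costs 100."""
    if call:
        return -20.0
    return -100.0 if fire else 0.0


def expected_value(check: bool, decide: Dict[Optional[bool], bool]) -> float:
    """Expected utility of a policy. `decide` maps the observation (the smoke
    reading, or None when the agent did not check) to call / don't call."""
    total = 0.0
    for fire in (True, False):
        p_fire = P_FIRE if fire else 1 - P_FIRE
        if check:
            for smoke in (True, False):
                p_smoke = P_SMOKE_GIVEN_FIRE[fire] if smoke else 1 - P_SMOKE_GIVEN_FIRE[fire]
                total += p_fire * p_smoke * utility(fire, decide[smoke])
        else:
            total += p_fire * utility(fire, decide[None])
    return total - (CHECK_COST if check else 0.0)


def best_policy() -> Tuple[float, bool, Dict[Optional[bool], bool]]:
    """Enumerate every policy: whether to check, then a decision function for calling."""
    best = None
    for check in (False, True):
        observations = (True, False) if check else (None,)
        for actions in product((True, False), repeat=len(observations)):
            decide = dict(zip(observations, actions))
            value = expected_value(check, decide)
            if best is None or value > best[0]:
                best = (value, check, decide)
    return best


value, check, decide = best_policy()
print(value, check, decide)
```

With these invented numbers, the best policy checks and then calls only when smoke is seen (value about -10.8), beating the best policy that does not check (value -20). The check action earns nothing directly; its worth comes entirely from the later decision conditioning on its result.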

Information is valuable to agents because it helps them make better decisions.

The **value of information** *i* for
decision *D* is the expected value of an optimal policy that can
condition decision *D*, and subsequent decisions, on knowledge of *i*
minus the expected value of an optimal policy that cannot observe
*i*. Thus, in a decision network, it is the value of an optimal policy with *i* as a parent
of *D* and subsequent decisions minus the value of an optimal
policy without *i* as a parent of *D*.
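For a single decision whose only relevant uncertainty is *i*, the definition reduces to comparing two maximizations: with *i* as a parent, the policy picks the best action separately for each value of *i*; without it, one action must serve for every value. The Python sketch below assumes a hypothetical umbrella-style problem; the prior and utility table are invented for illustration and are not claimed to be those of Example 9.11.

```python
from typing import Dict, List, Tuple

# Hypothetical single-decision problem (numbers invented for illustration):
# a random variable i with a prior, and a utility for each (value of i, action).
PRIOR: Dict[str, float] = {"rain": 0.3, "dry": 0.7}
ACTIONS: List[str] = ["take", "leave"]
UTILITY: Dict[Tuple[str, str], float] = {
    ("rain", "take"): 70.0, ("rain", "leave"): 0.0,
    ("dry", "take"): 20.0, ("dry", "leave"): 100.0,
}


def value_without_i(prior, actions, utility) -> float:
    """Optimal policy cannot observe i: one action is chosen for all cases."""
    return max(sum(p * utility[(s, a)] for s, p in prior.items()) for a in actions)


def value_with_i(prior, actions, utility) -> float:
    """Optimal policy with i as a parent of the decision: best action per value of i."""
    return sum(p * max(utility[(s, a)] for a in actions) for s, p in prior.items())


def value_of_information(prior, actions, utility) -> float:
    return value_with_i(prior, actions, utility) - value_without_i(prior, actions, utility)


print(value_of_information(PRIOR, ACTIONS, UTILITY))
```

Here the informed policy is worth 91 and the uninformed one 70 (always leave the umbrella), so the value of information is about 21. Note that this baseline observes nothing at all, whereas Example 9.21 compares against a network that already has a forecast.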

**Example 9.21:** In Example 9.11, consider how much it could be worth to get a better forecast. The value of getting perfect information about the weather for the decision about whether to take an umbrella is the difference between the value of the network with an arc from *Weather* to *Umbrella*, which, as calculated in Example 9.19, is 91, and the value of the original network, which, as computed in Example 9.11, is 77. Thus, perfect information would be worth *91 - 77 = 14*. This is an upper bound on how much another sensor of the weather could be worth.

The value of information is a bound on the amount the agent would be
willing to pay (in terms of loss of utility)
for information *i* at stage *d*. It is an upper bound on the
amount that imperfect information about the value of *i* at decision
*d* would be worth. Imperfect information is, for example,
information available from a noisy sensor of *i*. It is not worth
paying more for a sensor of *i* than the value of information *i*.
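The upper-bound claim can be checked numerically. The Python sketch below, again using an invented prior and utility table, models a hypothetical two-valued sensor that reports the true value of *i* with probability `accuracy`, and computes how much the sensor adds over observing nothing.

```python
from typing import Dict, Tuple

# Hypothetical numbers for illustration only.
PRIOR: Dict[str, float] = {"rain": 0.3, "dry": 0.7}
ACTIONS = ("take", "leave")
UTILITY: Dict[Tuple[str, str], float] = {
    ("rain", "take"): 70.0, ("rain", "leave"): 0.0,
    ("dry", "take"): 20.0, ("dry", "leave"): 100.0,
}


def sensor_model(accuracy: float) -> Dict[Tuple[str, str], float]:
    """P(reading | state) for a two-valued sensor that reports the true state
    with probability `accuracy` and the other state otherwise."""
    return {(o, s): accuracy if o == s else 1.0 - accuracy for o in PRIOR for s in PRIOR}


def value_blind() -> float:
    """Best single action when nothing is observed."""
    return max(sum(p * UTILITY[(s, a)] for s, p in PRIOR.items()) for a in ACTIONS)


def value_with_sensor(sensor: Dict[Tuple[str, str], float]) -> float:
    """Best decision function from readings to actions. For each reading o, the
    inner sum equals P(o) times the expected utility of action a given o."""
    return sum(
        max(sum(PRIOR[s] * sensor[(o, s)] * UTILITY[(s, a)] for s in PRIOR) for a in ACTIONS)
        for o in PRIOR
    )


def sensor_value(accuracy: float) -> float:
    return value_with_sensor(sensor_model(accuracy)) - value_blind()


for acc in (0.5, 0.7, 0.9, 1.0):
    print(acc, round(sensor_value(acc), 3))
```

For these numbers the sensor is worth nothing at 50% accuracy (the reading is independent of *i*), still nothing at 70% (leaving the umbrella remains optimal for either reading), about 13.3 at 90%, and 21 when it is perfect, which is the value of information *i*. No accuracy level makes the noisy sensor worth more than perfect information.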

The value of information has some interesting properties:

- The value of information is never negative. The worst that can happen is that the agent can ignore the information.
- If an optimal decision is to do the same thing no matter which value of *i* is observed, the value of information *i* is zero. If the value of information *i* is zero, there is an optimal policy that does not depend on the value of *i* (i.e., the same action is chosen no matter which value of *i* is observed).
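Both properties can be spot-checked for the single-decision case. Non-negativity holds because the informed value is an expectation of per-case maxima, which is at least the maximum of the expectations. The Python sketch below tests this on randomly generated (hypothetical) priors and utility tables, then shows a table in which one action is best for every value of *i*; it illustrates the properties but does not check the converse direction of the second one.

```python
import random
from typing import List


def voi(prior: List[float], utility: List[List[float]]) -> float:
    """Value of perfect information about i for one decision.
    prior[s] = P(i = s); utility[s][a] = utility of action a when i = s."""
    n_actions = len(utility[0])
    blind = max(sum(p * row[a] for p, row in zip(prior, utility)) for a in range(n_actions))
    informed = sum(p * max(row) for p, row in zip(prior, utility))
    return informed - blind


rng = random.Random(0)
for _ in range(1000):
    k = rng.randint(2, 4)                      # number of values of i
    weights = [rng.random() for _ in range(k)]
    prior = [w / sum(weights) for w in weights]
    utility = [[rng.uniform(-100, 100) for _ in range(3)] for _ in range(k)]
    assert voi(prior, utility) >= -1e-9        # never negative (up to rounding)

# Action 0 is best for every value of i, so observing i is worthless.
dominated = [[10.0, 5.0], [3.0, 1.0], [7.0, -2.0]]
print(voi([0.2, 0.5, 0.3], dominated))         # prints 0.0
```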

Within a decision network, the value of information *i* at decision
*d* can be evaluated by considering both

- the decision network with arcs from *i* to *d* and from *i* to subsequent decisions and
- the decision network without such arcs.

The difference in the values of
the optimal policies of these two decision networks is the value of
information *i* at *d*. Something more sophisticated must be
done when adding the arc from *i* to *d* causes a cycle.
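For a tiny network, the two-network comparison can be carried out by brute force: enumerate every deterministic decision function from the decision's parent values to actions, keep the best, and do this once without and once with the extra arc. The Python sketch below assumes a hypothetical umbrella-style network with invented numbers, in which the decision always sees a noisy *Forecast*; adding *Weather* as a parent measures the value of perfect weather information over the forecast. Exhaustive enumeration grows exponentially with the number of parent assignments, so this is only a sketch of the definition, not how a practical solver evaluates decision networks.

```python
from itertools import product
from typing import Dict, Iterator, Tuple

# Hypothetical network (numbers invented): Weather -> Forecast,
# and the utility depends on Weather and the Umbrella decision.
P_WEATHER: Dict[str, float] = {"rain": 0.3, "dry": 0.7}
P_FORECAST: Dict[str, Dict[str, float]] = {      # P(Forecast | Weather)
    "rain": {"rainy": 0.7, "sunny": 0.3},
    "dry": {"rainy": 0.2, "sunny": 0.8},
}
ACTIONS = ("take", "leave")
UTILITY: Dict[Tuple[str, str], float] = {
    ("rain", "take"): 70.0, ("rain", "leave"): 0.0,
    ("dry", "take"): 20.0, ("dry", "leave"): 100.0,
}
DOMAINS = {"Weather": ("rain", "dry"), "Forecast": ("rainy", "sunny")}


def joint() -> Iterator[Tuple[float, Dict[str, str]]]:
    """Yield (probability, assignment) for every assignment of the random variables."""
    for w, pw in P_WEATHER.items():
        for f, pf in P_FORECAST[w].items():
            yield pw * pf, {"Weather": w, "Forecast": f}


def best_policy_value(parents: Tuple[str, ...]) -> float:
    """Maximum expected utility over all decision functions for Umbrella
    whose inputs are the given parent variables."""
    contexts = list(product(*(DOMAINS[p] for p in parents)))  # parent assignments
    best = float("-inf")
    for choice in product(ACTIONS, repeat=len(contexts)):
        policy = dict(zip(contexts, choice))                   # one decision function
        value = sum(
            p * UTILITY[(a["Weather"], policy[tuple(a[q] for q in parents)])]
            for p, a in joint()
        )
        best = max(best, value)
    return best


without_arc = best_policy_value(("Forecast",))
with_arc = best_policy_value(("Forecast", "Weather"))
print(without_arc, with_arc, with_arc - without_arc)
```

With these made-up numbers, the network without the arc is worth about 73.5, the one with the arc about 91, so the value of information *Weather* at *Umbrella* is about 17.5.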

**Example 9.22:** In the alarm problem [Example 9.18], the agent may be interested in knowing whether it is worthwhile to install a relay for the alarm so that the alarm can be heard directly instead of relying on the noisy sensor of people leaving. To determine how much a relay could be worth, consider how much perfect information about the alarm would be worth. If the information is worth less than the cost of the relay, it is not worthwhile to install the relay.

The value of information about the alarm for checking for smoke and for calling can be obtained by solving
the decision network of Figure 9.9 together with the
same network, but with an
arc from *Alarm* to *Check_for_smoke* and an arc from *Alarm* to
*Call_fire_department*. The original network has a value of *-22.6*.
This new decision network has an
optimal policy whose value is *-6.3*. The difference in the values of the
optimal policies for the two decision networks, namely *-6.3 - (-22.6) = 16.3*,
is the value of information about *Alarm* for the decisions *Check_for_smoke*
and *Call_fire_department*. If the relay costs 20
units, the installation will not be worthwhile.

The value of the network with an arc from *Alarm* to
*Call_fire_department* is *-6.3*, the same as if there were also an
arc from *Alarm* to *Check_for_smoke*. In the optimal policy, the
information about *Alarm* is ignored in the optimal decision function
for *Check_for_smoke*; the agent never
checks for smoke in the optimal policy when *Alarm* is a parent of *Call_fire_department*.
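The arithmetic of Example 9.22 can be reproduced directly from the three network values reported in the text; the Python lines below only restate those figures and the 20-unit relay cost, and compute nothing beyond their differences.

```python
# Optimal-policy values reported in Example 9.22 (utility units).
original = -22.6            # network of Figure 9.9 as given
alarm_to_both = -6.3        # arcs Alarm -> Check_for_smoke and Alarm -> Call_fire_department
alarm_to_call_only = -6.3   # arc Alarm -> Call_fire_department only
relay_cost = 20.0           # relay cost considered in the example

voi_alarm = alarm_to_both - original
print(round(voi_alarm, 1))                           # 16.3: value of information about Alarm
print(voi_alarm < relay_cost)                        # True: the relay is not worthwhile
print(round(alarm_to_both - alarm_to_call_only, 1))  # 0.0: the arc to Check_for_smoke adds nothing
```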

The **value of control** specifies how much it is worth to
control a variable. In its simplest form, it is the change in value of a
decision network where a random variable is replaced by a decision
variable, and arcs are added to make it a no-forgetting network. If
this is done, the change in utility is non-negative; the resulting
network always has an equal or higher expected
utility.
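In the simplest case, the comparison looks like the Python sketch below. It assumes a hypothetical variable *X* with no parents, a later decision *D* that observes *X*, and invented utilities. When *X* is random, *D* can only react to whatever value occurs; when *X* becomes a decision made before *D* (and, by no-forgetting, *D* still sees it), the agent picks the best pair of values outright, so the result can never be lower.

```python
from typing import Dict, Tuple

# Hypothetical example (numbers invented): X is a parentless random variable,
# and the decision D observes X before acting.
P_X: Dict[str, float] = {"low": 0.6, "high": 0.4}
ACTIONS_D = ("a", "b")
UTILITY: Dict[Tuple[str, str], float] = {
    ("low", "a"): 10.0, ("low", "b"): 2.0,
    ("high", "a"): -5.0, ("high", "b"): 4.0,
}


def value_random_x() -> float:
    """X is a chance node; D picks its best action for each observed value of X."""
    return sum(p * max(UTILITY[(x, d)] for d in ACTIONS_D) for x, p in P_X.items())


def value_controlled_x() -> float:
    """X is now a decision made before D; D still sees X (no-forgetting),
    so the agent effectively chooses the best (x, d) pair."""
    return max(UTILITY[(x, d)] for x in P_X for d in ACTIONS_D)


print(value_controlled_x() - value_random_x())   # value of control of X
```

With these numbers the random-*X* network is worth about 7.6 and the controlled one 10, so the value of control is about 2.4. Because a maximum is never smaller than an average of the same quantities, this difference is non-negative for any utility table of this shape.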

**Example 9.23:** In the alarm decision network of Figure 9.9, you may be interested in the value of controlling tampering. This could, for example, be used to estimate how much it is worth to add security guards to prevent tampering. To compute this, compare the value of the decision network of Figure 9.9 to the decision network where *Tampering* is a decision node and a parent of the other two decision nodes.

The value of the initial decision network is *-22.6*. First,
consider the value of information. If *Tampering*
is made a parent of *Call*, the value is *-21.30*. If *Tampering*
is made a parent of *Call* and *CheckSmoke*, the value is *-20.87*.

To determine the value of control, turn the
*Tampering* node into a decision node and make it a parent of the other
two decisions. The
value of the resulting network is *-20.71*. Notice here that control is more valuable than
information.

The value of controlling tampering in the original network is
*-20.71 - (-22.6) = 1.89*. The value of controlling tampering in the context of
observing tampering is *-20.71 - (-20.87) = 0.16*.
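All of the quantities in Example 9.23 are differences between the four network values reported there. The Python lines below restate those values and compute the differences, which makes it easy to see that observing *Tampering* before both decisions (1.73) is worth less than controlling it (1.89).

```python
# Optimal-policy values reported in Example 9.23 (utility units).
original = -22.6               # network of Figure 9.9
tampering_to_call = -21.30     # Tampering observed before Call
tampering_to_both = -20.87     # Tampering observed before Call and CheckSmoke
tampering_controlled = -20.71  # Tampering as a decision, parent of both decisions

print(round(tampering_to_call - original, 2))               # 1.3: information, Call only
print(round(tampering_to_both - original, 2))               # 1.73: information, both decisions
print(round(tampering_controlled - original, 2))            # 1.89: value of control
print(round(tampering_controlled - tampering_to_both, 2))   # 0.16: control beyond observing
```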

The previous description applies when the parents of the random variable that is being controlled become parents of the decision variable. In this scenario, the value of control is never negative. However, if the parents of the decision node do not include all of the parents of the random variable, it is possible that control is less valuable than information. In general, one must be explicit about what information will be available when considering controlling a variable.

**Example 9.24:** Consider controlling the variable *Smoke* in Figure 9.9. If *Fire* is a parent of the decision variable *Smoke*, it has to be a parent of *Call* to make it a no-forgetting network. The expected utility of the resulting network with *Smoke* coming before *CheckSmoke* is *-2.0*. The value of controlling *Smoke* in this situation is due to observing *Fire*. The resulting optimal decision is to call if there is a fire and not call otherwise.

Suppose the agent were to control *Smoke* without conditioning on
*Fire*. That is, the agent has to either make smoke or not, and *Fire* is not a parent of the
other decisions. This situation can be modeled by making
*Smoke* a decision variable with no parents. In this case, the expected
utility is *-23.20*, which is worse than the value *-22.6* of the initial
decision network, because blindly controlling *Smoke* destroys its ability
to act as a sensor of *Fire*.
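The contrast in Example 9.24 can be reproduced in a small toy model. The Python sketch below uses invented probabilities and utilities (not those of Figure 9.9), drops the smoke-checking decision, and assumes *Smoke* has no direct effect on utility. It computes three values: *Smoke* as a chance node that *Call* observes; *Smoke* as a decision with *Fire* as a parent, so that by no-forgetting *Call* also sees *Fire*; and *Smoke* as a decision with no parents.

```python
# Hypothetical numbers, invented for illustration (not those of Figure 9.9).
P_FIRE = 0.1
P_SMOKE_GIVEN_FIRE = {True: 0.9, False: 0.05}   # used only when Smoke is a chance node


def utility(fire: bool, call: bool) -> float:
    """Calling costs 15; not calling during a fire costs 200."""
    if call:
        return -15.0
    return -200.0 if fire else 0.0


def p_fire(fire: bool) -> float:
    return P_FIRE if fire else 1 - P_FIRE


def value_smoke_as_sensor() -> float:
    """Smoke is random; Call conditions on Smoke only."""
    total = 0.0
    for smoke in (True, False):
        # For this reading, pick the call decision maximizing P(reading) * E[U | reading].
        total += max(
            sum(p_fire(f) * (P_SMOKE_GIVEN_FIRE[f] if smoke else 1 - P_SMOKE_GIVEN_FIRE[f])
                * utility(f, call) for f in (True, False))
            for call in (True, False))
    return total


def value_control_with_fire() -> float:
    """Smoke is a decision with Fire as a parent; no-forgetting makes Fire visible to Call."""
    return sum(p_fire(f) * max(utility(f, c) for c in (True, False)) for f in (True, False))


def value_control_blind() -> float:
    """Smoke is a parentless decision: its value says nothing about Fire,
    so Call must pick one action for both cases."""
    return max(sum(p_fire(f) * utility(f, c) for f in (True, False)) for c in (True, False))


print(value_control_with_fire(), value_smoke_as_sensor(), value_control_blind())
```

For these invented numbers the values are about -1.5, -4.03, and -15 respectively: controlling *Smoke* while seeing *Fire* beats leaving it random, but controlling it blindly is worse than leaving it as a noisy sensor, mirroring the ordering in the example.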