## 9.4 The Value of Information and Control

Example 9.20: In Example 9.18, the action CheckSmoke provides information about fire. Checking for smoke costs 20 units and does not provide any direct reward; however, in an optimal policy, it is worthwhile to check for smoke when there is a report because the agent can condition its further actions on the information obtained. Thus, the information about smoke is valuable to the agent. Even though smoke provides imperfect information about whether there is fire, that information is still very useful for making decisions.

One of the important lessons from this example is that an information-seeking action, such as check_for_smoke, can be treated in the same way as any other action, such as call_fire_department. An optimal policy often includes actions whose only purpose is to find information as long as subsequent actions can condition on some effect of the action. Most actions do not just provide information; they also have a more direct effect on the world.

Information is valuable to agents because it helps them make better decisions.

The value of information i for decision D is the expected value of an optimal policy that can condition decision D, and subsequent decisions, on knowledge of i minus the expected value of an optimal policy that cannot observe i. Thus, in a decision network, it is the value of an optimal policy with i as a parent of D and subsequent decisions minus the value of an optimal policy without i as a parent of D.
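In symbols, the definition can be sketched as follows. The notation is an assumption for illustration and does not appear in the source: $\mathcal{E}(N)$ denotes the expected value of an optimal policy for decision network $N$, and $N_{+i}$ denotes $N$ with $i$ added as a parent of $D$ and of the subsequent decisions.

```latex
\[
\mathrm{VOI}(i, D) \;=\; \mathcal{E}(N_{+i}) \;-\; \mathcal{E}(N)
\]
```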

Example 9.21: In Example 9.11, consider how much it could be worth to get a better forecast. The value of perfect information about the weather for the decision about whether to take an umbrella is the difference between the value of the network with an arc from Weather to Umbrella, which is 91 (as calculated in Example 9.19), and the value of the original network, which is 77 (as computed in Example 9.11). Thus, perfect information would be worth 91 - 77 = 14. This is an upper bound on how much another sensor of the weather could be worth.
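The two network values can be checked by brute force. The sketch below is illustrative only: the probabilities and utilities are assumptions, since this section does not state them, and they were chosen because they reproduce the values 77 and 91 quoted above. The helper name `optimal_value` is hypothetical.

```python
from collections import defaultdict

# Assumed model parameters (not given in this section).
P_WEATHER = {"rain": 0.3, "norain": 0.7}
P_FORECAST = {  # P(forecast | weather)
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
}
UTILITY = {  # utility of (weather, umbrella action)
    ("norain", "take"): 20, ("norain", "leave"): 100,
    ("rain", "take"): 70, ("rain", "leave"): 0,
}
ACTIONS = ("take", "leave")


def optimal_value(observe):
    """Expected utility of the best policy when the agent chooses its
    action after seeing observe(weather, forecast)."""
    # expected[obs][action] accumulates P(weather, forecast) * utility.
    expected = defaultdict(lambda: dict.fromkeys(ACTIONS, 0.0))
    for weather, p_w in P_WEATHER.items():
        for forecast, p_f in P_FORECAST[weather].items():
            obs = observe(weather, forecast)
            for action in ACTIONS:
                expected[obs][action] += p_w * p_f * UTILITY[(weather, action)]
    # An optimal policy picks the best action separately for each observation.
    return sum(max(by_action.values()) for by_action in expected.values())


value_forecast = optimal_value(lambda w, f: f)   # original network: Forecast only
value_both = optimal_value(lambda w, f: (w, f))  # arc from Weather to Umbrella added
voi = value_both - value_forecast
print(round(value_forecast, 2), round(value_both, 2), round(voi, 2))
```

Passing a different `observe` function corresponds to choosing a different set of parents for the decision node, which is why the same routine evaluates both networks.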

The value of information is an upper bound on the amount the agent would be willing to pay (in terms of loss of utility) for information i at decision d. It is also an upper bound on the amount that imperfect information about the value of i at decision d would be worth. Imperfect information is, for example, information available from a noisy sensor of i. It is not worth paying more for a sensor of i than the value of information i.

The value of information has some interesting properties:

• The value of information is never negative. The worst that can happen is that the agent can ignore the information.
• If an optimal decision is to do the same thing no matter which value of i is observed, the value of information i is zero. If the value of information i is zero, there is an optimal policy that does not depend on the value of i (i.e., the same action is chosen no matter which value of i is observed).
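Both properties can be seen in the single-decision case. The following is a sketch under assumptions not stated in the source: $a$ ranges over the actions of $d$, $v$ ranges over the values of $i$, $\mathcal{E}(u \mid a, i{=}v)$ is the expected utility of action $a$ given $i=v$, the action does not affect $i$, and any other observed parents are omitted for brevity.

```latex
\[
\sum_{v} P(i{=}v)\,\max_{a}\,\mathcal{E}(u \mid a, i{=}v)
\;\ge\;
\max_{a}\,\sum_{v} P(i{=}v)\,\mathcal{E}(u \mid a, i{=}v)
\;=\;
\max_{a}\,\mathcal{E}(u \mid a)
\]
```

The left side is the value when $d$ can condition on $i$, and the right side is the value when it cannot. Equality holds when a single action attains the maximum for every value $v$ with nonzero probability, which is the situation described in the second property.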

Within a decision network, the value of information i at decision d can be evaluated by considering both

• the decision network with arcs from i to d and from i to subsequent decisions and
• the decision network without such arcs.

The difference in the values of the optimal policies of these two decision networks is the value of information i at d. Something more sophisticated must be done when adding the arc from i to d causes a cycle.

Example 9.22: In the alarm problem [Example 9.18], the agent may be interested in knowing whether it is worthwhile to install a relay for the alarm so that the alarm can be heard directly instead of relying on the noisy sensor of people leaving. To determine how much a relay could be worth, consider how much perfect information about the alarm would be worth. If the information is worth less than the cost of the relay, it is not worthwhile to install the relay.

The value of information about the alarm for checking for smoke and for calling can be obtained by solving the decision network of Figure 9.9 together with the same network, but with an arc from Alarm to Check_for_smoke and an arc from Alarm to Call_fire_department. The original network has a value of -22.6. This new decision network has an optimal policy whose value is -6.3. The difference in the values of the optimal policies for the two decision networks, namely 16.3, is the value of Alarm for the decision Check_for_smoke. If the relay costs 20 units, the installation will not be worthwhile.
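The comparison in this example amounts to the following arithmetic, using only the values quoted above:

```latex
\[
\underbrace{-6.3}_{\text{with arcs from Alarm}} \;-\; \underbrace{(-22.6)}_{\text{original network}} \;=\; 16.3 \;<\; 20
\]
```

Because 16.3 is an upper bound on what the relay could be worth, paying 20 units for it would leave a net loss of at least 20 - 16.3 = 3.7 units.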

The value of the network with an arc from Alarm to Call_fire_department is -6.3, the same as if there were also an arc from Alarm to Check_for_smoke. The optimal decision function for Check_for_smoke ignores the information about Alarm; when Alarm is a parent of Call_fire_department, the agent never checks for smoke in the optimal policy.

The value of control specifies how much it is worth to control a variable. In its simplest form, it is the change in value of a decision network where a random variable is replaced by a decision variable, and arcs are added to make it a no-forgetting network. If this is done, the change in utility is non-negative; the resulting network always has an equal or higher expected utility.

Example 9.23: In the alarm decision network of Figure 9.9, you may be interested in the value of controlling tampering. This could, for example, be used to estimate how much it is worth to add security guards to prevent tampering. To compute this, compare the value of the decision network of Figure 9.9 to the decision network where Tampering is a decision node and a parent of the other two decision nodes.

The value of the initial decision network is -22.6. First, consider the value of information. If Tampering is made a parent of Call, the value is -21.30. If Tampering is made a parent of Call and CheckSmoke, the value is -20.87.

To determine the value of control, turn the Tampering node into a decision node and make it a parent of the other two decisions. The value of the resulting network is -20.71. Notice here that control is more valuable than information.

The value of controlling tampering in the original network is -20.71 - (-22.6) = 1.89. The value of controlling tampering in the context of observing tampering is -20.71 - (-20.87) = 0.16.
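The comparisons in this example can be tabulated in a few lines. This sketch uses only the network values quoted in the example; the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
# Expected value of an optimal policy for each variant of the network,
# as quoted in Example 9.23.
values = {
    "original": -22.6,
    "observe (parent of Call)": -21.30,
    "observe (parent of Call and CheckSmoke)": -20.87,
    "control": -20.71,
}

baseline = values["original"]

# Gain of each variant over the original network, rounded to 2 decimals
# to hide floating-point noise.
gains = {name: round(value - baseline, 2) for name, value in values.items()}

# Extra gain from controlling Tampering when it could already be observed
# by both decisions.
control_given_observation = round(
    values["control"] - values["observe (parent of Call and CheckSmoke)"], 2
)

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
print(f"control beyond full observation: {control_given_observation:+.2f}")
```

Laying the numbers out this way makes the ordering explicit: each additional arc from Tampering increases the value, and control adds a little more on top of full observation.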

The previous description applies when the parents of the random variable that is being controlled become parents of the decision variable. In this scenario, the value of control is never negative. However, if the parents of the decision node do not include all of the parents of the random variable, it is possible that control is less valuable than information. In general one must be explicit about what information will be available when considering controlling a variable.

Example 9.24: Consider controlling the variable Smoke in Figure 9.9. If Fire is a parent of the decision variable Smoke, it has to be a parent of Call to make it a no-forgetting network. The expected utility of the resulting network with Smoke coming before CheckSmoke is -2.0. The value of controlling Smoke in this situation is due to observing Fire. The resulting optimal decision is to call if there is a fire and not call otherwise.

Suppose the agent were to control Smoke without conditioning on Fire. That is, the agent has to either make smoke or not, and Fire is not a parent of the other decisions. This situation can be modeled by making Smoke a decision variable with no parents. In this case, the expected utility is -23.20, which is worse than the value of the initial decision network (-22.6), because a blindly controlled Smoke can no longer act as a sensor for Fire.
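The contrast between the two ways of controlling Smoke can be illustrated on a much smaller toy model. The sketch below is not the network of Figure 9.9: it keeps only Fire, Smoke, and a single Call decision, and all probabilities and utilities are assumptions made for illustration. The helper name `optimal_value` is hypothetical.

```python
# Assumed toy parameters (hypothetical; this is not the full alarm network).
P_FIRE = {True: 0.01, False: 0.99}
P_SMOKE_GIVEN_FIRE = {True: 0.9, False: 0.01}  # P(smoke = True | fire)
UTILITY = {  # utility of (fire, call)
    (True, True): -200, (True, False): -5000,
    (False, True): -200, (False, False): 0,
}


def optimal_value(joint):
    """joint maps (fire, observation) to a probability. Returns the expected
    utility of the best Call policy that conditions on the observation."""
    observations = {obs for _, obs in joint}
    total = 0.0
    for obs in observations:
        # Best action for this observation, weighted by joint probability.
        total += max(
            sum(p * UTILITY[(fire, call)]
                for (fire, o), p in joint.items() if o == obs)
            for call in (True, False)
        )
    return total


# 1. Smoke is a random variable that acts as a noisy sensor of Fire.
sensor = {}
for fire, p_fire in P_FIRE.items():
    p_smoke = P_SMOKE_GIVEN_FIRE[fire]
    sensor[(fire, True)] = p_fire * p_smoke
    sensor[(fire, False)] = p_fire * (1 - p_smoke)

# 2. Smoke is controlled with Fire as a parent; no-forgetting makes Fire a
#    parent of Call too, so Call effectively observes Fire.
informed = {(fire, fire): p_fire for fire, p_fire in P_FIRE.items()}

# 3. Smoke is controlled with no parents; what Call sees is set by the agent
#    and carries no information about Fire.
blind = {(fire, None): p_fire for fire, p_fire in P_FIRE.items()}

value_sensor = optimal_value(sensor)
value_informed = optimal_value(informed)
value_blind = optimal_value(blind)
print(round(value_sensor, 2), round(value_informed, 2), round(value_blind, 2))
```

In this toy model, informed control is better than the original sensor, while blind control is worse, mirroring the qualitative pattern described in the example. The gain in the informed case comes entirely from Call seeing Fire, not from setting Smoke.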