Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).
9.4 The Value of Information and Control
One of the important lessons from this example is that an information-seeking action, such as check_for_smoke, can be treated in the same way as any other action, such as call_fire_department. An optimal policy often includes actions whose only purpose is to find information as long as subsequent actions can condition on some effect of the action. Most actions do not just provide information; they also have a more direct effect on the world.
Information is valuable to agents because it helps them make better decisions.
The value of information i for decision D is the expected value of an optimal policy that can condition decision D, and subsequent decisions, on knowledge of i minus the expected value of an optimal policy that cannot observe i. Thus, in a decision network, it is the value of an optimal policy with i as a parent of D and subsequent decisions minus the value of an optimal policy without i as a parent of D.
The value of information is an upper bound on the amount the agent would be willing to pay (in terms of loss of utility) for information i at decision D. It is also an upper bound on the amount that imperfect information about the value of i at decision D would be worth. Imperfect information is, for example, information available from a noisy sensor of i. It is not worth paying more for a sensor of i than the value of information i.
The value of information has some interesting properties:
- The value of information is never negative. The worst that can happen is that the agent can ignore the information.
- If an optimal decision is to do the same thing no matter which value of i is observed, the value of information i is zero. If the value of information i is zero, there is an optimal policy that does not depend on the value of i (i.e., the same action is chosen no matter which value of i is observed).
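The definition and the two properties above can be checked on a small example. The following is a minimal sketch, assuming a single decision with one random variable i that is either observed or not; the prior, the action names, and the utilities are hypothetical numbers chosen for illustration and are not taken from the fire alarm network.

```python
def eu_without_observation(prior, actions, utility):
    """Best expected utility when one action must be chosen for all values of i."""
    return max(sum(p * utility(i, a) for i, p in prior.items()) for a in actions)


def eu_with_observation(prior, actions, utility):
    """Best expected utility when the action may be chosen after observing i."""
    return sum(p * max(utility(i, a) for a in actions) for i, p in prior.items())


def value_of_information(prior, actions, utility):
    """Value of the optimal policy that observes i minus the one that cannot."""
    return (eu_with_observation(prior, actions, utility)
            - eu_without_observation(prior, actions, utility))


# Hypothetical prior and actions (assumptions for illustration only).
prior = {"fire": 0.1, "no_fire": 0.9}
actions = ["call", "no_call"]


def utility(i, action):
    # Hypothetical utilities: calling always costs 200; an ignored fire costs 5000.
    if action == "call":
        return -200
    return -5000 if i == "fire" else 0


def utility_call_always_best(i, action):
    # Hypothetical utilities where "call" is optimal for every value of i.
    if action == "call":
        return -200
    return -5000 if i == "fire" else -250


voi = value_of_information(prior, actions, utility)
voi_zero = value_of_information(prior, actions, utility_call_always_best)
print("value of information:", voi)
print("value when the same action is always optimal:", voi_zero)
```

In the first case the optimal action differs between the two values of i, so observing i has positive value. In the second case the same action is optimal for every value of i, so the value of information is zero.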
Within a decision network, the value of information i at decision D can be evaluated by considering both
- the decision network with arcs from i to D and from i to subsequent decisions and
- the decision network without such arcs.
The difference in the values of the optimal policies of these two decision networks is the value of information i at D. Something more sophisticated must be done when adding the arc from i to D causes a cycle.
The value of information about the alarm for checking for smoke and for calling can be obtained by solving the decision network of Figure 9.9 together with the same network, but with an arc from Alarm to Check_for_smoke and an arc from Alarm to Call_fire_department. The original network has a value of -22.6. The new decision network has an optimal policy whose value is -6.3. The difference in the values of the optimal policies for the two decision networks, namely -6.3 - (-22.6) = 16.3, is the value of information about Alarm for the decision Check_for_smoke. If the relay costs 20 units, the installation will not be worthwhile, because the cost exceeds the 16.3 units that the information is worth.
The value of the network with an arc from Alarm to Call_fire_department is -6.3, the same as if there were also an arc from Alarm to Check_for_smoke. In the optimal policy, the information about Alarm is ignored in the optimal decision function for Check_for_smoke; the agent never checks for smoke in the optimal policy when Alarm is a parent of Call_fire_department.
The value of control specifies how much it is worth to control a variable. In its simplest form, it is the change in value of a decision network where a random variable is replaced by a decision variable, and arcs are added to make it a no-forgetting network. If this is done, the change in utility is non-negative; the resulting network always has an equal or higher expected utility.
The value of the initial decision network is -22.6. First, consider the value of information. If Tampering is made a parent of Call, the value is -21.30. If Tampering is made a parent of Call and CheckSmoke, the value is -20.87.
To determine the value of control, turn the Tampering node into a decision node and make it a parent of the other two decisions. The value of the resulting network is -20.71. Notice here that control is more valuable than information.
The value of controlling tampering in the original network is -20.71 - (-22.6) = 1.89. The value of controlling tampering in the context of observing tampering is -20.71 - (-20.87) = 0.16.
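The arithmetic behind these comparisons can be reproduced from the network values quoted in this example. The sketch below only subtracts those reported values; it does not solve the decision networks, and the dictionary keys and the helper `gain` are hypothetical names introduced for illustration.

```python
# Values of the optimal policies, as reported in the text for each network.
values = {
    "original": -22.6,
    "observe_tampering_at_call": -21.30,   # Tampering is a parent of Call
    "observe_tampering_at_both": -20.87,   # Tampering is a parent of Call and CheckSmoke
    "control_tampering": -20.71,           # Tampering is a decision node
}


def gain(network, baseline):
    """Improvement in optimal-policy value of `network` over `baseline`."""
    return round(values[network] - values[baseline], 2)


info_at_call = gain("observe_tampering_at_call", "original")
info_at_both = gain("observe_tampering_at_both", "original")
control = gain("control_tampering", "original")
control_given_info = gain("control_tampering", "observe_tampering_at_both")

print("value of information (Call only):", info_at_call)
print("value of information (Call and CheckSmoke):", info_at_both)
print("value of control:", control)
print("value of control when Tampering is already observed:", control_given_info)
```

Under these reported values, control is worth more than information (1.89 versus 1.73), and the extra gain from control once Tampering is already observed is small.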
The previous description applies when the parents of the random variable that is being controlled become parents of the decision variable. In this scenario, the value of control is never negative. However, if the parents of the decision node do not include all of the parents of the random variable, it is possible that control is less valuable than information. In general, one must be explicit about what information will be available when considering controlling a variable.
Suppose the agent were to control Smoke without conditioning on Fire. That is, the agent has to either make smoke or not, and Fire is not a parent of the other decisions. This situation can be modeled by making Smoke a decision variable with no parents. In this case, the expected utility is -23.20, which is worse than the value of the initial decision network (-22.6), because when Smoke is controlled blindly it can no longer act as a sensor for Fire.
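The effect of blind control can be illustrated with a toy model. The following is a minimal sketch, assuming a hypothetical three-variable problem (Fire, Smoke, and a single Call decision that observes Smoke) with made-up probabilities and utilities; it is not the network of Figure 9.9 and does not reproduce the values quoted above.

```python
# Hypothetical model (all numbers are assumptions for illustration only).
P_FIRE = {True: 0.1, False: 0.9}                # prior on Fire
P_SMOKE_GIVEN_FIRE = {True: 0.9, False: 0.05}   # P(Smoke=True | Fire)
ACTIONS = ("call", "no_call")


def utility(fire, action):
    # Calling always costs 200; failing to call when there is a fire costs 5000.
    if action == "call":
        return -200
    return -5000 if fire else 0


def eu_smoke_as_sensor():
    """Smoke is a random variable with parent Fire; Call observes Smoke."""
    total = 0.0
    for smoke in (True, False):
        # Unnormalized posterior weights P(fire, smoke) for this observation.
        joint = {}
        for fire, p_fire in P_FIRE.items():
            p_smoke = P_SMOKE_GIVEN_FIRE[fire]
            joint[fire] = p_fire * (p_smoke if smoke else 1 - p_smoke)
        # The best action for this observation maximizes the weighted utility.
        total += max(sum(w * utility(f, a) for f, w in joint.items())
                     for a in ACTIONS)
    return total


def eu_blind_control():
    """Smoke is a decision with no parents; seeing it says nothing about Fire."""
    best = float("-inf")
    for _smoke in (True, False):      # the agent's own choice of Smoke
        for action in ACTIONS:        # Call can depend only on that choice
            eu = sum(p * utility(f, action) for f, p in P_FIRE.items())
            best = max(best, eu)
    return best


def eu_informed_control():
    """Smoke is a decision with parent Fire; with no-forgetting, Call sees Fire."""
    return sum(p * max(utility(f, a) for a in ACTIONS) for f, p in P_FIRE.items())


print("Smoke as a sensor:   ", eu_smoke_as_sensor())
print("blind control:       ", eu_blind_control())
print("control given Fire:  ", eu_informed_control())
```

In this toy model, blind control gives a lower expected utility than leaving Smoke as a sensor, while control that conditions on Fire gives a higher one, matching the qualitative pattern described above.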