8.3 Belief Networks


8.3.2 Constructing Belief Networks

To represent a domain in a belief network, the designer of a network must consider the following questions:

  • What are the relevant variables? In particular, the designer must consider

    • what the agent may observe in the domain. Each feature that may be observed should be a variable, because the agent must be able to condition on all of its observations.

    • what information the agent is interested in knowing the posterior probability of. Each of these features should be made into a variable that can be queried.

    • other hidden variables or latent variables that will not be observed or queried but make the model simpler. These variables either account for dependencies, reduce the size of the specification of the conditional probabilities, or better model how the world is assumed to work.

  • What values should these variables take? This involves considering the level of detail at which the agent should reason to answer the sorts of queries that will be encountered.

    For each variable, the designer should specify what it means to take each value in its domain. What must be true in the world for a (non-hidden) variable to have a particular value should satisfy the clarity principle: an omniscient agent should be able to know the value of a variable. It is a good idea to explicitly document the meaning of all variables and their possible values. The only time the designer may not want to do this is for hidden variables whose values the agent will learn from data [see Section 10.3.2].

  • What is the relationship between the variables? This should be expressed by adding arcs in the graph to define the parent relation.

  • How does the distribution of a variable depend on its parents? This is expressed in terms of the conditional probability distributions.

Example 8.15.

Suppose you want to use the diagnostic assistant to diagnose whether there is a fire in a building and whether there has been some tampering with equipment, based on noisy sensor information and possibly conflicting explanations of what could be going on. The agent receives a report from Sam about whether everyone is leaving the building. Suppose Sam’s report is noisy: Sam sometimes reports leaving when there is no exodus (a false positive), and sometimes does not report when everyone is leaving (a false negative). Suppose that leaving only depends on the alarm going off. Either tampering or fire could affect the alarm. Whether there is smoke only depends on whether there is fire.

Suppose we use the following variables in the following order:

  • Tampering is true when there is tampering with the alarm.

  • Fire is true when there is a fire.

  • Alarm is true when the alarm sounds.

  • Smoke is true when there is smoke.

  • Leaving is true if there are many people leaving the building at once.

  • Report is true if Sam reports people leaving. Report is false if there is no report of leaving.

Assume the following conditional independencies:

  • Fire is conditionally independent of Tampering (given no other information).

  • Alarm depends on both Fire and Tampering. That is, we are making no independence assumptions about how Alarm depends on its predecessors given this variable ordering.

  • Smoke depends only on Fire and is conditionally independent of Tampering and Alarm given whether there is a Fire.

  • Leaving only depends on Alarm and not directly on Fire or Tampering or Smoke. That is, Leaving is conditionally independent of the other variables given Alarm.

  • Report only directly depends on Leaving.

The belief network of Figure 8.3 expresses these dependencies.

Figure 8.3: Belief network for report of leaving of Example 8.15

This network represents the factorization

P(Tampering, Fire, Alarm, Smoke, Leaving, Report)
= P(Tampering) * P(Fire) * P(Alarm | Tampering, Fire)
  * P(Smoke | Fire) * P(Leaving | Alarm) * P(Report | Leaving).

Note that the alarm is not a smoke alarm, which would be affected by the smoke and not directly by the fire; rather, it is a heat alarm that is directly affected by the fire. This is made explicit in the model in that Alarm is independent of Smoke given Fire.

We also must define the domain of each variable. Assume that the variables are Boolean; that is, they have domain {true,false}. We use the lower-case variant of the variable to represent the true value and use negation for the false value. Thus, for example, Tampering=true is written as tampering, and Tampering=false is written as ¬tampering.

The examples that follow assume the following conditional probabilities:

P(tampering)=0.02
P(fire)=0.01
P(alarm | fire ∧ tampering)=0.5
P(alarm | fire ∧ ¬tampering)=0.99
P(alarm | ¬fire ∧ tampering)=0.85
P(alarm | ¬fire ∧ ¬tampering)=0.0001
P(smoke | fire)=0.9
P(smoke | ¬fire)=0.01
P(leaving | alarm)=0.88
P(leaving | ¬alarm)=0.001
P(report | leaving)=0.75
P(report | ¬leaving)=0.01
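These numbers are enough to answer any query about the network. As an illustration, here is a minimal Python sketch (our own, not the book's AIPython code; the names variables, parents, cpt, joint, and prob are ours) that encodes the tables above and answers queries by summing the joint distribution over all 2^6 worlds; it reproduces the probabilities quoted in the rest of this example.

```python
from itertools import product

# A toy encoding of Example 8.15. Each variable is Boolean; cpt[X] maps a
# tuple of parent values (in the order of parents[X]) to P(X=true | parents).
variables = ["Tampering", "Fire", "Alarm", "Smoke", "Leaving", "Report"]
parents = {
    "Tampering": [], "Fire": [],
    "Alarm": ["Tampering", "Fire"],
    "Smoke": ["Fire"],
    "Leaving": ["Alarm"],
    "Report": ["Leaving"],
}
cpt = {
    "Tampering": {(): 0.02},
    "Fire": {(): 0.01},
    "Alarm": {(True, True): 0.5, (True, False): 0.85,
              (False, True): 0.99, (False, False): 0.0001},
    "Smoke": {(True,): 0.9, (False,): 0.01},
    "Leaving": {(True,): 0.88, (False,): 0.001},
    "Report": {(True,): 0.75, (False,): 0.01},
}

def joint(world):
    """P(world): the product of P(X | parents(X)) over all variables."""
    p = 1.0
    for x in variables:
        p_true = cpt[x][tuple(world[y] for y in parents[x])]
        p *= p_true if world[x] else 1.0 - p_true
    return p

def prob(query, evidence={}):
    """P(query=true | evidence), by enumerating all worlds."""
    worlds = [dict(zip(variables, vals))
              for vals in product([True, False], repeat=len(variables))]
    consistent = [w for w in worlds
                  if all(w[v] == val for v, val in evidence.items())]
    num = sum(joint(w) for w in consistent if w[query])
    den = sum(joint(w) for w in consistent)
    return num / den

print(round(prob("Report"), 3))                  # 0.028, the prior of report below
print(round(prob("Fire", {"Report": True}), 4))  # 0.2305, as quoted below
```

Enumerating all worlds is exponential in the number of variables, so this brute-force approach is only feasible for small networks such as this one.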

Before any evidence arrives, the probability is given by the priors. The following probabilities follow from the model (all of the numbers here are to about three decimal places):

P(tampering)=0.02
P(fire)=0.01
P(report)=0.028
P(smoke)=0.0189

Observing a report gives the following:

P(tampering | report)=0.399
P(fire | report)=0.2305
P(smoke | report)=0.215

As expected, the probabilities of both tampering and fire are increased by the report. Because the probability of fire is increased, so is the probability of smoke.

Suppose instead that smoke alone was observed:

P(tampering | smoke)=0.02
P(fire | smoke)=0.476
P(report | smoke)=0.320

Note that the probability of tampering is not affected by observing smoke; however, the probabilities of report and fire are increased.

Suppose that both report and smoke were observed:

P(tampering | report ∧ smoke)=0.0284
P(fire | report ∧ smoke)=0.964

Observing both makes fire even more likely. However, in the context of the report, the presence of smoke makes tampering less likely. This is because the report is explained away by fire, which is now more likely.

Suppose instead that report, but not smoke, was observed:

P(tampering | report ∧ ¬smoke)=0.501
P(fire | report ∧ ¬smoke)=0.0294

In the context of the report, fire becomes much less likely and so the probability of tampering increases to explain the report.

This example illustrates how the belief network independence assumption gives commonsense conclusions, and also demonstrates how explaining away is a consequence of that independence assumption.
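With the prob function from the sketch above (our own names, not the book's code), these explaining-away queries can be checked directly:

```python
# Explaining away, using the earlier sketch: smoke makes fire more likely,
# fire explains the report, and so tampering becomes less likely.
print(round(prob("Tampering", {"Report": True}), 3))                  # 0.399
print(round(prob("Tampering", {"Report": True, "Smoke": True}), 4))   # 0.0284
print(round(prob("Tampering", {"Report": True, "Smoke": False}), 3))  # 0.501
```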

Figure 8.4: Belief network for Example 8.16
Example 8.16.

Consider the problem of diagnosing why someone is sneezing and perhaps has a fever. Sneezing could be because of influenza or because of hay fever. These are not independent, but are correlated due to the season. Suppose hay fever depends on the season because it depends on the amount of pollen, which in turn depends on the season. The agent does not get to observe sneezing directly, but rather observes just the “Achoo” sound. Suppose fever depends directly on influenza. These dependency considerations lead to the belief network of Figure 8.4.

  • For each wire wi, there is a random variable, Wi, with domain {live,dead}, which denotes whether there is power in wire wi. Wi=live means wire wi has power. Wi=dead means there is no power in wire wi.

  • Outside_power with domain {live,dead} denotes whether there is power coming into the building.

  • For each switch si, variable Si_pos denotes the position of si. It has domain {up,down}.

  • For each switch si, variable Si_st denotes the state of switch si. It has domain {ok,upside_down,short,intermittent,broken}. Si_st=ok means switch si is working normally. Si_st=upside_down means switch si is installed upside-down. Si_st=short means switch si is shorted and acting as a wire. Si_st=intermittent means switch si works intermittently. Si_st=broken means switch si is broken and does not allow electricity to flow.

  • For each circuit breaker cbi, variable Cbi_st has domain {on,off}. Cbi_st=on means power could flow through cbi and Cbi_st=off means that power could not flow through cbi.

  • For each light li, variable Li_st with domain {ok,intermittent,broken} denotes the state of the light. Li_st=ok means light li will light if powered, Li_st=intermittent means light li intermittently lights if powered, and Li_st=broken means light li does not work.

Figure 8.5: Belief network for the electrical domain of Figure 1.8
Example 8.17.

Consider the wiring example of Figure 1.8. Suppose we decide to have variables for whether lights are lit, for the switch positions, for whether lights and switches are faulty or not, and for whether there is power in the wires. The variables are defined in Figure 8.5.

We order the variables so that each variable has few parents. In this case there seems to be a natural causal order where, for example, the variable for whether a light is lit comes after variables for whether the light is working and whether there is power coming into the light.

Whether light l1 is lit depends only on whether there is power in wire w0 and whether light l1 is working properly. Other variables, such as the position of switch s1, whether light l2 is lit, or who is the Queen of Canada, are irrelevant. Thus, the parents of L1_lit are W0 and L1_st.

Consider variable W0, which represents whether there is power in wire w0. If we knew whether there was power in wires w1 and w2, and we knew the position of switch s2 and whether the switch was working properly, the value of the other variables (other than L1_lit) would not affect our belief in whether there is power in wire w0. Thus, the parents of W0 should be S2_pos, S2_st, W1, and W2.

Figure 8.5 shows the resulting belief network after the independence of each variable has been considered. The belief network also contains the domains of the variables, as given in the figure, and conditional probabilities of each variable given its parents.

For the variable W1, the following conditional probabilities must be specified:

P(W1=live | S1_pos=up ∧ S1_st=ok ∧ W3=live)
P(W1=live | S1_pos=up ∧ S1_st=ok ∧ W3=dead)
P(W1=live | S1_pos=up ∧ S1_st=upside_down ∧ W3=live)
    ...
P(W1=live | S1_pos=down ∧ S1_st=broken ∧ W3=dead).

There are two values for S1_pos, five values for S1_st, and two values for W3, so there are 2*5*2=20 different cases where a value for the conditional probability of W1=live must be specified. As far as probability theory is concerned, the probability for W1=live for these 20 cases could be assigned arbitrarily. Of course, knowledge of the domain constrains what values make sense. The values for W1=dead can be computed from the values for W1=live for each of these cases.
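To make the size of this specification concrete, the following sketch (our own; w1_cpt is a name we introduce, and no numeric values are supplied) enumerates the 20 cases the designer must fill in:

```python
from itertools import product

# One probability of W1=live is needed for each combination of parent values.
# The actual numbers come from domain knowledge and are left unspecified here.
s1_pos_vals = ["up", "down"]
s1_st_vals = ["ok", "upside_down", "short", "intermittent", "broken"]
w3_vals = ["live", "dead"]

w1_cpt = {}   # maps (s1_pos, s1_st, w3) to P(W1=live | those parent values)
for s1_pos, s1_st, w3 in product(s1_pos_vals, s1_st_vals, w3_vals):
    w1_cpt[(s1_pos, s1_st, w3)] = None   # to be filled in by the designer

print(len(w1_cpt))   # 2*5*2 = 20 cases
# P(W1=dead | ...) need not be stored; it is 1 minus the corresponding entry.
```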

Because the variable S1_st has no parents, it requires a prior distribution, which can be specified as the probabilities for all but one of the values; the remaining value is derived from the constraint that all of the probabilities sum to 1. Thus, to specify the distribution of S1_st, four of the following five probabilities must be specified:

P(S1_st=ok)
P(S1_st=upside_down)
P(S1_st=short)
P(S1_st=intermittent)
P(S1_st=broken)

The other variables are represented analogously.
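The sum-to-1 constraint for a prior such as that of S1_st can be exploited directly when encoding the network. The following sketch (our own; the numbers are arbitrary placeholders, not values from the text) specifies four of the probabilities and derives the fifth:

```python
# Specify all but one value of the prior for S1_st; the remaining value
# follows from the constraint that the probabilities sum to 1.
# The numbers below are illustrative placeholders only.
s1_st_prior = {
    "ok": 0.9,
    "upside_down": 0.02,
    "short": 0.02,
    "intermittent": 0.02,
}
s1_st_prior["broken"] = 1 - sum(s1_st_prior.values())   # approximately 0.04
```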

Such a network is used in a number of ways:

  • By conditioning on the knowledge that the switches and circuit breakers are ok, and on the values of the outside power and the position of the switches, this network simulates how the lighting should work.

  • Given values of the outside power and the position of the switches, the network can infer the probability of any outcome, such as how likely it is that l1 is lit.

  • Given values for the switches and whether the lights are lit, the posterior probability that each switch or circuit breaker is in any particular state can be inferred.

  • Given some observations, the network may be used to determine the most likely position of switches.

  • Given some switch positions, some outputs, and some intermediate values, the network may be used to determine the probability of any other variable in the network.

Note the independence assumption embedded in this model. The DAG specifies that the lights, switches, and circuit breakers break independently. To model dependencies among how the switches break, you could add more arcs and perhaps more variables. For example, if some lights do not break independently because they come from the same batch, you could add an extra node modeling the batch, and whether it is a good batch or a bad batch, which is made a parent of the Li_st variables for each light Li from that batch. The lights now break dependently. When you have evidence that one light is broken, the probability that the batch is bad may increase and thus make it more likely that other lights from that batch are broken. If you are not sure whether the lights are indeed from the same batch, you could add variables representing this, too. The important point is that the belief network provides a specification of independence that lets us model dependencies in a natural and direct manner.

The model assumes that there are no shorts in the wires and that the house is wired as in the diagram. For example, it implies that w0 cannot be shorted to w4 so that wire w0 gets power from wire w4. You could add extra dependencies that let each possible short be modeled. An alternative is to add an extra node that indicates that the model is appropriate. Arcs from this node would lead to each variable representing power in a wire and to each light. When the model is appropriate, you could use the probabilities of Example 8.17. When the model is inappropriate, you could, for example, specify that each wire and light works at random. When there are weird observations that do not fit in with the original model – they are impossible or extremely unlikely given the model – the probability that the model is inappropriate will increase.

Belief Networks and Causality

Belief networks have often been called causal networks and provide a representation of causality that takes noise and probabilities into account. Recall that a causal model predicts the result of interventions, where an intervention is an action to change the value of a variable using a mechanism outside of the model (e.g., putting a light switch up, or artificially reducing the amount of pollen).

To build a causal model of a domain given a set of random variables, create the arcs as follows. For each pair of random variables X and Y, make X a parent of Y if intervening on X (perhaps in some context of other variables) causes Y to have a different value (even probabilistically), and the effect of X on Y cannot be accounted for by having other variables Z so that X affects Z and Z affects Y. The belief network of Figure 8.5 is such a causal network. You would expect that a causal model built in this way would obey the independence assumption of the belief network. Thus, all of the conclusions of the belief network would be valid.

You would also expect such a graph to be acyclic; you do not want something eventually causing itself. This assumption is reasonable if you consider that the random variables represent particular events rather than event types. For example, consider a causal chain in which “being stressed” causes you to “work inefficiently,” which, in turn, causes you to “be stressed.” To break the apparent cycle, we represent “being stressed” at different stages as different random variables that refer to different times. Being stressed in the past causes you to not work well at the moment, which causes you to be stressed in the future. The variables should satisfy the clarity principle and have a well-defined meaning; they should not be seen as event types.

The belief network itself has nothing to say about causation, and it can represent non-causal independence, but it seems particularly appropriate for modeling causality. Adding arcs that represent local causality tends to produce a small belief network.

A causal network models interventions in the following way. If someone were to artificially force a variable to have a particular value, the variable’s descendants – but no other variables – would be affected. In Example 8.16, intervening to add or remove pollen would affect hay fever, sneezing, and the sound, but not the other variables. This contrasts with observing pollen, which provides evidence of the season and so changes the probabilities of all of the other variables.
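For a toy illustration of this, the following sketch (our own, reusing the parents, cpt, and prob names from the Example 8.15 sketch; the function do is ours) implements an intervention as graph surgery: the forced variable loses its incoming arcs and is pinned to the chosen value, so only its descendants can be affected.

```python
# Intervention by graph surgery on the Example 8.15 network (a sketch):
# cut the intervened variable's incoming arcs and pin it to the forced value.
def do(parents, cpt, var, value):
    new_parents = dict(parents)
    new_cpt = {x: dict(table) for x, table in cpt.items()}
    new_parents[var] = []                        # no more incoming arcs
    new_cpt[var] = {(): 1.0 if value else 0.0}   # variable is forced to `value`
    return new_parents, new_cpt

parents, cpt = do(parents, cpt, "Alarm", True)   # force the alarm to sound
print(round(prob("Report"), 3))   # rises to about 0.661: Report is a descendant of Alarm
print(round(prob("Fire"), 3))     # stays at 0.01: Fire is not a descendant of Alarm
```

Contrast this with observing Alarm=true in the unmodified network, which would also raise the probabilities of Fire and Tampering.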

Finally, see how the causality in belief networks relates to the causal and evidential reasoning discussed in Section 5.8. A causal belief network is a way of axiomatizing in a causal direction. Reasoning in belief networks corresponds to abducing to causes and then predicting from these.