8 Reasoning with Uncertainty

8.3 Belief Networks

The notion of conditional independence is used to give a concise representation of many domains. The idea is that, given a random variable X, there may be a few variables that directly affect the X’s value, in the sense that X is conditionally independent of other variables given these variables. The set of locally affecting variables is called the Markov blanket. This locality is exploited in a belief network.

A belief network is a directed model of conditional dependence among a set of random variables. The conditional independence in a belief network takes in an ordering of the variables, and results in a directed graph.

To define a belief network on a set of random variables, {X1,,Xn}, first select a total ordering of the variables, say, X1,,Xn. The chain rule (Proposition 8.3) shows how to decompose a conjunction into conditional probabilities:

P(X1 =v1X2=v2Xn=vn)

Or, in terms of random variables and probability distributions,

P(X1,X2,,Xn)= i=1nP(XiX1,,Xi-1).

Define the parents of random variable Xi, written parents(Xi), to be a minimal set of predecessors of Xi in the total ordering such that the other predecessors of Xi are conditionally independent of Xi given parents(Xi). Thus Xi probabilistically depends on each of its parents, but is independent of its other predecessors. That is, parents(Xi){X1,,Xi-1} such that


When there are multiple minimal sets of predecessors satisfying this condition, any minimal set may be chosen to be the parents. There can be more than one minimal set only when some of the predecessors are deterministic functions of others.

Putting the chain rule and the definition of parents together gives:

P(X1,X2,,Xn)= i=1nP(Xiparents(Xi)).

The probability over all of the variables, P(X1,X2,,Xn), is called the joint probability distribution. A belief network defines a factorization of the joint probability distribution into a product of conditional probabilities.

A belief network, also called a Bayesian network, is an acyclic directed graph (DAG), where the nodes are random variables. There is an arc from each element of parents(Xi) into Xi. Associated with the belief network is a set of conditional probability distributions that specify the conditional probability of each variable given its parents (which includes the prior probabilities of those variables with no parents).

Thus, a belief network consists of

  • a DAG, where each node is labeled by a random variable

  • a domain for each random variable, and

  • a set of conditional probability distributions giving P(Xparents(X)) for each variable X.

A belief network is acyclic by construction. How the chain rule decomposes a conjunction depends on the ordering of the variables. Different orderings can result in different belief networks. In particular, which variables are eligible to be parents depends on the ordering, as only predecessors in the ordering can be parents. Some of the orderings may result in networks with fewer arcs than other orderings.

Example 8.13.

Consider the four variables of Example 8.12, with the ordering: Intelligent, Works_hard, Answers, Grade. Consider the variables in order. Intelligent does not have any predecessors in the ordering, so it has no parents, thus parents(Intelligent)={}. Works_hard is independent of Intelligent, and so it too has no parents. Answers depends on both Intelligent and Works_hard, so


Grade is independent of Intelligent and Works_hard given Answers and so


The corresponding belief network is given in Figure 8.2.

This graph defines the decomposition of the joint distribution:

P(Inte lligent,Works_hard,Answers,Grade)
= P(Intelligent)*P(Works_hard)*P(AnswersIntelligent,Works_hard)

In the examples below, the domains of the variables are simple, for example the domain of Answers may be {insightful,clear,superficial,vacuous} or it could be the actual text answers.

Figure 8.2: Belief network for exam answering of Example 8.13

The independence of a belief network, according to the definition of parents, is that each variable is independent of all of the variables that are not descendants of the variable (its non-descendants) given the variable’s parents.