# 8.3 Belief Networks

The notion of conditional independence is used to give a concise representation of many domains. The idea is that, given a random variable $X$, there may be a few variables that directly affect $X$'s value, in the sense that $X$ is conditionally independent of the other variables given these variables. The set of locally affecting variables is called the Markov blanket. This locality is exploited in a belief network.

A belief network is a directed model of conditional dependence among a set of random variables. The conditional independence embodied in a belief network is defined with respect to an ordering of the variables, and the construction results in a directed acyclic graph.

To define a belief network on a set of random variables, $\{X_{1},\ldots,X_{n}\}$, first select a total ordering of the variables, say, $X_{1},\ldots,X_{n}$. The chain rule (Proposition 8.3) shows how to decompose a conjunction into conditional probabilities:

 $P(X_{1}\,{=}\,v_{1}\wedge X_{2}\,{=}\,v_{2}\wedge\cdots\wedge X_{n}\,{=}\,v_{n})=\prod_{i=1}^{n}P(X_{i}\,{=}\,v_{i}\mid X_{1}\,{=}\,v_{1}\wedge\cdots\wedge X_{i-1}\,{=}\,v_{i-1}).$

Or, in terms of random variables and probability distributions,

 $P(X_{1},X_{2},\dots,X_{n})=\prod_{i=1}^{n}P(X_{i}\mid X_{1},\dots,X_{i-1}).$
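
The chain rule can be checked directly on a small example. The following Python sketch (the joint distribution over three binary variables is invented for illustration) computes each conditional $P(X_i\mid X_1,\dots,X_{i-1})$ as a ratio of marginals and verifies that their product recovers the joint probability:

```python
import itertools

# Invented joint distribution over three binary variables, stored as a
# dict mapping an assignment (x1, x2, x3) to its probability.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.12, (1, 0, 1): 0.08,
    (1, 1, 0): 0.18, (1, 1, 1): 0.12,
}

def marginal(prefix):
    """P(X_1=v_1, ..., X_k=v_k), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for a, p in joint.items() if a[:k] == prefix)

def chain_rule(assignment):
    """Reconstruct P(assignment) as a product of P(X_i | X_1,...,X_{i-1})."""
    prod = 1.0
    for i in range(len(assignment)):
        # P(X_i = v_i | X_1 = v_1, ..., X_{i-1} = v_{i-1}) as a ratio of marginals.
        prod *= marginal(assignment[: i + 1]) / marginal(assignment[:i])
    return prod

for a in itertools.product([0, 1], repeat=3):
    assert abs(chain_rule(a) - joint[a]) < 1e-12
```

Because each conditional is a ratio of consecutive marginals, the product telescopes back to the full joint probability, whatever the distribution.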

Define the parents of random variable $X_{i}$, written $parents(X_{i})$, to be a minimal set of predecessors of $X_{i}$ in the total ordering such that the other predecessors of $X_{i}$ are conditionally independent of $X_{i}$ given $parents(X_{i})$. Thus $X_{i}$ probabilistically depends on each of its parents, but is independent of its other predecessors. That is, $parents(X_{i})\subseteq\{X_{1},\ldots,X_{i-1}\}$ such that

 $P(X_{i}\mid X_{1},\ldots,X_{i-1})=P(X_{i}\mid parents(X_{i})).$

When there are multiple minimal sets of predecessors satisfying this condition, any minimal set may be chosen to be the parents. There can be more than one minimal set only when some of the predecessors are deterministic functions of others.
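
To make this definition concrete, the following Python sketch finds $parents(X_i)$ by brute force: it searches the subsets $S$ of the predecessors of $X_i$, smallest first, for one satisfying $P(X_i\mid X_1,\dots,X_{i-1})=P(X_i\mid S)$. The joint distribution is invented for illustration; in it, $X_3$ depends only on $X_2$:

```python
import itertools

# Variables X1, X2, X3 are indexed 0, 1, 2. The invented distribution:
# X1 and X2 are independent, and X3 depends only on X2.
P1 = {0: 0.3, 1: 0.7}                              # P(X1)
P2 = {0: 0.5, 1: 0.5}                              # P(X2)
P3 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}    # P(X3 | X2)

joint = {(x1, x2, x3): P1[x1] * P2[x2] * P3[x2][x3]
         for x1, x2, x3 in itertools.product([0, 1], repeat=3)}

def prob(assign):
    """Marginal probability of a partial assignment {index: value}."""
    return sum(p for a, p in joint.items()
               if all(a[i] == v for i, v in assign.items()))

def cond_dist(i, given):
    """Distribution of X_i given the partial assignment `given`."""
    return {v: prob({**given, i: v}) / prob(given) for v in (0, 1)}

def find_parents(i):
    """A minimal set S of predecessors with P(X_i | predecessors) = P(X_i | S)."""
    preds = range(i)
    for size in range(i + 1):                      # smallest subsets first
        for S in itertools.combinations(preds, size):
            ok = True
            for vals in itertools.product([0, 1], repeat=i):
                full = dict(zip(preds, vals))      # all predecessors fixed
                sub = {j: full[j] for j in S}      # only S fixed
                if any(abs(cond_dist(i, full)[v] - cond_dist(i, sub)[v]) > 1e-9
                       for v in (0, 1)):
                    ok = False
                    break
            if ok:
                return set(S)
```

Here `find_parents(2)` returns `{1}`: given $X_2$, the variable $X_3$ is conditionally independent of $X_1$, so $X_2$ alone qualifies as the parent set.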

Putting the chain rule and the definition of parents together gives:

 $P(X_{1},X_{2},\dots,X_{n})=\prod_{i=1}^{n}P(X_{i}\mid parents(X_{i})).$

The probability over all of the variables, $P(X_{1},X_{2},\dots,X_{n})$, is called the joint probability distribution. A belief network defines a factorization of the joint probability distribution into a product of conditional probabilities.

A belief network, also called a Bayesian network, is a directed acyclic graph (DAG), where the nodes are random variables. There is an arc from each element of $parents(X_{i})$ into $X_{i}$. Associated with the belief network is a set of conditional probability distributions that specify the conditional probability of each variable given its parents (which includes the prior probabilities of those variables with no parents).

Thus, a belief network consists of

• a DAG, where each node is labeled by a random variable

• a domain for each random variable, and

• a set of conditional probability distributions giving $P(X\mid parents(X))$ for each variable $X$.
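
These three components can be represented directly. The following Python sketch (the class and its interface are illustrative, not a standard API) stores the DAG as parent lists, a domain for each variable, and one conditional probability table (CPT) per variable, and computes joint probabilities by the factorization above:

```python
class BeliefNetwork:
    """A belief network: a DAG over variables, domains, and CPTs."""

    def __init__(self, parents, domains, cpts):
        self.parents = parents   # var -> tuple of parent variables
        self.domains = domains   # var -> list of possible values
        self.cpts = cpts         # var -> {(parent values..., value): probability}
        # Build a topological order: each variable appears after all its parents.
        self.order = []
        while len(self.order) < len(parents):
            for v in parents:
                if v not in self.order and all(p in self.order
                                               for p in parents[v]):
                    self.order.append(v)

    def joint_prob(self, assignment):
        """P(X_1=v_1, ..., X_n=v_n) as the product of P(X_i | parents(X_i))."""
        prod = 1.0
        for v in self.order:
            key = tuple(assignment[p] for p in self.parents[v]) + (assignment[v],)
            prod *= self.cpts[v][key]
        return prod
```

Given an assignment of a value to every variable, `joint_prob` looks up one CPT entry per variable and multiplies them, so a full joint over $n$ variables never has to be stored explicitly.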

A belief network is acyclic by construction. How the chain rule decomposes a conjunction depends on the ordering of the variables. Different orderings can result in different belief networks. In particular, which variables are eligible to be parents depends on the ordering, as only predecessors in the ordering can be parents. Some of the orderings may result in networks with fewer arcs than other orderings.

###### Example 8.13.

Consider the four variables of Example 8.12, with the ordering: $Intelligent$, $Works\_hard$, $Answers$, $Grade$. Consider the variables in order. $Intelligent$ does not have any predecessors in the ordering, so it has no parents, thus $parents(Intelligent)=\{\}$. $Works\_hard$ is independent of $Intelligent$, and so it too has no parents. $Answers$ depends on both $Intelligent$ and $Works\_hard$, so

 $parents(Answers)=\{Intelligent,Works\_hard\}.$

$Grade$ is independent of $Intelligent$ and $Works\_hard$ given $Answers$ and so

 $parents(Grade)=\{Answers\}.$

The corresponding belief network is given in Figure 8.2.

This graph defines the decomposition of the joint distribution:

 $P(Intelligent,Works\_hard,Answers,Grade)=P(Intelligent)*P(Works\_hard)*P(Answers\mid Intelligent,Works\_hard)*P(Grade\mid Answers).$
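
To see the factorization in action, here is a Python sketch with invented probabilities (the text does not specify these numbers), simplifying $Answers$ and $Grade$ to booleans (insightful or not, high grade or not):

```python
# Hypothetical probabilities for a boolean simplification of Example 8.13;
# all numbers are invented for illustration.
P_i = {True: 0.7, False: 0.3}    # P(Intelligent)
P_w = {True: 0.6, False: 0.4}    # P(Works_hard)
P_a = {                          # P(Answers=insightful | Intelligent, Works_hard)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.4, (False, False): 0.1,
}
P_g = {True: 0.8, False: 0.1}    # P(Grade=high | Answers=insightful / not)

# P(Intelligent ∧ Works_hard ∧ Answers=insightful ∧ Grade=high)
#   = P(Intelligent) * P(Works_hard)
#     * P(Answers=insightful | Intelligent, Works_hard)
#     * P(Grade=high | Answers=insightful)
p = P_i[True] * P_w[True] * P_a[(True, True)] * P_g[True]
print(round(p, 4))  # 0.7 * 0.6 * 0.9 * 0.8 = 0.3024
```

One multiplication per variable yields the joint probability of any complete assignment, which is exactly what the decomposition above promises.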

In the examples below, the domains of the variables are simple; for example, the domain of $Answers$ may be $\{insightful,clear,superficial,vacuous\}$, or it could be the actual text answers.