9.2 Independence

The axioms of probability are very weak and provide few constraints on allowable conditional probabilities. For example, if there are $n$ binary variables, there are $2^n - 1$ free parameters; that is, $2^n - 1$ numbers must be assigned to give an arbitrary probability distribution. With $n = 10$ variables, that is already $1023$ numbers.

A useful way to limit the amount of information required is to assume that each variable directly depends on only a few other variables. This relies on assumptions of conditional independence. Not only does this reduce the number of parameters required to specify a model, but the independence structure can also be exploited for efficient reasoning.

As long as the value of $P(h \mid e)$ is not 0 or 1, the value of $P(h \mid e)$ does not constrain the value of $P(h \mid f \land e)$. This latter probability could have any value in the range $[0, 1]$. It is 1 when $f$ implies $h$, and it is 0 when $f$ implies $\neg h$. A common kind of qualitative knowledge is of the form $P(h \mid e) = P(h \mid f \land e)$, which specifies that $f$ is irrelevant to the probability of $h$ given that $e$ is observed. This idea applies to random variables, as in the following definition.

Random variable $X$ is conditionally independent of random variable $Y$ given a set of random variables $Zs$ if

$P(X \mid Y, Zs) = P(X \mid Zs)$

whenever the probabilities are well defined. That is, given a value of each variable in $Zs$, knowing $Y$'s value does not affect the belief in the value of $X$.
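To make the definition concrete, the following is a minimal sketch (not from the text) that checks conditional independence numerically on a small joint distribution; the representation of worlds as dictionaries and the function names are illustrative assumptions.

```python
from itertools import product

def marginal(joint, query):
    """Sum the probability of all worlds consistent with `query`;
    worlds and `query` are dicts mapping variable names to values."""
    return sum(p for world, p in joint
               if all(world[v] == x for v, x in query.items()))

def is_cond_independent(joint, X, Y, Zs, tol=1e-9):
    """Check P(X | Y, Zs) = P(X | Zs) for all values, whenever the
    conditional probabilities are well defined."""
    vals = lambda V: {world[V] for world, _ in joint}
    for zv in product(*(vals(Z) for Z in Zs)):
        z = dict(zip(Zs, zv))
        pz = marginal(joint, z)
        if pz == 0:
            continue                      # P(... | Zs=z) is undefined
        for x in vals(X):
            px_given_z = marginal(joint, {X: x, **z}) / pz
            for y in vals(Y):
                pyz = marginal(joint, {Y: y, **z})
                if pyz == 0:
                    continue              # P(... | Y=y, Zs=z) is undefined
                px_given_yz = marginal(joint, {X: x, Y: y, **z}) / pyz
                if abs(px_given_yz - px_given_z) > tol:
                    return False
    return True

# Two independent fair coin flips: A is independent of B given {}.
coins = [({'A': a, 'B': b}, 0.25) for a in (0, 1) for b in (0, 1)]
print(is_cond_independent(coins, 'A', 'B', []))   # True
```

Here the joint distribution is a list of (world, probability) pairs; such an explicit table is rarely available in practice, but it makes the definition directly testable.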

Example 9.10.

Consider a probabilistic model of students and exams. It is reasonable to assume that the random variable Intelligence is independent of Works_hard, given no observations. If you find that a student works hard, it does not tell you anything about their intelligence.

The answers to the exam (the variable Answers) would depend on whether the student is intelligent and works hard. Thus, given Answers, Intelligence would be dependent on Works_hard: if you found that someone had insightful answers and did not work hard, your belief that they are intelligent would go up.

The grade on the exam (variable Grade) should depend on the student's answers, not directly on their intelligence or whether they worked hard. Thus, Grade would be independent of Intelligence given Answers. However, if the answers were not observed, Intelligence would affect Grade (because highly intelligent students would be expected to give different answers than less intelligent students); thus, Grade is dependent on Intelligence given no observations.
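The dependence reversal in this example can be checked with a small computation. The following sketch uses made-up numbers; the parameters, such as the probability of insightful answers for each combination, are assumptions for illustration, not values from the text.

```python
from itertools import product

# Illustrative (assumed) parameters: P(I=1), P(W=1), and P(A=1 | I, W),
# where I = Intelligence, W = Works_hard, A = Answers (1 = insightful).
P_I1, P_W1 = 0.5, 0.5
P_A1 = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.1}

def joint(i, w, a):
    """P(I=i, W=w, A=a) = P(I=i) P(W=w) P(A=a | I=i, W=w)."""
    p = (P_I1 if i else 1 - P_I1) * (P_W1 if w else 1 - P_W1)
    pa1 = P_A1[(i, w)]
    return p * (pa1 if a else 1 - pa1)

def p_intelligent(w=None, a=None):
    """P(I=1 | W=w, A=a); a value of None means unobserved."""
    num = den = 0.0
    for i, w_, a_ in product((0, 1), repeat=3):
        if (w is not None and w_ != w) or (a is not None and a_ != a):
            continue
        den += joint(i, w_, a_)
        if i == 1:
            num += joint(i, w_, a_)
    return num / den

print(p_intelligent())             # 0.5
print(p_intelligent(w=0))          # 0.5: Works_hard irrelevant a priori
print(p_intelligent(a=1))          # ~0.71: insightful answers raise belief
print(p_intelligent(w=0, a=1))     # ~0.86: given Answers, Works_hard matters
```

Learning that the student did not work hard leaves the prior belief in Intelligence unchanged at 0.5, but in the context of insightful answers it raises that belief from about 0.71 to about 0.86, which is the effect described above (often called explaining away).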

Proposition 9.2.

The following four statements are equivalent, as long as the conditional probabilities are well defined:

  1. $X$ is conditionally independent of $Y$ given $Z$.

  2. $Y$ is conditionally independent of $X$ given $Z$.

  3. $P(X{=}x \mid Y{=}y \land Z{=}z) = P(X{=}x \mid Y{=}y' \land Z{=}z)$ for all values $x$, $y$, $y'$, and $z$. That is, in the context that you are given a value for $Z$, changing the value of $Y$ does not affect the belief in $X$.

  4. $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$.

The proof is left as an exercise. See Exercise 9.1.
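As a hint for one direction (that statement 1 implies statement 4), the chain rule gives

$P(X, Y \mid Z) = P(X \mid Y, Z)\, P(Y \mid Z) = P(X \mid Z)\, P(Y \mid Z),$

where the second equality uses statement 1; the remaining equivalences can be derived in a similar way.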

Variables $X$ and $Y$ are unconditionally independent if $P(X, Y) = P(X)\, P(Y)$, that is, if they are conditionally independent given no observations. Note that $X$ and $Y$ being unconditionally independent does not imply they are conditionally independent given some other information $Z$.
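A standard counterexample uses exclusive-or: let $X$ and $Y$ be independent fair coin flips and let $Z = X \oplus Y$. Then $X$ and $Y$ are unconditionally independent, but once $Z$ is observed, $X$ completely determines $Y$. A quick numeric check (a sketch, not from the text):

```python
from itertools import product

# X, Y independent fair coins; Z = X xor Y. Each of the four worlds
# (x, y, x ^ y) has probability 0.25.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def p(pred):
    """Probability of the event described by predicate pred(x, y, z)."""
    return sum(q for (x, y, z), q in joint.items() if pred(x, y, z))

# Unconditionally independent: P(X=1, Y=1) = P(X=1) P(Y=1) = 0.25.
print(p(lambda x, y, z: x == 1 and y == 1))   # 0.25
# But given Z = 0: P(X=1 | Z=0) = 0.5, while P(X=1 | Y=1, Z=0) = 1.
print(p(lambda x, y, z: x == 1 and z == 0) / p(lambda x, y, z: z == 0))
print(p(lambda x, y, z: x == 1 and y == 1 and z == 0)
      / p(lambda x, y, z: y == 1 and z == 0))
```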

Conditional independence is a useful assumption that is often natural to assess and can be exploited in inference. It is rare to have a table of probabilities of worlds from which to assess independence numerically.

Another useful concept is context-specific independence. Variables $X$ and $Y$ are independent with respect to context $Zs{=}vs$ if

$P(X \mid Y, Zs{=}vs) = P(X \mid Zs{=}vs)$

whenever the probabilities are well defined. That is, for all $x \in \mathrm{domain}(X)$ and for all $y \in \mathrm{domain}(Y)$, if $P(Y{=}y \land Zs{=}vs) > 0$:

$P(X{=}x \mid Y{=}y \land Zs{=}vs) = P(X{=}x \mid Zs{=}vs).$

This is like conditional independence, but only for one of the values of $Zs$. It is discussed in more detail when representing conditional probabilities in terms of decision trees.
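As a small illustration, here is a hypothetical conditional probability table for $P(X{=}1 \mid Y, Z)$, with values that are assumptions for illustration: in the context $Z{=}1$, the value of $Y$ is irrelevant to $X$, while in the context $Z{=}0$ it is not, so the table could be represented compactly by a decision tree that tests $Z$ first.

```python
# Hypothetical CPT for P(X=1 | Y, Z), keyed by (y, z). In context Z=1 the
# two rows agree, so X is independent of Y in that context; in context Z=0
# they differ, so there is no independence there.
P_X1 = {
    (0, 1): 0.7, (1, 1): 0.7,   # Z = 1: same value for both y
    (0, 0): 0.2, (1, 0): 0.9,   # Z = 0: depends on y
}

# A decision tree would test Z first: if Z == 1, return 0.7 without
# looking at Y; if Z == 0, branch on Y.
def p_x1(y, z):
    return 0.7 if z == 1 else (0.9 if y == 1 else 0.2)

assert all(p_x1(y, z) == v for (y, z), v in P_X1.items())
```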