# 9.2 Independence

The axioms of probability are very weak and provide few constraints on allowable conditional probabilities. For example, an arbitrary probability distribution over $n$ binary variables has $2^{n}-1$ free parameters; that is, $2^{n}-1$ numbers must be assigned to specify it.

A useful way to limit the amount of information required is to assume that each variable directly depends on only a few other variables. This is captured by assumptions of conditional independence. Not only do such assumptions reduce the number of parameters required to specify a model, but the independence structure may also be exploited for efficient reasoning.

As long as the value of $P(h\mid e)$ is not $0$ or $1$, it does not constrain the value of $P(h\mid f\land e)$. This latter probability could have any value in the range $[0,1]$. It is $1$ when $f\land e$ implies $h$, and it is $0$ when $f\land e$ implies $\neg h$. A common kind of qualitative knowledge is of the form $P(h\mid e)=P(h\mid f\land e)$, which specifies that $f$ is irrelevant to the probability of $h$ given that $e$ is observed. This idea applies to random variables, as in the following definition.

Random variable $X$ is conditionally independent of random variable $Y$ given a set of random variables $Zs$ if

 $P(X\mid Y,Zs)=P(X\mid Zs)$

whenever the probabilities are well defined. That is, given a value of each variable in $Zs$, knowing $Y$’s value does not affect the belief in the value of $X$.
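This definition can be checked numerically on a small joint distribution. The following sketch (with illustrative numbers, not from the text) builds a joint over binary $X$, $Y$, $Z$ that factors as $P(Z)\,P(X\mid Z)\,P(Y\mid Z)$, and verifies that $P(X\mid Y,Z)=P(X\mid Z)$ for every assignment:

```python
from itertools import product

# Hypothetical conditional probabilities (illustrative numbers only).
p_z = {0: 0.3, 1: 0.7}            # P(Z = z)
p_x_given_z = {0: 0.9, 1: 0.2}    # P(X = 1 | Z = z)
p_y_given_z = {0: 0.4, 1: 0.6}    # P(Y = 1 | Z = z)

def joint(x, y, z):
    """P(X=x, Y=y, Z=z), built so that X is independent of Y given Z."""
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    return p_z[z] * px * py

def p_x_given_yz(x, y, z):
    """P(X=x | Y=y, Z=z) computed from the joint."""
    return joint(x, y, z) / sum(joint(xp, y, z) for xp in (0, 1))

def p_x_given_z_only(x, z):
    """P(X=x | Z=z) computed from the joint."""
    num = sum(joint(x, yp, z) for yp in (0, 1))
    den = sum(joint(xp, yp, z) for xp in (0, 1) for yp in (0, 1))
    return num / den

# Given a value for Z, knowing Y's value does not affect the belief in X.
for x, y, z in product((0, 1), repeat=3):
    assert abs(p_x_given_yz(x, y, z) - p_x_given_z_only(x, z)) < 1e-9
```

The independence holds here by construction, because the joint was built from a factored form; for a joint given only as a table, the same check decides whether the independence happens to hold.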

###### Example 9.10.

Consider a probabilistic model of students and exams. It is reasonable to assume that the random variable $Intelligent$ is independent of $Works\_hard$, given no observations. If you find that a student works hard, it does not tell you anything about their intelligence.

The answers to the exam (the variable $Answers$) would depend on whether the student is intelligent and works hard. Thus, given $Answers$, $Intelligent$ would be dependent on $Works\_hard$: if you found that someone had insightful answers and did not work hard, your belief that they are intelligent would go up.

The grade on the exam (variable $Grade$) should depend on the student’s answers, not on the intelligence or whether the student worked hard. Thus, $Grade$ would be independent of $Intelligent$ given $Answers$. However, if the answers were not observed, $Intelligent$ would affect $Grade$ (because highly intelligent students would be expected to give different answers than less intelligent students); thus, $Grade$ is dependent on $Intelligent$ given no observations.
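The first two claims of this example can be made concrete with a small numeric model. The sketch below (all probabilities are illustrative, not from the text) makes $Intelligent$ and $Works\_hard$ independent a priori, with $Answers$ depending on both, and checks that observing insightful answers and a lack of hard work raises the belief in intelligence (the "explaining away" effect):

```python
from itertools import product

# Hypothetical prior and conditional probabilities (illustrative only).
p_i = 0.5                          # P(Intelligent = true)
p_w = 0.5                          # P(Works_hard = true)
p_a = {(0, 0): 0.1, (0, 1): 0.3,   # P(Answers = insightful | I, W)
       (1, 0): 0.5, (1, 1): 0.9}

def joint(i, w, a):
    """P(I=i, W=w, A=a) with I and W independent a priori."""
    pa = p_a[(i, w)] if a else 1 - p_a[(i, w)]
    return (p_i if i else 1 - p_i) * (p_w if w else 1 - p_w) * pa

def p(pred):
    """Probability of the event described by pred(i, w, a)."""
    return sum(joint(*t) for t in product((0, 1), repeat=3) if pred(*t))

# A priori, Intelligent is independent of Works_hard.
assert p(lambda i, w, a: i and w) == p(lambda i, w, a: i) * p(lambda i, w, a: w)

# Given insightful answers, also learning that the student did NOT work
# hard raises the probability that they are intelligent.
p_i_given_a = p(lambda i, w, a: i and a) / p(lambda i, w, a: a)
p_i_given_a_notw = (p(lambda i, w, a: i and not w and a)
                    / p(lambda i, w, a: not w and a))
assert p_i_given_a_notw > p_i_given_a
```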

###### Proposition 9.2.

The following four statements are equivalent, as long as the conditional probabilities are well defined:

1. $X$ is conditionally independent of $Y$ given $Z$.

2. $Y$ is conditionally independent of $X$ given $Z$.

3. $P(X{\,{=}\,}x\mid Y{\,{=}\,}y\land Z{\,{=}\,}z)=P(X{\,{=}\,}x\mid Y{\,{=}\,}y^{\prime}\land Z{\,{=}\,}z)$ for all values $x$, $y$, $y^{\prime}$, and $z$. That is, in the context that you are given a value for $Z$, changing the value of $Y$ does not affect the belief in $X$.

4. ${P(X,Y\mid Z)=P(X\mid Z)P(Y\mid Z)}$.

The proof is left as an exercise. See Exercise 9.1.
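While the proof is an exercise, the equivalence can at least be tested empirically. The sketch below (illustrative code, not from the text) checks statements 1 and 4 against each other, both on randomly generated joint tables (where neither typically holds) and on a table built in factored form (where both hold):

```python
import random
from itertools import product

def marg(joint, pred):
    """Probability mass of the assignments satisfying pred(x, y, z)."""
    return sum(p for a, p in joint.items() if pred(*a))

def stmt1(joint, tol=1e-9):
    """Statement 1: P(X | Y, Z) = P(X | Z) for all assignments."""
    for x, y, z in product((0, 1), repeat=3):
        lhs = joint[(x, y, z)] / marg(joint, lambda a, b, c: b == y and c == z)
        rhs = (marg(joint, lambda a, b, c: a == x and c == z)
               / marg(joint, lambda a, b, c: c == z))
        if abs(lhs - rhs) > tol:
            return False
    return True

def stmt4(joint, tol=1e-9):
    """Statement 4: P(X, Y | Z) = P(X | Z) * P(Y | Z) for all assignments."""
    for x, y, z in product((0, 1), repeat=3):
        pz = marg(joint, lambda a, b, c: c == z)
        lhs = joint[(x, y, z)] / pz
        rhs = (marg(joint, lambda a, b, c: a == x and c == z) / pz
               * marg(joint, lambda a, b, c: b == y and c == z) / pz)
        if abs(lhs - rhs) > tol:
            return False
    return True

# On random tables the two statements agree (typically both are false).
random.seed(0)
for _ in range(50):
    t = {a: random.random() for a in product((0, 1), repeat=3)}
    s = sum(t.values())
    joint = {a: v / s for a, v in t.items()}
    assert stmt1(joint) == stmt4(joint)

# On a table built as P(z) P(x|z) P(y|z), both statements hold.
ci = {(x, y, z): [0.3, 0.7][z]
      * ([0.1, 0.8][z] if x else 1 - [0.1, 0.8][z])
      * ([0.4, 0.6][z] if y else 1 - [0.4, 0.6][z])
      for x, y, z in product((0, 1), repeat=3)}
assert stmt1(ci) and stmt4(ci)
```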

Variables $X$ and $Y$ are unconditionally independent if $P(X,Y)=P(X)P(Y)$, that is, if they are conditionally independent given no observations. Note that $X$ and $Y$ being unconditionally independent does not imply they are conditionally independent given some other information $Z$.
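A minimal counterexample to the converse, sketched below with hypothetical variables: $X$ and $Y$ are independent fair coin flips and $Z$ is their exclusive-or. Then $X$ and $Y$ are unconditionally independent, yet once $Z$ is observed, $Y$ determines $X$ completely:

```python
from itertools import product

# Joint over (X, Y, Z) with X, Y independent fair coins and Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
joint = {(x, y, z): joint.get((x, y, z), 0.0)
         for x, y, z in product((0, 1), repeat=3)}

def p(pred):
    """Probability of the event described by pred(x, y, z)."""
    return sum(pr for a, pr in joint.items() if pred(*a))

# Unconditionally independent: P(X=1, Y=1) = P(X=1) * P(Y=1).
assert (p(lambda x, y, z: x == 1 and y == 1)
        == p(lambda x, y, z: x == 1) * p(lambda x, y, z: y == 1))

# But dependent given Z: P(X=1 | Y=1, Z=0) = 1 while P(X=1 | Z=0) = 0.5.
p_x_given_yz = (p(lambda x, y, z: x == 1 and y == 1 and z == 0)
                / p(lambda x, y, z: y == 1 and z == 0))
p_x_given_z = p(lambda x, y, z: x == 1 and z == 0) / p(lambda x, y, z: z == 0)
assert p_x_given_yz == 1.0 and p_x_given_z == 0.5
```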

Conditional independence is a useful assumption that is often natural to assess and can be exploited in inference. It is rare to have a table of probabilities of worlds and assess independence numerically.

Another useful concept is context-specific independence. Variables $X$ and $Y$ are independent with respect to context $Zs{\,{=}\,}vs$ if

 $P(X\mid Y,Zs{\,{=}\,}vs)=P(X\mid Zs{\,{=}\,}vs)$

whenever the probabilities are well defined. That is, for all $x\in domain(X)$ and for all $y\in domain(Y)$, if $P(Y\,{=}\,y\land Zs\,{=}\,vs)>0$:

 $P(X\,{=}\,x\mid Y\,{=}\,y\land Zs\,{=}\,vs)=P(X\,{=}\,x\mid Zs\,{=}\,vs).$

This is like conditional independence, but is only for one of the values of $Zs$. This is discussed in more detail when representing conditional probabilities in terms of decision trees.
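A small numeric sketch (illustrative numbers, not from the text) shows a distribution where $X$ is independent of $Y$ in the context $Z{\,{=}\,}1$ but not in the context $Z{\,{=}\,}0$, so conditional independence given $Z$ fails while context-specific independence holds:

```python
# Hypothetical model over binary X, Y, Z (illustrative numbers only).
p_z1 = 0.5
p_x = {(0, 0): 0.9, (1, 0): 0.1,   # P(X=1 | Y=y, Z=0): depends on Y
       (0, 1): 0.7, (1, 1): 0.7}   # P(X=1 | Y=y, Z=1): same for both y
p_y = {0: 0.4, 1: 0.6}             # P(Y=1 | Z=z)

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) built from the conditionals above."""
    pz = p_z1 if z else 1 - p_z1
    py = p_y[z] if y else 1 - p_y[z]
    px = p_x[(y, z)] if x else 1 - p_x[(y, z)]
    return pz * py * px

def p_x1_given(y, z):
    """P(X=1 | Y=y, Z=z) computed from the joint."""
    return joint(1, y, z) / (joint(0, y, z) + joint(1, y, z))

def p_x1_given_z(z):
    """P(X=1 | Z=z) computed from the joint."""
    num = sum(joint(1, y, z) for y in (0, 1))
    den = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1))
    return num / den

# X is independent of Y in the context Z = 1 ...
assert all(abs(p_x1_given(y, 1) - p_x1_given_z(1)) < 1e-9 for y in (0, 1))
# ... but not in the context Z = 0, so X is not conditionally
# independent of Y given Z.
assert abs(p_x1_given(0, 0) - p_x1_given(1, 0)) > 0.1
```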