foundations of computational agents
The axioms of probability are very weak and provide few constraints on allowable conditional probabilities. For example, if there are $n$ binary variables, there are ${2}^{n}-1$ numbers to be assigned to give a complete probability distribution from which arbitrary conditional probabilities can be derived. To determine any probability, you may have to start with an enormous database of probabilities.
A useful way to limit the amount of information required is to assume that each variable directly depends on only a few other variables. This uses assumptions of conditional independence. Not only does it reduce the number of probabilities required to specify a model, but the independence structure can also be exploited for efficient reasoning.
As long as the value of $P(h\mid e)$ is not $0$ or $1$, the value of $P(h\mid e)$ does not constrain the value of $P(h\mid f\wedge e)$. This latter probability could have any value in the range $[0,1]$: it is $1$ when $f$ implies $h$, and it is $0$ when $f$ implies $\neg h$. A common kind of qualitative knowledge is of the form $P(h\mid e)=P(h\mid f\wedge e)$, which specifies that $f$ is irrelevant to the probability of $h$ given that $e$ is observed. This idea applies to random variables, as in the following definition.
Random variable $X$ is conditionally independent of random variable $Y$ given a set of random variables $Zs$ if
$$P(X\mid Y,Zs)=P(X\mid Zs)$$
whenever the probabilities are well defined. This means that for all $x\in domain(X)$, for all $y\in domain(Y)$, and for all $z\in domain(Zs)$, if $P(Y=y\wedge Zs=z)>0$,
$$P(X=x\mid Y=y\wedge Zs=z)=P(X=x\mid Zs=z).$$
That is, given a value of each variable in $Zs$, knowing $Y$’s value does not affect the belief in the value of $X$.
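This definition can be checked mechanically against a table of joint probabilities. The following sketch uses a hypothetical joint distribution over three binary variables, constructed so that $X$ is conditionally independent of $Y$ given $Z$ (all the numbers are illustrative):

```python
from itertools import product

# Hypothetical joint P(X, Y, Z) over binary variables, built so that
# X is conditionally independent of Y given Z:
#   P(x, y, z) = P(z) P(x | z) P(y | z)
pz = {0: 0.6, 1: 0.4}           # P(Z=z)
px1_given_z = {0: 0.2, 1: 0.7}  # P(X=1 | Z=z)
py1_given_z = {0: 0.5, 1: 0.9}  # P(Y=1 | Z=z)

joint = {}
for x, y, z in product([0, 1], repeat=3):
    px = px1_given_z[z] if x == 1 else 1 - px1_given_z[z]
    py = py1_given_z[z] if y == 1 else 1 - py1_given_z[z]
    joint[(x, y, z)] = pz[z] * px * py

def prob(pred):
    """Sum of joint probabilities over the worlds satisfying pred."""
    return sum(p for world, p in joint.items() if pred(world))

def conditionally_independent(tol=1e-9):
    """Check P(X=x | Y=y, Z=z) == P(X=x | Z=z) whenever P(Y=y, Z=z) > 0."""
    for x, y, z in product([0, 1], repeat=3):
        pyz = prob(lambda w: w[1] == y and w[2] == z)
        if pyz == 0:
            continue  # the conditional probability is not well defined
        lhs = joint[(x, y, z)] / pyz
        rhs = prob(lambda w: w[0] == x and w[2] == z) / prob(lambda w: w[2] == z)
        if abs(lhs - rhs) > tol:
            return False
    return True
```

Because the joint was built as $P(z)\,P(x\mid z)\,P(y\mid z)$, the check succeeds for every well-defined conditional probability.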
Consider a probabilistic model of students and exams. It is reasonable to assume that the random variable $\mathit{Intelligent}$ is independent of $\mathit{Works\_hard}$, given no observations. If you find that a student works hard, it does not tell you anything about their intelligence.
The answers to the exam (the variable $\mathit{Answers}$) would depend on whether the student is intelligent and works hard. Thus, given $\mathit{Answers}$, $\mathit{Intelligent}$ would be dependent on $\mathit{Works\_hard}$; if you found someone had insightful answers, and did not work hard, your belief that they are intelligent would go up.
The grade on the exam (variable $\mathit{Grade}$) should depend on the student’s answers, not directly on their intelligence or whether they worked hard. Thus $\mathit{Grade}$ would be independent of $\mathit{Intelligent}$ given $\mathit{Answers}$. However, if the answers were not observed, $\mathit{Intelligent}$ would affect $\mathit{Grade}$ (because highly intelligent students would be expected to have different answers than less intelligent students); thus $\mathit{Grade}$ is dependent on $\mathit{Intelligent}$ given no observations.
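The dependence of Intelligent on Works_hard given Answers can be seen by enumeration over a small model. The numbers below are hypothetical, chosen only so that insightful answers are more likely for students who are intelligent and who work hard:

```python
from itertools import product

# Hypothetical numbers for the student/exam story.  Intelligent (I)
# and Works_hard (W) are independent a priori; insightful Answers (A)
# are more likely when the student is intelligent and works hard.
p_i = 0.3                          # P(Intelligent)
p_w = 0.6                          # P(Works_hard)
p_a1 = {(1, 1): 0.9, (1, 0): 0.6,
        (0, 1): 0.5, (0, 0): 0.1}  # P(A=1 | I=i, W=w)

joint = {}
for i, w, a in product([0, 1], repeat=3):
    pa = p_a1[(i, w)] if a == 1 else 1 - p_a1[(i, w)]
    joint[(i, w, a)] = ((p_i if i else 1 - p_i)
                        * (p_w if w else 1 - p_w) * pa)

def prob(pred):
    """Sum of joint probabilities over the worlds satisfying pred."""
    return sum(p for world, p in joint.items() if pred(world))

# A priori, learning that a student works hard says nothing about
# intelligence: P(I=1 | W=1) == P(I=1).
prior = prob(lambda v: v[0] == 1)
given_w = prob(lambda v: v[0] == 1 and v[1] == 1) / prob(lambda v: v[1] == 1)

# Given insightful answers, learning the student did NOT work hard
# raises the belief that they are intelligent.
p_i_given_a = (prob(lambda v: v[0] == 1 and v[2] == 1)
               / prob(lambda v: v[2] == 1))
p_i_given_a_notw = (prob(lambda v: v[0] == 1 and v[1] == 0 and v[2] == 1)
                    / prob(lambda v: v[1] == 0 and v[2] == 1))
```

With these numbers, `p_i_given_a_notw` exceeds `p_i_given_a`: once the answers are observed, not working hard makes intelligence the better explanation of the insightful answers.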
The following four statements are equivalent, as long as the conditional probabilities are well defined:
$X$ is conditionally independent of $Y$ given $Zs$.
$Y$ is conditionally independent of $X$ given $Zs$.
$P(X=x\mid Y=y\wedge Zs=z)=P(X=x\mid Y={y}^{\prime}\wedge Zs=z)$ for all values $x$, $y$, ${y}^{\prime}$, and $z$. That is, in the context that you are given a value for $Zs$, changing the value of $Y$ does not affect the belief in $X$.
$P(X,Y\mid Zs)=P(X\mid Zs)\,P(Y\mid Zs)$.
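The first and last statements can be compared numerically on small joint distributions. The sketch below (with illustrative distributions over three binary variables) implements both as separate tests and shows they give the same verdict:

```python
import random
from itertools import product

def prob(joint, pred):
    """Sum of joint probabilities over the worlds satisfying pred."""
    return sum(p for world, p in joint.items() if pred(world))

def indep_conditional(joint, tol=1e-9):
    """P(X=x | Y=y, Z=z) == P(X=x | Z=z) whenever P(Y=y, Z=z) > 0."""
    for x, y, z in product([0, 1], repeat=3):
        pyz = prob(joint, lambda w: w[1] == y and w[2] == z)
        pz = prob(joint, lambda w: w[2] == z)
        if pyz == 0 or pz == 0:
            continue
        lhs = joint.get((x, y, z), 0.0) / pyz
        rhs = prob(joint, lambda w: w[0] == x and w[2] == z) / pz
        if abs(lhs - rhs) > tol:
            return False
    return True

def indep_product(joint, tol=1e-9):
    """P(X=x, Y=y | Z=z) == P(X=x | Z=z) P(Y=y | Z=z) whenever P(Z=z) > 0."""
    for x, y, z in product([0, 1], repeat=3):
        pz = prob(joint, lambda w: w[2] == z)
        if pz == 0:
            continue
        lhs = joint.get((x, y, z), 0.0) / pz
        rhs = (prob(joint, lambda w: w[0] == x and w[2] == z) / pz
               * prob(joint, lambda w: w[1] == y and w[2] == z) / pz)
        if abs(lhs - rhs) > tol:
            return False
    return True

# A joint built to factorize passes both tests; a generic random joint
# gets the same verdict from both.
factored = {}
for x, y, z in product([0, 1], repeat=3):
    px = 0.8 if x == z else 0.2          # P(X=x | Z=z)
    py = 0.3 if y == z else 0.7          # P(Y=y | Z=z)
    factored[(x, y, z)] = 0.5 * px * py  # P(Z=z) = 0.5

random.seed(0)
weights = [random.random() for _ in range(8)]
rand_joint = {w: wt / sum(weights)
              for w, wt in zip(product([0, 1], repeat=3), weights)}
```

This is only a numerical check on particular distributions, not a proof of the equivalence; the proof is the exercise referred to below.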
The proof is left as an exercise. See Exercise 3.
Variables $X$ and $Y$ are unconditionally independent if $P(X,Y)=P(X)P(Y)$, that is, if they are conditionally independent given no observations. Note that $X$ and $Y$ being unconditionally independent does not imply they are conditionally independent given some other information $Z$.
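The caveat in the last sentence can be made concrete with the standard parity example: two independent fair bits become dependent once their exclusive-or is observed. A sketch (the encoding is illustrative):

```python
from itertools import product

# X and Y are independent fair coin flips; Z = X xor Y.  X and Y are
# unconditionally independent, but become dependent once Z is
# observed: given Z, the value of Y determines the value of X.
joint = {}
for x, y in product([0, 1], repeat=2):
    joint[(x, y, x ^ y)] = 0.25

def prob(pred):
    """Sum of joint probabilities over the worlds satisfying pred."""
    return sum(p for world, p in joint.items() if pred(world))

# Unconditional independence: P(X=1, Y=1) == P(X=1) P(Y=1).
p_xy = prob(lambda w: w[0] == 1 and w[1] == 1)
p_x = prob(lambda w: w[0] == 1)
p_y = prob(lambda w: w[1] == 1)

# Conditional dependence: given Z = 0, observing Y = 1 forces X = 1.
p_x_given_z = (prob(lambda w: w[0] == 1 and w[2] == 0)
               / prob(lambda w: w[2] == 0))
p_x_given_yz = (prob(lambda w: w == (1, 1, 0))
                / prob(lambda w: w[1] == 1 and w[2] == 0))
```

Here `p_x_given_z` is $0.5$ but `p_x_given_yz` is $1$, so $X$ is not conditionally independent of $Y$ given $Z$ even though it is unconditionally independent of $Y$.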
Conditional independence is a useful assumption that is often natural to assess and can be exploited in inference. It is very rare that we would have a table of probabilities of worlds and assess independence numerically.