foundations of computational agents
The axioms of probability are very weak and provide few constraints on allowable conditional probabilities. For example, if there are $n$ binary variables, an arbitrary probability distribution has ${2}^{n}-1$ free parameters, which means ${2}^{n}-1$ numbers must be assigned to specify it.
A useful way to limit the amount of information required is to assume that each variable only directly depends on a few other variables. This uses assumptions of conditional independence. Not only does it reduce how many numbers are required to specify a model, but also the independence structure may be exploited for efficient reasoning.
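The savings can be made concrete with a quick count. The sketch below compares the size of a full joint distribution over $n$ binary variables with a factored model in which each variable directly depends on at most $k$ others; the particular values of $n$ and $k$ are chosen only for illustration.

```python
# Parameter counts for n binary variables (n and k are illustrative choices).
n = 10

# A full joint distribution needs 2^n - 1 free numbers.
full_joint = 2**n - 1

# If each variable directly depends on at most k other variables, it needs
# at most 2^k numbers (one probability per assignment to those k variables),
# for at most n * 2^k numbers overall.
k = 3
factored = n * 2**k

print(full_joint)  # 1023
print(factored)    # 80
```

Even for these small sizes the factored model needs far fewer numbers, and the gap grows exponentially with $n$.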
As long as the value of $P(h\mid e)$ is not $0$ or $1$, the value of $P(h\mid e)$ does not constrain the value of $P(h\mid f\wedge e)$. This latter probability could have any value in the range $[0,1]$. It is $1$ when $f$ implies $h$, and it is $0$ when $f$ implies $\neg h$. A common kind of qualitative knowledge is of the form $P(h\mid e)=P(h\mid f\wedge e)$, which specifies that $f$ is irrelevant to the probability of $h$ given that $e$ is observed. This idea applies to random variables, as in the following definition.
Random variable $X$ is conditionally independent of random variable $Y$ given a set of random variables $Zs$ if
$$P(X\mid Y,Zs)=P(X\mid Zs)$$
whenever the probabilities are well defined. That is, given a value of each variable in $Zs$, knowing $Y$’s value does not affect the belief in the value of $X$.
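The definition can be checked numerically. The following sketch, with made-up probabilities, builds a joint distribution over binary $X$, $Y$, $Z$ that satisfies $P(X,Y\mid Z)=P(X\mid Z)\,P(Y\mid Z)$ by construction, then verifies that $P(X\mid Y,Z)=P(X\mid Z)$ for every value of $Y$ and $Z$.

```python
from itertools import product

# Illustrative numbers only: P(Z=1), P(X=1 | Z=z), P(Y=1 | Z=z).
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: 0.9, 1: 0.2}
p_y_given_z = {0: 0.4, 1: 0.8}

# Build the joint as P(Z) * P(X | Z) * P(Y | Z), so X and Y are
# conditionally independent given Z by construction.
joint = {}
for x, y, z in product([0, 1], repeat=3):
    px = p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y == 1 else 1 - p_y_given_z[z]
    joint[(x, y, z)] = p_z[z] * px * py

def cond_prob_x1(cond):
    """P(X=1 | cond), where cond fixes values for 'Y' and/or 'Z'."""
    match = lambda y, z: all({'Y': y, 'Z': z}[v] == val
                             for v, val in cond.items())
    num = sum(p for (x, y, z), p in joint.items() if x == 1 and match(y, z))
    den = sum(p for (x, y, z), p in joint.items() if match(y, z))
    return num / den

# Given a value of Z, knowing Y's value does not affect the belief in X.
for z in [0, 1]:
    for y in [0, 1]:
        assert abs(cond_prob_x1({'Y': y, 'Z': z})
                   - cond_prob_x1({'Z': z})) < 1e-12
```

In practice, as noted below, independence is usually assessed from knowledge of the domain rather than verified from a table of probabilities like this.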
Consider a probabilistic model of students and exams. It is reasonable to assume that the random variable $Intelligence$ is independent of $Works\mathrm{\_}hard$, given no observations. If you find that a student works hard, it does not tell you anything about their intelligence.
The answers to the exam (the variable $Answers$) would depend on whether the student is intelligent and works hard. Thus, given $Answers$, $Intelligence$ would be dependent on $Works\mathrm{\_}hard$; if you found that someone had insightful answers and did not work hard, your belief that they are intelligent would go up.
The grade on the exam (variable $Grade$) should depend on the student’s answers, not on the intelligence or whether the student worked hard. Thus, $Grade$ would be independent of $Intelligence$ given $Answers$. However, if the answers were not observed, $Intelligence$ would affect $Grade$ (because highly intelligent students would be expected to have different answers than not so intelligent students); thus, $Grade$ is dependent on $Intelligence$ given no observations.
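The students-and-exams story can be checked by enumeration. The sketch below uses made-up probabilities in which $Intelligence$ and $Works\mathrm{\_}hard$ are independent a priori and $Answers$ depends on both; it confirms that observing $Works\mathrm{\_}hard$ alone tells you nothing about $Intelligence$, but that given insightful answers, learning the student did not work hard raises the belief that they are intelligent.

```python
from itertools import product

# Made-up numbers for illustration.
p_i = 0.5          # P(Intelligence = true)
p_w = 0.5          # P(Works_hard = true)
p_a = {            # P(Answers = insightful | Intelligence, Works_hard)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.5, (False, False): 0.1,
}

# Joint: Intelligence and Works_hard independent; Answers depends on both.
joint = {}
for i, w, a in product([True, False], repeat=3):
    pa = p_a[(i, w)] if a else 1 - p_a[(i, w)]
    joint[(i, w, a)] = (p_i if i else 1 - p_i) * (p_w if w else 1 - p_w) * pa

def prob_i(**fixed):
    """P(Intelligence = true | fixed values of w and/or a)."""
    keep = lambda w, a: all(dict(w=w, a=a)[k] == v for k, v in fixed.items())
    num = sum(p for (i, w, a), p in joint.items() if i and keep(w, a))
    den = sum(p for (i, w, a), p in joint.items() if keep(w, a))
    return num / den

# Unconditionally independent: observing Works_hard changes nothing.
assert abs(prob_i(w=True) - prob_i(w=False)) < 1e-12

# Given insightful answers, not working hard raises belief in intelligence.
assert prob_i(a=True, w=False) > prob_i(a=True, w=True)
```

This pattern, where two independent causes become dependent once a common effect is observed, recurs throughout probabilistic reasoning.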
The following four statements are equivalent, as long as the conditional probabilities are well defined:
$X$ is conditionally independent of $Y$ given $Z$.
$Y$ is conditionally independent of $X$ given $Z$.
$P(X=x\mid Y=y\wedge Z=z)=P(X=x\mid Y={y}^{\prime}\wedge Z=z)$ for all values $x$, $y$, ${y}^{\prime}$, and $z$. That is, in the context that you are given a value for $Z$, changing the value of $Y$ does not affect the belief in $X$.
$P(X,Y\mid Z)=P(X\mid Z)P(Y\mid Z)$.
The proof is left as an exercise. See Exercise 9.1.
Variables $X$ and $Y$ are unconditionally independent if $P(X,Y)=P(X)P(Y)$, that is, if they are conditionally independent given no observations. Note that $X$ and $Y$ being unconditionally independent does not imply they are conditionally independent given some other information $Z$.
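A standard way to see that unconditional independence does not imply conditional independence is the exclusive-or construction (not from the text above, but a common illustration): let $X$ and $Y$ be independent fair coin flips and $Z=X\oplus Y$. Then $X$ and $Y$ are unconditionally independent, yet given $Z$, either one of them determines the other.

```python
from itertools import product

# X and Y are independent fair bits; Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

def p(**fixed):
    """Probability that all the fixed variables take the given values."""
    return sum(pr for (x, y, z), pr in joint.items()
               if all(dict(x=x, y=y, z=z)[k] == v for k, v in fixed.items()))

# Unconditionally independent: P(X=1, Y=1) = P(X=1) P(Y=1).
assert p(x=1, y=1) == p(x=1) * p(y=1)

# Not independent given Z: P(Y=1 | X=1, Z=0) = 1, but P(Y=1 | Z=0) = 0.5.
assert p(x=1, y=1, z=0) / p(x=1, z=0) == 1.0
assert p(y=1, z=0) / p(z=0) == 0.5
```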
Conditional independence is a useful assumption that is often natural to assess and can be exploited in inference. It is rare to have a table of probabilities of worlds and assess independence numerically.
Another useful concept is context-specific independence. Variables $X$ and $Y$ are independent with respect to context $Zs=vs$ if
$$P(X\mid Y,Zs=vs)=P(X\mid Zs=vs)$$
whenever the probabilities are well defined. That is, for all $x\in domain(X)$ and for all $y\in domain(Y)$, if $P(Y=y\wedge Zs=vs)>0$:
$$P(X=x\mid Y=y\wedge Zs=vs)=P(X=x\mid Zs=vs).$$
This is like conditional independence, but holds only for one particular value of $Zs$, rather than for all values. This is discussed in more detail when representing conditional probabilities in terms of decision trees.
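The distinction can be checked numerically. The sketch below, with made-up probabilities, builds a joint over binary $X$, $Y$, $Z$ in which $X$ is independent of $Y$ in the context $Z=0$ (the conditional probability of $X$ is the same for both values of $Y$) but dependent on $Y$ in the context $Z=1$, so $X$ and $Y$ are not conditionally independent given $Z$.

```python
from itertools import product

# Illustrative numbers only.
p_z1 = 0.5
p_y1_given_z = {0: 0.4, 1: 0.6}   # P(Y=1 | Z=z)
p_x1_given_yz = {                  # P(X=1 | Y=y, Z=z)
    (0, 0): 0.3, (1, 0): 0.3,      # same for both y: independence in context Z=0
    (0, 1): 0.2, (1, 1): 0.9,      # differs with y: dependence in context Z=1
}

joint = {}
for x, y, z in product([0, 1], repeat=3):
    pz = p_z1 if z else 1 - p_z1
    py = p_y1_given_z[z] if y else 1 - p_y1_given_z[z]
    px = p_x1_given_yz[(y, z)] if x else 1 - p_x1_given_yz[(y, z)]
    joint[(x, y, z)] = pz * py * px

def p_x1(**fixed):
    """P(X=1 | fixed values of y and/or z)."""
    keep = lambda y, z: all(dict(y=y, z=z)[k] == v for k, v in fixed.items())
    num = sum(p for (x, y, z), p in joint.items() if x == 1 and keep(y, z))
    den = sum(p for (x, y, z), p in joint.items() if keep(y, z))
    return num / den

# Independent in the context Z=0: changing Y does not change belief in X.
assert abs(p_x1(y=0, z=0) - p_x1(z=0)) < 1e-12
assert abs(p_x1(y=1, z=0) - p_x1(z=0)) < 1e-12

# Not independent in the context Z=1.
assert abs(p_x1(y=0, z=1) - p_x1(y=1, z=1)) > 0.1
```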