9.2 Independence

The axioms of probability are very weak and provide few constraints on allowable conditional probabilities. For example, if there are $n$ binary variables, there are $2^n - 1$ free parameters; that is, $2^n - 1$ numbers must be assigned to give an arbitrary probability distribution. With $n = 10$ variables, that is already $1023$ numbers.

A useful way to limit the amount of information required is to assume that each variable directly depends on only a few other variables. This relies on assumptions of conditional independence. Not only does this reduce the number of parameters required to specify a model, but the independence structure can also be exploited for efficient reasoning.

As long as the value of $P(h \mid e)$ is not 0 or 1, the value of $P(h \mid e)$ does not constrain the value of $P(h \mid f \land e)$. This latter probability could have any value in the range $[0, 1]$. It is 1 when $f$ implies $h$, and it is 0 when $f$ implies $\neg h$. A common kind of qualitative knowledge is of the form $P(h \mid e) = P(h \mid f \land e)$, which specifies that $f$ is irrelevant to the probability of $h$ given that $e$ is observed. This idea applies to random variables, as in the following definition.

Random variable $X$ is conditionally independent of random variable $Y$ given a set of random variables $Zs$ if

$P(X \mid Y, Zs) = P(X \mid Zs)$

whenever the probabilities are well defined. That is, given a value of each variable in $Zs$, knowing $Y$'s value does not affect the belief in the value of $X$.
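To make the definition concrete, the following is a minimal sketch (not from the text) that checks conditional independence numerically on a small joint distribution; the representation of worlds as dictionaries and the function names are illustrative assumptions.

```python
from itertools import product

def marginal(joint, query):
    """Sum the probability of all worlds consistent with `query`;
    worlds and `query` are dicts mapping variable names to values."""
    return sum(p for world, p in joint
               if all(world[v] == x for v, x in query.items()))

def is_cond_independent(joint, X, Y, Zs, tol=1e-9):
    """Check P(X | Y, Zs) = P(X | Zs) for all values, whenever the
    conditional probabilities are well defined."""
    vals = lambda V: {world[V] for world, _ in joint}
    for zv in product(*(vals(Z) for Z in Zs)):
        z = dict(zip(Zs, zv))
        pz = marginal(joint, z)
        if pz == 0:
            continue                      # P(... | Zs=z) is undefined
        for x in vals(X):
            px_given_z = marginal(joint, {X: x, **z}) / pz
            for y in vals(Y):
                pyz = marginal(joint, {Y: y, **z})
                if pyz == 0:
                    continue              # P(... | Y=y, Zs=z) is undefined
                px_given_yz = marginal(joint, {X: x, Y: y, **z}) / pyz
                if abs(px_given_yz - px_given_z) > tol:
                    return False
    return True

# Two independent fair coin flips: A is independent of B given {}.
coins = [({'A': a, 'B': b}, 0.25) for a in (0, 1) for b in (0, 1)]
print(is_cond_independent(coins, 'A', 'B', []))   # True
```

Here the joint distribution is a list of (world, probability) pairs; such an explicit table is rarely available in practice, but it makes the definition directly testable.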

Example 9.10.

Consider a probabilistic model of students and exams. It is reasonable to assume that the random variable Intelligence is independent of Works_hard, given no observations. If you find that a student works hard, it does not tell you anything about their intelligence.

The answers to the exam (the variable Answers) would depend on whether the student is intelligent and works hard. Thus, given Answers, Intelligence would be dependent on Works_hard: if you found that someone had insightful answers and did not work hard, your belief that they are intelligent would go up.

The grade on the exam (variable Grade) should depend on the student's answers, not directly on their intelligence or whether they worked hard. Thus, Grade would be independent of Intelligence given Answers. However, if the answers were not observed, Intelligence would affect Grade (because highly intelligent students would be expected to give different answers than less intelligent students); thus, Grade is dependent on Intelligence given no observations.
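The dependence reversal in this example can be checked with a small computation. The following sketch uses made-up numbers; the parameters, such as the probability of insightful answers for each combination, are assumptions for illustration, not values from the text.

```python
from itertools import product

# Illustrative (assumed) parameters: P(I=1), P(W=1), and P(A=1 | I, W),
# where I = Intelligence, W = Works_hard, A = Answers (1 = insightful).
P_I1, P_W1 = 0.5, 0.5
P_A1 = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.1}

def joint(i, w, a):
    """P(I=i, W=w, A=a) = P(I=i) P(W=w) P(A=a | I=i, W=w)."""
    p = (P_I1 if i else 1 - P_I1) * (P_W1 if w else 1 - P_W1)
    pa1 = P_A1[(i, w)]
    return p * (pa1 if a else 1 - pa1)

def p_intelligent(w=None, a=None):
    """P(I=1 | W=w, A=a); a value of None means unobserved."""
    num = den = 0.0
    for i, w_, a_ in product((0, 1), repeat=3):
        if (w is not None and w_ != w) or (a is not None and a_ != a):
            continue
        den += joint(i, w_, a_)
        if i == 1:
            num += joint(i, w_, a_)
    return num / den

print(p_intelligent())             # 0.5
print(p_intelligent(w=0))          # 0.5: Works_hard irrelevant a priori
print(p_intelligent(a=1))          # ~0.71: insightful answers raise belief
print(p_intelligent(w=0, a=1))     # ~0.86: given Answers, Works_hard matters
```

Learning that the student did not work hard leaves the prior belief in Intelligence unchanged at 0.5, but in the context of insightful answers it raises that belief from about 0.71 to about 0.86, which is the effect described above (often called explaining away).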

Proposition 9.2.

The following four statements are equivalent, as long as the conditional probabilities are well defined:

  1. $X$ is conditionally independent of $Y$ given $Z$.

  2. $Y$ is conditionally independent of $X$ given $Z$.

  3. $P(X{=}x \mid Y{=}y \land Z{=}z) = P(X{=}x \mid Y{=}y' \land Z{=}z)$ for all values $x$, $y$, $y'$, and $z$. That is, in the context that you are given a value for $Z$, changing the value of $Y$ does not affect the belief in $X$.

  4. $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$.

The proof is left as an exercise. See Exercise 9.1.
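As a hint for one direction (that statement 1 implies statement 4), the chain rule gives

$P(X, Y \mid Z) = P(X \mid Y, Z)\, P(Y \mid Z) = P(X \mid Z)\, P(Y \mid Z),$

where the second equality uses statement 1; the remaining equivalences can be derived in a similar way.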

Variables $X$ and $Y$ are unconditionally independent if $P(X, Y) = P(X)\, P(Y)$, that is, if they are conditionally independent given no observations. Note that $X$ and $Y$ being unconditionally independent does not imply they are conditionally independent given some other information $Z$.
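A standard counterexample uses exclusive-or: let $X$ and $Y$ be independent fair coin flips and let $Z = X \oplus Y$. Then $X$ and $Y$ are unconditionally independent, but once $Z$ is observed, $X$ completely determines $Y$. A quick numeric check (a sketch, not from the text):

```python
from itertools import product

# X, Y independent fair coins; Z = X xor Y. Each of the four worlds
# (x, y, x ^ y) has probability 0.25.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def p(pred):
    """Probability of the event described by predicate pred(x, y, z)."""
    return sum(q for (x, y, z), q in joint.items() if pred(x, y, z))

# Unconditionally independent: P(X=1, Y=1) = P(X=1) P(Y=1) = 0.25.
print(p(lambda x, y, z: x == 1 and y == 1))   # 0.25
# But given Z = 0: P(X=1 | Z=0) = 0.5, while P(X=1 | Y=1, Z=0) = 1.
print(p(lambda x, y, z: x == 1 and z == 0) / p(lambda x, y, z: z == 0))
print(p(lambda x, y, z: x == 1 and y == 1 and z == 0)
      / p(lambda x, y, z: y == 1 and z == 0))
```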

Conditional independence is a useful assumption that is often natural to assess and can be exploited in inference. It is rare to have a table of probabilities of worlds from which to assess independence numerically.

Another useful concept is context-specific independence. Variables $X$ and $Y$ are independent with respect to context $Zs{=}vs$ if

$P(X \mid Y, Zs{=}vs) = P(X \mid Zs{=}vs)$

whenever the probabilities are well defined. That is, for all $x \in \mathrm{domain}(X)$ and for all $y \in \mathrm{domain}(Y)$, if $P(Y{=}y \land Zs{=}vs) > 0$:

$P(X{=}x \mid Y{=}y \land Zs{=}vs) = P(X{=}x \mid Zs{=}vs).$

This is like conditional independence, but only for one of the values of $Zs$. It is discussed in more detail when representing conditional probabilities in terms of decision trees.
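As a small illustration, here is a hypothetical conditional probability table for $P(X{=}1 \mid Y, Z)$, with values that are assumptions for illustration: in the context $Z{=}1$, the value of $Y$ is irrelevant to $X$, while in the context $Z{=}0$ it is not, so the table could be represented compactly by a decision tree that tests $Z$ first.

```python
# Hypothetical CPT for P(X=1 | Y, Z), keyed by (y, z). In context Z=1 the
# two rows agree, so X is independent of Y in that context; in context Z=0
# they differ, so there is no independence there.
P_X1 = {
    (0, 1): 0.7, (1, 1): 0.7,   # Z = 1: same value for both y
    (0, 0): 0.2, (1, 0): 0.9,   # Z = 0: depends on y
}

# A decision tree would test Z first: if Z == 1, return 0.7 without
# looking at Y; if Z == 0, branch on Y.
def p_x1(y, z):
    return 0.7 if z == 1 else (0.9 if y == 1 else 0.2)

assert all(p_x1(y, z) == v for (y, z), v in P_X1.items())
```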