6.2 Independence

The axioms of probability are very weak and provide few constraints on allowable conditional probabilities. For example, if there are n binary variables, there are 2^n − 1 numbers to be assigned to give a complete probability distribution from which arbitrary conditional probabilities can be derived. To determine any probability, you may have to start with an enormous database of conditional probabilities or of probabilities of possible worlds.
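To see the blow-up concretely, the following sketch (hypothetical, for n = 3) enumerates the possible worlds of n binary variables; a full joint distribution assigns one number to each world, and since the numbers must sum to 1, 2^n − 1 of them are free:

```python
from itertools import product

# A full joint distribution over n binary variables assigns a probability
# to each of the 2^n possible worlds; the sum-to-1 constraint removes one
# degree of freedom, leaving 2^n - 1 free numbers.  Here n = 3.
n = 3
worlds = list(product([False, True], repeat=n))
print(len(worlds))        # 2^3 = 8 possible worlds
print(len(worlds) - 1)    # 2^3 - 1 = 7 free parameters
```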

Two main approaches are used to overcome the need for so many numbers:

  • Independence: Assume that knowledge of the truth of one proposition, Y, does not affect the agent's belief in another proposition, X, in the context of other propositions Z. We say that X is independent of Y given Z. This is defined below.
  • Maximum entropy or random worlds: Given no other knowledge, assume that everything is as random as possible. That is, the probabilities are distributed as uniformly as possible consistent with the available information.

We consider here in detail the first of these (but see the box).

Reducing the Numbers

The distinction between allowing representations of independence and using maximum entropy or random worlds highlights an important difference between views of a knowledge representation:

  • The first view is that a knowledge representation provides a high-level modeling language that lets us model a domain in a reasonably natural way. According to this view, knowledge representation designers are expected to prescribe how to use the knowledge representation language, providing a user manual on how to describe domains of interest.
  • The second view is that a knowledge representation should allow someone to add whatever knowledge they may have about a domain. The knowledge representation should fill in the rest in a commonsense manner. According to this view, it is unreasonable for a knowledge representation designer to specify how particular knowledge should be encoded.

Judging a knowledge representation by the wrong criteria does not result in a fair assessment.

A belief network is a representation for a particular independence among variables. Belief networks should be viewed as a modeling language. Many domains can be concisely and naturally represented by exploiting the independencies that belief networks compactly represent. This does not mean that we can just throw in lots of facts (or probabilities) and expect a reasonable answer. One must think about a domain and consider exactly what variables are involved and what the dependencies are among the variables. When judged by this criterion, belief networks form a useful representation scheme.

Once the network structure and the domains of the variables for a belief network are defined, exactly which numbers are required (the conditional probabilities) is prescribed. The user cannot simply add arbitrary conditional probabilities but must follow the network's structure. If the numbers that are required of a belief network are provided and are locally consistent, the whole network will be consistent. In contrast, the maximum entropy or random worlds approaches infer the most random worlds that are consistent with a probabilistic knowledge base. They form a probabilistic knowledge representation of the second type. For the random worlds approach, any numbers that happen to be available can be added and used. However, if you allow someone to add arbitrary probabilities, it is easy for the knowledge to be inconsistent with the axioms of probability. Moreover, it is difficult to justify an answer as correct if the assumptions are not made explicit.

As long as the value of P(h|e) is not 0 or 1, it does not constrain the value of P(h|f ∧ e). This latter probability could take any value in the range [0,1]: it is 1 if f ∧ e implies h, and it is 0 if f ∧ e implies ¬h.
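This can be checked numerically. The sketch below (with made-up numbers) builds two joint distributions over worlds (h, e, f) that agree on P(h|e) = 0.5 yet give P(h|f ∧ e) the extreme values 1 and 0:

```python
def cond(joint, hyp, evid):
    """P(hyp | evid) from a dict mapping (h, e, f) worlds to probabilities."""
    pe = sum(p for w, p in joint.items() if evid(w))
    phe = sum(p for w, p in joint.items() if evid(w) and hyp(w))
    return phe / pe

# Two hypothetical distributions over worlds (h, e, f).
# In d1, f implies h among the e-worlds; in d2, f implies not-h.
d1 = {(True, True, True): 0.25, (False, True, False): 0.25,
      (False, False, False): 0.5}
d2 = {(False, True, True): 0.25, (True, True, False): 0.25,
      (False, False, False): 0.5}

h = lambda w: w[0]
e = lambda w: w[1]
fe = lambda w: w[1] and w[2]   # the conjunction f and e

print(cond(d1, h, e), cond(d2, h, e))    # both give P(h|e) = 0.5
print(cond(d1, h, fe), cond(d2, h, fe))  # P(h|f and e) is 1.0 vs 0.0
```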

As far as probability theory is concerned, there is no reason why the name of the Queen of Canada should not be as significant as a light switch's position in determining whether the light is on. Knowledge of the domain, however, may tell us that it is irrelevant.

In this section we present a representation that allows us to model the structure of the world where relevant propositions are local and where not-directly-relevant variables can be ignored when probabilities are specified. This structure can be exploited for efficient reasoning.

A common kind of qualitative knowledge is of the form P(h|e)=P(h|f ∧e). This equality says that f is irrelevant to the probability of h given e. For example, the fact that Elizabeth is the Queen of Canada is irrelevant to the probability that w2 is live given that switch s1 is down. This idea can apply to random variables, as in the following definition:

Random variable X is conditionally independent of random variable Y given random variable Z if, for all x ∈ dom(X), for all y ∈ dom(Y), for all y' ∈ dom(Y), and for all z ∈ dom(Z) such that P(Y=y ∧ Z=z) > 0 and P(Y=y' ∧ Z=z) > 0,

P(X=x | Y=y ∧ Z=z) = P(X=x | Y=y' ∧ Z=z).

That is, given a value of Z, knowing Y's value does not affect your belief in the value of X.
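The definition can be checked directly against a joint distribution table. The sketch below (hypothetical distribution) tests, for each z, whether P(X=x | Y=y ∧ Z=z) is the same for every value y of positive probability:

```python
def cond_indep(joint, X, Y, Z, tol=1e-9):
    """Check the definition: for all x, and all y, y', z with
    P(Y=y, Z=z) > 0 and P(Y=y', Z=z) > 0,
    P(X=x | Y=y, Z=z) == P(X=x | Y=y', Z=z).
    `joint` maps (x, y, z) tuples to probabilities; X, Y, Z are domains."""
    def p_x_given(y, z):
        denom = sum(joint.get((x, y, z), 0) for x in X)
        if denom == 0:
            return None   # P(Y=y, Z=z) = 0: this y is excluded
        return {x: joint.get((x, y, z), 0) / denom for x in X}
    for z in Z:
        conds = [c for y in Y if (c := p_x_given(y, z)) is not None]
        for c, d in zip(conds, conds[1:]):
            if any(abs(c[x] - d[x]) > tol for x in X):
                return False
    return True

# Hypothetical joint that factorizes as P(z) P(x|z) P(y|z), so X should
# be conditionally independent of Y given Z.
pz = {0: 0.5, 1: 0.5}
px = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}  # P(x|z), made-up numbers
joint = {(x, y, z): pz[z] * px[z][x] * px[z][y]
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}
print(cond_indep(joint, (0, 1), (0, 1), (0, 1)))  # True
```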

Proposition 6.4: The following four statements are equivalent, as long as the conditional probabilities are well defined:
  1. X is conditionally independent of Y given Z.
  2. Y is conditionally independent of X given Z.
  3. P(X|Y, Z) = P(X|Z). That is, in the context that you are given a value for Z, if you were given a value for Y, you would have the same belief in X as if you were not given a value for Y.
  4. P(X, Y|Z) = P(X|Z) P(Y|Z).

The proof is left as an exercise. [See Exercise 6.1.]
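Without giving the proof away, here is a numeric sanity check (with made-up numbers) that statements 3 and 4 agree on a joint distribution that factorizes as P(z) P(x|z) P(y|z):

```python
# Hypothetical joint over binary X, Y, Z built so that X is conditionally
# independent of Y given Z; statements 3 and 4 should both hold.
pz = {0: 0.4, 1: 0.6}
px_z = {0: 0.9, 1: 0.3}   # P(X=1 | Z=z), made-up numbers
py_z = {0: 0.2, 1: 0.7}   # P(Y=1 | Z=z), made-up numbers

def pv(val, p1):
    """Helper: probability of value `val` given P(value=1) = p1."""
    return p1 if val == 1 else 1 - p1

joint = {(x, y, z): pz[z] * pv(x, px_z[z]) * pv(y, py_z[z])
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

def p(event):
    """Probability of the set of worlds (x, y, z) satisfying `event`."""
    return sum(pr for w, pr in joint.items() if event(w))

for z in (0, 1):
    # Statement 3: P(X=1 | Y=1, Z=z) equals P(X=1 | Z=z).
    lhs3 = p(lambda w: w == (1, 1, z)) / p(lambda w: w[1] == 1 and w[2] == z)
    rhs3 = p(lambda w: w[0] == 1 and w[2] == z) / p(lambda w: w[2] == z)
    # Statement 4: P(X=1, Y=1 | Z=z) equals P(X=1 | Z=z) P(Y=1 | Z=z).
    lhs4 = p(lambda w: w == (1, 1, z)) / p(lambda w: w[2] == z)
    rhs4 = rhs3 * (p(lambda w: w[1] == 1 and w[2] == z) / p(lambda w: w[2] == z))
    assert abs(lhs3 - rhs3) < 1e-9 and abs(lhs4 - rhs4) < 1e-9
print("statements 3 and 4 hold for this joint")
```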

Variables X and Y are unconditionally independent if P(X, Y)=P(X)P(Y), that is, if they are conditionally independent given no observations. Note that X and Y being unconditionally independent does not imply they are conditionally independent given some other information Z.
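The exclusive-or construction is a standard illustration of this last point: if X and Y are independent fair bits and Z = X xor Y, then X and Y are unconditionally independent, but given Z each determines the other. A sketch:

```python
# X and Y are independent fair bits and Z = X xor Y.  X and Y are
# unconditionally independent, yet given Z, observing Y determines X.
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

def p(event):
    """Probability of the set of worlds (x, y, z) satisfying `event`."""
    return sum(pr for w, pr in joint.items() if event(w))

# Unconditional independence: P(X=1, Y=1) = P(X=1) P(Y=1) = 0.25.
pxy = p(lambda w: w[0] == 1 and w[1] == 1)
print(pxy, p(lambda w: w[0] == 1) * p(lambda w: w[1] == 1))

# But given Z = 0, observing Y changes the belief in X:
p_x1_given_z0 = p(lambda w: w[0] == 1 and w[2] == 0) / p(lambda w: w[2] == 0)
p_x1_given_y1z0 = (p(lambda w: w[0] == 1 and w[1] == 1 and w[2] == 0)
                   / p(lambda w: w[1] == 1 and w[2] == 0))
print(p_x1_given_z0, p_x1_given_y1z0)  # 0.5 versus 1.0
```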

Conditional independence is a useful assumption about a domain that is often natural to assess and can be exploited to give a useful representation.