8.1 Probability

The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).

8.1.2 Axioms for Probability

The preceding section gave a semantic definition of probability. An axiomatic definition instead specifies axioms that any such measure must satisfy. The axioms below are properties one may want of any calculus of belief, and we show that probability satisfies them.

Suppose P is a function from propositions into real numbers that satisfies the following three axioms of probability:

Axiom 1

0 ≤ P(α) for any proposition α. That is, the belief in any proposition cannot be negative.

Axiom 2

P(τ)=1 if τ is a tautology. That is, if τ is true in all possible worlds, its probability is 1.

Axiom 3

P(α ∨ β) = P(α) + P(β) if α and β are contradictory propositions; that is, if ¬(α ∧ β) is a tautology. In other words, if two propositions cannot both be true (they are mutually exclusive), the probability of their disjunction is the sum of their probabilities.

These axioms are meant to be intuitive properties that we would like any reasonable measure of belief to have. If a measure of belief follows these axioms, it is covered by probability theory. Note that empirical frequencies (the proportion of examples in a data set for which a proposition is true) obey these axioms, and so follow the rules of probability. That does not mean that all probabilities are empirical frequencies or are obtained from them.
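As a minimal sketch of the point about empirical frequencies, the Python code below computes frequencies over a small hypothetical data set and checks the three axioms on a few propositions. The `data` records, the feature names, and the predicate names are invented for illustration; representing a proposition as a Boolean predicate on an example is an assumption of the sketch, not the book's representation.

```python
from fractions import Fraction

# Hypothetical data set: each example records two Boolean features.
data = [
    {"rain": True, "wet": True},
    {"rain": True, "wet": True},
    {"rain": False, "wet": True},
    {"rain": False, "wet": False},
    {"rain": False, "wet": False},
]


def freq(prop, examples):
    """Empirical frequency: the proportion of examples in which prop is true."""
    return Fraction(sum(1 for e in examples if prop(e)), len(examples))


# Propositions are represented as predicates on a single example (an assumption).
def rain(e):
    return e["rain"]


def tautology(e):
    return e["rain"] or not e["rain"]


def rain_and_wet(e):
    return e["rain"] and e["wet"]


def not_wet(e):
    return not e["wet"]


def either(e):
    # The disjunction rain_and_wet ∨ not_wet.
    return rain_and_wet(e) or not_wet(e)


# Axiom 1: frequencies are never negative.
assert all(freq(p, data) >= 0 for p in (rain, tautology, rain_and_wet, not_wet))
# Axiom 2: a tautology is true in every example, so its frequency is 1.
assert freq(tautology, data) == 1
# Axiom 3: rain_and_wet and not_wet cannot both hold, so their frequencies add.
assert freq(either, data) == freq(rain_and_wet, data) + freq(not_wet, data)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not disturbed by floating-point rounding.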

These axioms form a sound and complete axiomatization of the meaning of probability. Soundness means that probability, as defined by the possible-worlds semantics, follows these axioms. Completeness means that any system of beliefs that obeys these axioms has a probabilistic semantics.

Proposition 8.1.

If there are a finite number of discrete random variables, each with a finite domain, Axioms 1, 2, and 3 are sound and complete with respect to the semantics.

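It is easy to check that these axioms are true of the semantics. Conversely, the axioms can be used to compute any probability from the probabilities of the worlds: a proposition is equivalent to the disjunction of the descriptions of the worlds in which it is true, and the descriptions of any two distinct worlds are mutually exclusive, so Axiom 3 lets their probabilities be added. The full proof is left as an exercise. (See Exercise 2.)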
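The following Python sketch illustrates this argument for two hypothetical Boolean variables A and B with made-up world probabilities. The probability of a proposition is computed by summing the probabilities of the worlds in which it is true. The names `world_probs` and `prob` and the numbers are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

# Hypothetical distribution over the four worlds of two Boolean variables A, B.
# Each pair is (world, probability); the probabilities sum to 1.
world_probs = [
    ({"A": True, "B": True}, Fraction(1, 10)),
    ({"A": True, "B": False}, Fraction(2, 10)),
    ({"A": False, "B": True}, Fraction(3, 10)),
    ({"A": False, "B": False}, Fraction(4, 10)),
]


def prob(prop):
    """P(prop): sum the probabilities of the worlds in which prop is true.

    The proposition is equivalent to the disjunction of the descriptions of
    those worlds; distinct world descriptions are mutually exclusive, so
    their probabilities add (Axiom 3).
    """
    return sum((p for w, p in world_probs if prop(w)), Fraction(0))


def a(w):
    return w["A"]


def a_or_b(w):
    return w["A"] or w["B"]


assert all(p >= 0 for _, p in world_probs)        # Axiom 1 for each world
assert prob(lambda w: w["A"] or not w["A"]) == 1  # Axiom 2 for a tautology
print(prob(a), prob(a_or_b))  # prints: 3/10 3/5
```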

Proposition 8.2.

The following hold for all propositions α and β:

  (a) Negation of a proposition:

      P(¬α) = 1 - P(α).

  (b) If α ↔ β, then P(α) = P(β). That is, logically equivalent propositions have the same probability.

  (c) Reasoning by cases:

      P(α) = P(α ∧ β) + P(α ∧ ¬β).

  (d) If V is a random variable with domain D, then, for all propositions α,

      P(α) = ∑_{d∈D} P(α ∧ V=d).

  (e) Disjunction for non-exclusive propositions:

      P(α ∨ β) = P(α) + P(β) - P(α ∧ β).
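These properties can be spot-checked numerically. The Python sketch below assumes a hypothetical Boolean variable A and a variable V with the made-up domain D = {lo, mid, hi}, assigns arbitrary weights to the six worlds, and asserts parts (a) to (e) for particular choices of α and β. The helper names (`neg`, `conj`, `disj`, `prob`) are illustrative. A passing check on one distribution illustrates the identities; it is not a proof.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: Boolean variable A and variable V with domain D.
D = ("lo", "mid", "hi")
worlds = [{"A": a, "V": v} for a, v in product((True, False), D)]
# Hypothetical world weights, normalized so the probabilities sum to 1.
weights = [1, 2, 3, 4, 5, 6]
world_probs = [(w, Fraction(k, sum(weights))) for w, k in zip(worlds, weights)]


def prob(prop):
    """Sum the probabilities of the worlds where prop is true."""
    return sum((p for w, p in world_probs if prop(w)), Fraction(0))


# Small combinators that build compound propositions from predicates.
def neg(f):
    return lambda w: not f(w)


def conj(f, g):
    return lambda w: f(w) and g(w)


def disj(f, g):
    return lambda w: f(w) or g(w)


def alpha(w):
    return w["A"]


def beta(w):
    return w["V"] != "lo"


def beta_equiv(w):
    # Equivalent to beta because V takes exactly one value from D.
    return w["V"] in ("mid", "hi")


# (a) negation
assert prob(neg(alpha)) == 1 - prob(alpha)
# (b) equivalent propositions have equal probability
assert prob(beta) == prob(beta_equiv)
# (c) reasoning by cases
assert prob(alpha) == prob(conj(alpha, beta)) + prob(conj(alpha, neg(beta)))
# (d) summing over the values d of V (d=d binds each value in its lambda)
assert prob(alpha) == sum(
    prob(conj(alpha, lambda w, d=d: w["V"] == d)) for d in D)
# (e) disjunction for non-exclusive propositions
assert prob(disj(alpha, beta)) == (
    prob(alpha) + prob(beta) - prob(conj(alpha, beta)))
```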
Proof.
  (a) The propositions α ∨ ¬α and ¬(α ∧ ¬α) are tautologies. Therefore, by Axioms 2 and 3,

      1 = P(α ∨ ¬α) = P(α) + P(¬α).

      Rearranging gives the desired result.

  (b) If α ↔ β, then α ∨ ¬β is a tautology, so P(α ∨ ¬β) = 1. The propositions α and ¬β are contradictory, so Axiom 3 gives P(α ∨ ¬β) = P(α) + P(¬β). Using part (a), P(¬β) = 1 - P(β). Thus, P(α) + 1 - P(β) = 1, and so P(α) = P(β).

  (c) The propositions α ↔ ((α ∧ β) ∨ (α ∧ ¬β)) and ¬((α ∧ β) ∧ (α ∧ ¬β)) are tautologies. Thus, by part (b) and then Axiom 3,

      P(α) = P((α ∧ β) ∨ (α ∧ ¬β)) = P(α ∧ β) + P(α ∧ ¬β).

  (d) The proof is analogous to the proof of part (c).

  (e) The proposition (α ∨ β) ↔ ((α ∧ ¬β) ∨ β) is a tautology, and α ∧ ¬β and β are contradictory. Thus, by part (b) and Axiom 3,

      P(α ∨ β) = P((α ∧ ¬β) ∨ β)
               = P(α ∧ ¬β) + P(β).

      Part (c) shows P(α ∧ ¬β) = P(α) - P(α ∧ β). Thus,

      P(α ∨ β) = P(α) + P(β) - P(α ∧ β).