6.1.2 Axioms for Probability

The preceding section gave a semantic definition of probability. We can also give an axiomatic definition of probability, which specifies the properties we may want in a calculus of belief. Suppose P is a function from propositions into real numbers that satisfies the following three axioms of probability:

Axiom 1
0 ≤ P(α) for any proposition α. That is, the belief in any proposition cannot be negative.
Axiom 2
P(τ) = 1 if τ is a tautology. That is, if τ is true in all possible worlds, its probability is 1.
Axiom 3
P(α∨β)=P(α)+P(β) if α and β are contradictory propositions; that is, if ¬(α∧β) is a tautology. In other words, if two propositions cannot both be true (they are mutually exclusive), the probability of their disjunction is the sum of their probabilities.

These axioms are meant to be intuitive properties that we would want of any reasonable measure of belief. If a measure of belief follows these intuitive axioms, it is covered by probability theory, whether or not the measure is derived from actual frequency counts. These axioms form a sound and complete axiomatization of the meaning of probability. Soundness means that probability, as defined by the possible-worlds semantics, follows these axioms. Completeness means that any system of beliefs that obeys these axioms has a probabilistic semantics.

Proposition 6.1: If there are a finite number of finite discrete random variables, Axioms 1, 2, and 3 are sound and complete with respect to the semantics.

It is easy to check that these axioms are true of the semantics. Conversely, you can use the axioms to compute the probability of any proposition from the probabilities of the worlds, because the descriptions of two distinct worlds are mutually exclusive. The full proof is left as an exercise.
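The soundness direction can be illustrated on a small example. The following is a minimal sketch, not part of the original text: the variables `rain` and `sprinkler`, the measure `MU`, and the helper `prob` are hypothetical names and made-up numbers chosen only for illustration. The probability of a proposition is computed as the sum of the measures of the worlds in which it is true, and each axiom is then checked for a sample proposition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two Boolean variables give four possible worlds.
VARIABLES = ("rain", "sprinkler")
WORLDS = [dict(zip(VARIABLES, values))
          for values in product([True, False], repeat=len(VARIABLES))]

# Made-up measure over the worlds (nonnegative, sums to 1), in the same
# order as WORLDS: (T,T), (T,F), (F,T), (F,F).
MU = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]


def prob(proposition):
    """P(proposition): total measure of the worlds where it is true."""
    return sum((m for w, m in zip(WORLDS, MU) if proposition(w)), Fraction(0))


# Two contradictory (mutually exclusive) propositions and their disjunction.
alpha = lambda w: w["rain"] and w["sprinkler"]
beta = lambda w: (not w["rain"]) and w["sprinkler"]
alpha_or_beta = lambda w: alpha(w) or beta(w)
tautology = lambda w: w["rain"] or not w["rain"]

axiom1_holds = prob(alpha) >= 0                                   # Axiom 1
axiom2_holds = prob(tautology) == 1                               # Axiom 2
axiom3_holds = prob(alpha_or_beta) == prob(alpha) + prob(beta)    # Axiom 3
```

Exact fractions are used instead of floats so that the equalities in Axioms 2 and 3 can be tested without rounding error.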

Proposition 6.2: The following hold for all propositions α and β:
  1. Negation of a proposition:
    P(¬α)=1-P(α).
  2. If α↔β is a tautology, then P(α)=P(β). That is, logically equivalent propositions have the same probability.
  3. Reasoning by cases:
    P(α)=P(α∧β)+P(α∧¬β).
  4. If V is a random variable with domain D, then, for all propositions α,
    P(α)=∑_{d∈D} P(α∧V=d).
  5. Disjunction for non-exclusive propositions:
    P(α∨β)=P(α)+P(β)-P(α∧β).
Proof.
  1. The propositions α∨¬α and ¬(α∧¬α) are tautologies. Therefore, by Axioms 2 and 3, 1=P(α∨¬α)=P(α)+P(¬α). Rearranging gives the desired result.
  2. If α↔β is a tautology, then α∨¬β is a tautology, so P(α∨¬β)=1. Moreover, α and ¬β are contradictory statements, so we can use Axiom 3 to give P(α∨¬β)=P(α)+P(¬β). Using part 1, P(¬β)=1-P(β). Thus, P(α)+1-P(β)=1, and so P(α)=P(β).
  3. The propositions α↔((α∧β)∨(α∧¬β)) and ¬((α∧β)∧(α∧¬β)) are tautologies. Thus, by part 2 and Axiom 3, P(α)=P((α∧β)∨(α∧¬β))=P(α∧β)+P(α∧¬β).
  4. The proof is analogous to the proof of part 3: the propositions α∧V=d for the different values d∈D are pairwise contradictory, and α is logically equivalent to their disjunction, so repeated application of Axiom 3 gives the sum.
  5. (α∨β)↔((α∧¬β)∨β) is a tautology, and α∧¬β and β are contradictory. Thus, by part 2 and Axiom 3,
    P(α∨β)=P((α∧¬β)∨β)
    =P(α∧¬β)+P(β).

    Part 3 shows P(α∧¬β)=P(α)-P(α∧β). Thus,

    P(α∨β)=P(α)-P(α∧β)+P(β).
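
The identities of Proposition 6.2 can also be checked exhaustively on a small model. This is a sketch under stated assumptions, not a substitute for the proof: the four worlds, their measures `MU`, and the random variable `V` are hypothetical. A proposition is represented by the set of worlds in which it is true, so logically equivalent propositions are the same set and part 2 holds by construction; the remaining parts are asserted for every proposition (every subset of worlds).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical model: four possible worlds, indexed 0..3, with made-up
# measures that are nonnegative and sum to 1.
MU = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}
ALL_WORLDS = frozenset(MU)

# Hypothetical random variable V with domain {"x", "y", "z"}: it takes
# exactly one value in each world (worlds 2 and 3 both have V = "z").
V = {0: "x", 1: "y", 2: "z", 3: "z"}
DOMAIN = set(V.values())


def prob(worlds):
    """P of a proposition, given as the set of worlds where it is true."""
    return sum((MU[w] for w in worlds), Fraction(0))


def neg(a):
    """Negation: the worlds where the proposition is false."""
    return ALL_WORLDS - a


def all_propositions():
    """Every subset of worlds, i.e. every proposition up to equivalence."""
    for size in range(len(ALL_WORLDS) + 1):
        for subset in combinations(sorted(ALL_WORLDS), size):
            yield frozenset(subset)


def check_proposition_6_2():
    for a in all_propositions():
        # Part 1: negation.
        assert prob(neg(a)) == 1 - prob(a)
        # Part 4: sum over the values of V.
        v_is = lambda d: frozenset(w for w in ALL_WORLDS if V[w] == d)
        assert prob(a) == sum(prob(a & v_is(d)) for d in DOMAIN)
        for b in all_propositions():
            # Part 3: reasoning by cases.
            assert prob(a) == prob(a & b) + prob(a & neg(b))
            # Part 5: disjunction for non-exclusive propositions.
            assert prob(a | b) == prob(a) + prob(b) - prob(a & b)
    return True
```

Calling `check_proposition_6_2()` returns `True` if no assertion fails. Conjunction, disjunction, and negation correspond to set intersection, union, and complement, which is why the set operators `&`, `|`, and `-` appear in place of ∧, ∨, and ¬.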