6.1.3.2 Bayes' Rule

An agent must update its belief when it observes new evidence. A new piece of evidence is conjoined to the old evidence to form the complete set of evidence. Bayes' rule specifies how an agent should update its belief in a proposition based on a new piece of evidence.

Suppose an agent has a current belief in proposition h based on evidence k already observed, given by P(h|k), and subsequently observes e. Its new belief in h is P(h|e∧k). Bayes' rule tells us how to update the agent's belief in hypothesis h as new evidence arrives.

Proposition. (Bayes' rule) As long as P(e | k) ≠ 0,
P(h | e ∧ k) = P(e | h ∧ k) × P(h | k) / P(e | k).

This is often written with the background knowledge k implicit. In this case, if P(e) ≠ 0, then

P(h | e) = P(e | h) × P(h) / P(e).

P(e | h) is the likelihood and P(h) is the prior probability of the hypothesis h; P(h | e) is its posterior probability. Bayes' rule states that the posterior is proportional to the likelihood times the prior.

Proof. The commutativity of conjunction means that h ∧ e is equivalent to e ∧ h, and so they have the same probability given k. Using the rule for multiplication in two different ways,
P(h ∧ e | k) = P(h | e ∧ k) × P(e | k)
= P(e | h ∧ k) × P(h | k).

The theorem follows from dividing the right-hand sides by P(e|k).
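
As a minimal sketch of this computation, assuming hypothetical numbers (the likelihood, prior, and evidence probability below do not come from the text), Bayes' rule can be applied directly:

```python
# A minimal sketch of Bayes' rule: P(h|e) = P(e|h) * P(h) / P(e).
# All numeric values are hypothetical, chosen only for illustration.

def bayes_rule(likelihood, prior, evidence_prob):
    """Return the posterior P(h|e) from P(e|h), P(h), and P(e)."""
    assert evidence_prob != 0, "Bayes' rule requires P(e) != 0"
    return likelihood * prior / evidence_prob

# Example: P(e|h) = 0.9, P(h) = 0.2, P(e) = 0.3 gives P(h|e) = 0.6.
print(bayes_rule(likelihood=0.9, prior=0.2, evidence_prob=0.3))
```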

Often, Bayes' rule is used to compare various hypotheses (different hᵢ's); the denominator P(e | k) is then a constant that does not depend on the particular hypothesis, so when comparing the relative posterior probabilities of hypotheses, the denominator can be ignored. To get the posterior probability itself, the denominator can be computed by reasoning by cases. If H is an exclusive and covering set of propositions representing all possible hypotheses, then

P(e | k) = ∑h∈H P(e ∧ h | k)
= ∑h∈H P(e | h ∧ k) × P(h | k).

Thus, the denominator of Bayes' rule is obtained by summing the numerators for all the hypotheses. When the hypothesis space is large, computing the denominator can be computationally difficult.
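
The following sketch, with hypothetical priors and likelihoods (none of these numbers appear in the text), shows how the denominator arises as the sum of the numerators over an exclusive and covering hypothesis set:

```python
# Posterior over an exclusive and covering set of hypotheses H.
# The denominator P(e) is the sum of the numerators P(e|h) * P(h) over h in H.
# All probabilities below are hypothetical placeholders.

def posteriors(priors, likelihoods):
    """priors: dict mapping h to P(h); likelihoods: dict mapping h to P(e|h).
    Returns a dict mapping h to P(h|e)."""
    numerators = {h: likelihoods[h] * priors[h] for h in priors}
    evidence_prob = sum(numerators.values())  # the denominator P(e)
    return {h: num / evidence_prob for h, num in numerators.items()}

priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.1, "h2": 0.7, "h3": 0.5}
print(posteriors(priors, likelihoods))
# {'h1': 0.1389, 'h2': 0.5833, 'h3': 0.2778} (approximately)
```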

Generally, one of P(e | h ∧ k) or P(h | e ∧ k) is much easier to estimate than the other. This often occurs when you have a causal theory of a domain, and the predictions of different hypotheses, the P(e | hᵢ ∧ k) for each hypothesis hᵢ, can be derived from the domain theory.

Example 6.5: Suppose the diagnostic assistant is interested in the diagnosis of the light switch s₁ of Figure 1.8. You would expect that the modeler is able to specify how the output of a switch is a function of the input, the switch position, and the status of the switch (whether it is working, shorted, installed upside-down, etc.). Bayes' rule lets an agent infer the status of the switch given the other information.
Example 6.6: Suppose an agent has information about the reliability of fire alarms. It may know how likely it is that an alarm will work if there is a fire. If it wants to know the probability that there is a fire, given that there is an alarm, it can use Bayes' rule:
P(fire | alarm) = P(alarm | fire) × P(fire) / P(alarm),

where P(alarm | fire) is the probability that the alarm worked, assuming that there was a fire. It is a measure of the alarm's reliability. The expression P(fire) is the probability of a fire given no other information. It is a measure of how fire-prone the building is. P(alarm) is the probability of the alarm sounding, given no other information.
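
As a worked instance, with made-up probabilities (the text gives no numbers): suppose the alarm sounds with probability 0.99 when there is a fire, the false-alarm rate is 0.05, and the prior probability of a fire is 0.01. P(alarm) is then obtained by reasoning by cases over {fire, ¬fire}:

```python
# Fire-alarm example with hypothetical probabilities (not from the text).
p_alarm_given_fire = 0.99      # the alarm's reliability, P(alarm | fire)
p_fire = 0.01                  # how fire-prone the building is, P(fire)
p_alarm_given_no_fire = 0.05   # false-alarm rate, P(alarm | ¬fire)

# Reasoning by cases over the exclusive and covering set {fire, ¬fire}:
p_alarm = (p_alarm_given_fire * p_fire
           + p_alarm_given_no_fire * (1 - p_fire))

p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm
print(round(p_fire_given_alarm, 3))  # 0.167
```

Even with a highly reliable alarm, the posterior probability of fire is only about 0.17, because fires are rare; the prior matters as much as the likelihood.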