Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

#### 6.1.3.2 Bayes' Rule

An agent must update its belief when it observes new evidence. A new piece of evidence is conjoined to the old evidence to form the complete set of evidence. Bayes' rule specifies how an agent should update its belief in a proposition based on a new piece of evidence.

Suppose an agent has a current belief in proposition *h* based on
evidence *k* already observed, given by *P(h|k)*, and subsequently
observes *e*. Its new belief in *h* is *P(h|e∧k)*. Bayes' rule
tells us how to update the agent's belief in hypothesis *h* as new evidence arrives.

**Proposition.**

*(*

**Bayes' rule**) As long as*P(e|k)≠0*,P(h | e ∧k) = P(e | h ∧k) ×P(h | k) / P(e | k).

This is often written with the background knowledge *k* implicit. In this case, if
*P(e)≠0*, then

P(h | e ) = P(e | h ) ×P(h ) / P(e ).

*P(e|h)* is the **likelihood** of the hypothesis *h*;
*P(h)* is the **prior** of the hypothesis *h*. Bayes' rule states that
the posterior is proportional to the likelihood times the prior.

**Proof.**

*The commutativity of conjunction means that*

*h∧e*is equivalent to*e ∧h*, and so they have the same probability given*k*. Using the rule for multiplication in two different ways,

P(h∧e|k) = P(h|e∧k) ×P(e|k) = P(e|h∧k)×P(h|k).

*The theorem follows from dividing the right-hand sides by P(e|k).
*

Often, Bayes' rule is used to compare various hypotheses (different
*h _{i}*'s), where it can be noticed that the denominator

*P(e | k)*is a constant that does not depend on the particular hypothesis. When comparing the relative posterior probabilities of hypotheses, we can ignore the denominator. To get the posterior probability, the denominator can be computed by reasoning by cases. If

*H*is an exclusive and covering set of propositions representing all possible hypotheses, then

P(e|k) = ∑ _{h∈H}P(e∧h| k)= ∑ _{h∈H}P(e|h∧k)×P(h|k).

Thus, the denominator of Bayes' rule is obtained by summing the numerators for all the hypotheses. When the hypothesis space is large, computing the denominator can be computationally difficult.

Generally, one of *P(e | h ∧k)* or *P(h | e
∧k)* is much easier to estimate than the other. This often
occurs when you have a causal theory of a domain, and the predictions
of different hypotheses - the *P(e | h _{i} ∧k)* for each
hypothesis

*h*- can be derived from the domain theory.

_{i}**Example 6.5:**Suppose the diagnostic assistant is interested in the diagnosis of the light switch

*s*of Figure 1.8. You would expect that the modeler is able to specify how the output of a switch is a function of the input, the switch position, and the status of the switch (whether it is working, shorted, installed upside-down, etc.). Bayes' rule lets an agent infer the status of the switch given the other information.

_{1}**Example 6.6:**Suppose an agent has information about the reliability of fire alarms. It may know how likely it is that an alarm will work if there is a fire. If it must know the probability that there is a fire, given that there is an alarm, it can use Bayes' rule:

P(fire | alarm) = P(alarm | fire) ×P(fire ) / P(alarm ),

where *P(alarm | fire)* is the
probability that the alarm worked, assuming that there was a fire. It is
a measure of the alarm's reliability. The expression
*P(fire)* is the probability of a fire
given no other information. It is a measure of how fire-prone the
building is. *P(alarm)* is the
probability of the alarm sounding, given no other information.