foundations of computational agents
Probability is a measure of belief. Beliefs need to be updated when new evidence is observed.
The measure of belief in proposition $h$ given proposition $e$ is called the conditional probability of $h$ given $e$, written $P(h \mid e)$.
A proposition $e$ representing the conjunction of all of the agent’s observations of the world is called evidence. Given evidence $e$, the conditional probability $P(h \mid e)$ is the agent’s posterior probability of $h$. The probability $P(h)$ is the prior probability of $h$ and is the same as $P(h \mid \mathit{true})$ because it is the probability before the agent has observed anything.
The evidence used for the posterior probability is everything the agent observes about a particular situation. Everything observed, and not just a few select observations, must be conditioned on to obtain the correct posterior probability.
For the diagnostic assistant, the prior probability distribution over possible diseases is used before the diagnostic agent finds out about the particular patient. Evidence is obtained through discussions with the patient, observing symptoms, and the results of lab tests. Essentially any information that the diagnostic assistant finds out about the patient is evidence. The assistant updates its probability to reflect the new evidence in order to make informed decisions.
The information that the delivery robot receives from its sensors is its evidence. When sensors are noisy, the evidence is what is known, such as the particular pattern received by the sensor, not that there is a person in front of the robot. The robot could be mistaken about what is in the world but it knows what information it received.
Evidence $e$, where $e$ is a proposition, will rule out all possible worlds that are incompatible with $e$. Like the definition of logical consequence, the given proposition $e$ selects the possible worlds in which $e$ is true. As in the definition of probability, we first define the conditional probability over worlds, and then use this to define a probability over propositions.
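Evidence $e$ induces a new probability $P(w \mid e)$ of world $w$ given $e$. Any world where $e$ is false has conditional probability $0$, and the remaining worlds are normalized so that the probabilities of the worlds sum to $1$:
$$P(w \mid e) = \begin{cases} c \cdot P(w) & \text{if } e \text{ is true in world } w \\ 0 & \text{if } e \text{ is false in world } w \end{cases}$$
where $c$ is a constant (that depends on $e$) that ensures the posterior probability of all worlds sums to 1.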
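For $P(w \mid e)$ to be a probability measure over worlds for each $e$:
$$\begin{aligned}
1 &= \sum_{w} P(w \mid e) \\
  &= \sum_{w \,:\, e \text{ is true in } w} P(w \mid e) + \sum_{w \,:\, e \text{ is false in } w} P(w \mid e) \\
  &= \sum_{w \,:\, e \text{ is true in } w} c \cdot P(w) + 0 \\
  &= c \cdot P(e)
\end{aligned}$$
Therefore, $c = 1/P(e)$. Thus, the conditional probability is only defined if $P(e) > 0$. This is reasonable, as if $P(e) = 0$, $e$ is impossible.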
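The conditional probability of proposition $h$ given evidence $e$ is the sum of the conditional probabilities of the possible worlds in which $h$ is true. That is,
$$\begin{aligned}
P(h \mid e) &= \sum_{w \,:\, h \text{ is true in } w} P(w \mid e) \\
  &= \sum_{w \,:\, h \wedge e \text{ is true in } w} P(w \mid e) + \sum_{w \,:\, h \wedge \neg e \text{ is true in } w} P(w \mid e) \\
  &= \sum_{w \,:\, h \wedge e \text{ is true in } w} \frac{1}{P(e)} \cdot P(w) + 0 \\
  &= \frac{P(h \wedge e)}{P(e)}
\end{aligned}$$
The last form above is typically given as the definition of conditional probability. Here we have derived it as a consequence of a more basic definition.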
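The derivation above can be checked numerically. The following is a minimal sketch, assuming a made-up set of four possible worlds over two hypothetical propositions, `rain` and `wet`; the prior probabilities and all function names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Hypothetical possible worlds: each assigns truth values to two propositions
# and carries a prior probability. All numbers are invented for illustration.
worlds = [
    ({"rain": True,  "wet": True},  Fraction(3, 10)),
    ({"rain": True,  "wet": False}, Fraction(1, 10)),
    ({"rain": False, "wet": True},  Fraction(2, 10)),
    ({"rain": False, "wet": False}, Fraction(4, 10)),
]

def prob(prop):
    """P(prop): sum of the priors of the worlds in which prop is true."""
    return sum(p for w, p in worlds if prop(w))

def condition(evidence):
    """Return the measure P(w | e) as a list of (world, probability) pairs."""
    p_e = prob(evidence)
    if p_e == 0:
        raise ValueError("conditional probability is undefined when P(e) = 0")
    c = 1 / p_e  # normalizing constant c = 1 / P(e)
    # Worlds where e is false get 0; the rest are rescaled by c.
    return [(w, c * p if evidence(w) else Fraction(0)) for w, p in worlds]

def cond_prob(h, e):
    """P(h | e): sum of P(w | e) over the worlds in which h is true."""
    return sum(p for w, p in condition(e) if h(w))

rain = lambda w: w["rain"]
wet = lambda w: w["wet"]
rain_and_wet = lambda w: w["rain"] and w["wet"]

print(cond_prob(rain, wet))             # 3/5
print(prob(rain_and_wet) / prob(wet))   # 3/5, the ratio form gives the same value
```

Exact fractions are used here so that the two forms can be compared with `==` without floating-point rounding.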
A conditional probability distribution, written $P(X \mid Y)$ where $X$ and $Y$ are variables or sets of variables, is a function of the variables: given a value $x \in \mathit{domain}(X)$ for $X$ and a value $y \in \mathit{domain}(Y)$ for $Y$, it gives the value $P(X = x \mid Y = y)$, where the latter is the conditional probability of the propositions.
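To illustrate the "function of the variables" reading, here is a small sketch that builds a conditional distribution from a joint table. The variables `Weather` and `Ground`, their domains, and the joint probabilities are all hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution P(Weather, Ground); the numbers are made up.
joint = {
    ("sunny", "dry"): Fraction(5, 10),
    ("sunny", "wet"): Fraction(1, 10),
    ("rainy", "dry"): Fraction(1, 10),
    ("rainy", "wet"): Fraction(3, 10),
}

def conditional(x, y):
    """P(Weather = x | Ground = y), computed as P(x and y) / P(y)."""
    p_y = sum(p for (_, ground), p in joint.items() if ground == y)
    if p_y == 0:
        raise ValueError("undefined: the conditioning value has probability 0")
    return joint.get((x, y), Fraction(0)) / p_y

# For each fixed value of Ground, the values over Weather sum to 1.
for y in ("dry", "wet"):
    row = {x: conditional(x, y) for x in ("sunny", "rainy")}
    print(y, row, sum(row.values()))
```

Each value of the conditioning variable yields its own probability distribution over the other variable, which is why every row sums to 1.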
The definition of conditional probability allows the decomposition of a conjunction into a product of conditional probabilities:
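(Chain rule) For any propositions $a_1, \ldots, a_n$:
$$\begin{aligned}
P(a_1 \wedge a_2 \wedge \cdots \wedge a_n) &= P(a_1) \cdot P(a_2 \mid a_1) \cdot P(a_3 \mid a_1 \wedge a_2) \cdots P(a_n \mid a_1 \wedge \cdots \wedge a_{n-1}) \\
  &= \prod_{i=1}^{n} P(a_i \mid a_1 \wedge \cdots \wedge a_{i-1})
\end{aligned}$$
where the right-hand side is assumed to be zero if any of the products are zero (even if some of them are undefined).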
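The chain rule can be verified on a small example. The sketch below assumes a made-up joint distribution over three Boolean propositions (weights 1 to 8, normalized); the helper names `prob` and `chain_rule_product` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint over three Boolean propositions (a1, a2, a3).
# Worlds are enumerated as (True, True, True), (True, True, False), ...
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {
    values: Fraction(wt, total)
    for values, wt in zip(product([True, False], repeat=3), weights)
}

def prob(n_true):
    """P(a1 and ... and a_n_true): worlds whose first n_true values are all true."""
    return sum(p for values, p in joint.items() if all(values[:n_true]))

def chain_rule_product(n):
    """Product over i of P(a_i | a1 and ... and a_{i-1})."""
    result = Fraction(1)
    for i in range(1, n + 1):
        prefix = prob(i - 1)  # equals 1 when i = 1 (empty conjunction)
        if prefix == 0:
            # Convention from the proposition: the whole product is zero.
            return Fraction(0)
        result *= prob(i) / prefix
    return result

print(prob(3))                # 1/36
print(chain_rule_product(3))  # 1/36
```

The factors telescope: each denominator cancels the previous numerator, which is why the product collapses to the probability of the full conjunction.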
Note that any theorem about unconditional probabilities can be converted into a theorem about conditional probabilities by adding the same evidence to each probability. This is because the conditional probability measure is a probability measure. For example, case (e) of Proposition 8.2 implies $P(a \vee b \mid k) = P(a \mid k) + P(b \mid k) - P(a \wedge b \mid k)$.
An agent using probability updates its belief when it observes new evidence. A new piece of evidence is conjoined to the old evidence to form the complete set of evidence. Bayes’ rule specifies how an agent should update its belief in a proposition based on a new piece of evidence.
Suppose an agent has a current belief in proposition $h$ based on evidence $k$ already observed, given by $P(h \mid k)$, and subsequently observes $e$. Its new belief in $h$ is $P(h \mid e \wedge k)$. Bayes’ rule tells us how to update the agent’s belief in hypothesis $h$ as new evidence arrives.
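(Bayes’ rule) As long as $P(e \mid k) \neq 0$,
$$P(h \mid e \wedge k) = \frac{P(e \mid h \wedge k) \cdot P(h \mid k)}{P(e \mid k)}.$$
This is often written with the background knowledge $k$ implicit. In this case, if $P(e) \neq 0$, then
$$P(h \mid e) = \frac{P(e \mid h) \cdot P(h)}{P(e)}.$$
$P(e \mid h)$ is the likelihood and $P(h)$ is the prior probability of the hypothesis $h$. Bayes’ rule states that the posterior probability is proportional to the likelihood times the prior.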
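Proof. The commutativity of conjunction means that $h \wedge e$ is equivalent to $e \wedge h$, and so they have the same probability given $k$. Using the rule for multiplication in two different ways,
$$\begin{aligned}
P(h \wedge e \mid k) &= P(h \mid e \wedge k) \cdot P(e \mid k) \\
P(e \wedge h \mid k) &= P(e \mid h \wedge k) \cdot P(h \mid k).
\end{aligned}$$
The theorem follows from dividing the right-hand sides by $P(e \mid k)$, which is not 0 by assumption. ∎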
Often, Bayes’ rule is used to compare various hypotheses (different $h_i$s). The denominator $P(e \mid k)$ is a constant that does not depend on the particular hypothesis, and so when comparing the relative posterior probabilities of hypotheses, the denominator can be ignored.
To derive the posterior probability, the denominator may be computed by reasoning by cases. If $H$ is an exclusive and covering set of propositions representing all possible hypotheses, then
$$\begin{aligned}
P(e \mid k) &= \sum_{h \in H} P(e \wedge h \mid k) \\
  &= \sum_{h \in H} P(e \mid h \wedge k) \cdot P(h \mid k).
\end{aligned}$$
Thus, the denominator of Bayes’ rule is obtained by summing the numerators for all the hypotheses. When the hypothesis space is large, computing the denominator is computationally difficult.
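The normalization step can be sketched in a few lines. The example assumes three hypothetical, mutually exclusive and covering hypotheses `h1`, `h2`, `h3` with invented priors and likelihoods; the function name `posterior` is likewise an assumption.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Apply Bayes' rule to every hypothesis at once.

    prior[h] is P(h | k) and likelihood[h] is P(e | h and k). The hypotheses
    are assumed to be exclusive and covering, so the priors sum to 1.
    """
    numerators = {h: likelihood[h] * prior[h] for h in prior}
    p_e = sum(numerators.values())  # P(e | k), by reasoning by cases
    if p_e == 0:
        raise ValueError("Bayes' rule is undefined when P(e | k) = 0")
    return {h: num / p_e for h, num in numerators.items()}

# Invented numbers for illustration only.
prior = {"h1": Fraction(1, 2), "h2": Fraction(3, 10), "h3": Fraction(1, 5)}
likelihood = {"h1": Fraction(1, 10), "h2": Fraction(1, 2), "h3": Fraction(4, 5)}

post = posterior(prior, likelihood)
print(post)  # h1: 5/36, h2: 5/12, h3: 4/9

# Ranking hypotheses needs only the numerators, since the denominator is shared.
best_by_numerator = max(prior, key=lambda h: likelihood[h] * prior[h])
best_by_posterior = max(post, key=post.get)
print(best_by_numerator, best_by_posterior)  # h3 h3
```

Note that `h1` has the largest prior but the smallest posterior here, because the evidence is unlikely under it. The sum over hypotheses is a single pass in this sketch; it becomes the bottleneck only when the hypothesis space is too large to enumerate.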
Generally, one of $P(e \mid h \wedge k)$ or $P(h \mid e \wedge k)$ is much easier to estimate than the other. Bayes’ rule is used to compute one from the other.
In medical diagnosis, the doctor observes a patient’s symptoms, and would like to know the likely diseases. Thus the doctor would like $P(\mathit{Disease} \mid \mathit{Symptoms})$. This is difficult to assess as it depends on the context (e.g., some diseases are more prevalent in hospitals). It is typically easier to assess $P(\mathit{Symptoms} \mid \mathit{Disease})$, as how the disease gives rise to the symptoms is typically less context dependent. These two are related by Bayes’ rule, where the prior probability of the disease, $P(\mathit{Disease})$, reflects the context.
The diagnostic assistant may need to know whether the light switch of Figure 1.8 is broken or not. You would expect that the electrician who installed the light switch in the past would not know if it is broken now, but would be able to specify how the output of a switch is a function of whether there is power coming into the switch, the switch position, and the status of the switch (whether it is working, shorted, installed upside-down, etc.). The prior probability for the switch being broken depends on the maker of the switch and how old it is. Bayes’ rule lets an agent infer the status of the switch given the prior and the evidence.
Suppose an agent has information about the reliability of fire alarms. It may know how likely it is that an alarm will work if there is a fire. To determine the probability that there is a fire, given that there is an alarm, Bayes’ rule gives:
$$P(\mathit{fire} \mid \mathit{alarm}) = \frac{P(\mathit{alarm} \mid \mathit{fire}) \cdot P(\mathit{fire})}{P(\mathit{alarm})}$$
where $P(\mathit{alarm} \mid \mathit{fire})$ is the probability that the alarm worked, assuming that there was a fire. It is a measure of the alarm’s reliability. The expression $P(\mathit{fire})$ is the probability of a fire given no other information. It is a measure of how fire-prone the building is. $P(\mathit{alarm})$ is the probability of the alarm sounding, given no other information. $P(\mathit{fire} \mid \mathit{alarm})$ is more difficult to directly represent because it depends, for example, on how much vandalism there is in the neighborhood.
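A worked instance may make the fire alarm example concrete. The three input probabilities below are invented for illustration and do not come from the text; in particular, the false-alarm rate `p_alarm_given_no_fire` is an extra assumption needed to obtain the probability of an alarm by reasoning by cases.

```python
from fractions import Fraction

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
p_fire = Fraction(1, 100)                  # prior: how fire-prone the building is
p_alarm_given_fire = Fraction(99, 100)     # reliability of the alarm
p_alarm_given_no_fire = Fraction(2, 100)   # false alarms (vandalism, faults, ...)

# P(alarm) by cases over fire and no fire.
p_alarm = (p_alarm_given_fire * p_fire
           + p_alarm_given_no_fire * (1 - p_fire))

# Bayes' rule.
p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm

print(p_alarm)             # 297/10000
print(p_fire_given_alarm)  # 1/3
```

Under these assumed numbers, even a highly reliable alarm leaves the probability of a fire at only one third, because fires are rare and false alarms are comparatively common. Raising the false-alarm rate lowers the posterior further, which is the sense in which the result depends on the neighborhood.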