foundations of computational agents
Using only the axioms of probability and the definition of conditional independence, prove Proposition 9.2.
Consider the belief network of Figure 9.37. This is the “Simple diagnostic example” in AIPython (aipython.org). For each of the following, first predict the answer based on your intuition, then run the belief network to check it. Explain the result you found by carrying out the inference.
The posterior probabilities of which variables change when $Smokes$ is observed to be true? That is, for which $X$ is $P(X\mid Smokes=true)\ne P(X)$?
Starting from the original network, the posterior probabilities of which variables change when $Fever$ is observed to be true? That is, specify the $X$ where $P(X\mid Fever=true)\ne P(X)$.
Does the probability of $Fever$ change when $Wheezing$ is observed to be true? That is, is $P(Fever\mid Wheezing=true)\ne P(Fever)$? Explain why (in terms of the domain, in language that could be understood by someone who did not know about belief networks).
Suppose $Wheezing$ is observed to be true. Does observing $Fever$ change the probability of $Smokes$? That is, is $P(Smokes\mid Wheezing)\ne P(Smokes\mid Wheezing,Fever)$? Explain why (in terms that could be understood by someone who did not know about belief networks).
What could be observed so that subsequently observing $Wheezing$ does not change the probability of $SoreThroat$? That is, specify a variable or variables $X$ such that $P(SoreThroat\mid X)=P(SoreThroat\mid X,Wheezing)$, or state that there are none. Explain why.
Suppose $Allergies$ could be another explanation of $SoreThroat$. Change the network so that $Allergies$ also affects $SoreThroat$ but is independent of the other variables in the network. Give reasonable probabilities.
What could be observed so that observing $Wheezing$ changes the probability of $Allergies$? Explain why.
What could be observed so that observing $Smokes$ changes the probability of $Allergies$? Explain why.
Note that parts (a), (b), and (c) only involve observing a single variable.
Consider the belief network of Figure 9.38, which extends the electrical domain to include an overhead projector.
Answer the following questions about how knowledge of the values of some variables would affect the probability of another variable.
Can knowledge of the value of $Projector\mathrm{\_}plugged\mathrm{\_}in$ affect the belief in the value of $Sam\mathrm{\_}reading\mathrm{\_}book$? Explain.
Can knowledge of $Screen\mathrm{\_}lit\mathrm{\_}up$ affect the belief in $Sam\mathrm{\_}reading\mathrm{\_}book$? Explain.
Can knowledge of $Projector\mathrm{\_}plugged\mathrm{\_}in$ affect your belief in $Sam\mathrm{\_}reading\mathrm{\_}book$ given that you have observed a value for $Screen\mathrm{\_}lit\mathrm{\_}up$? Explain.
Which variables could have their probabilities changed if just $Lamp\mathrm{\_}works$ was observed?
If just $Power\mathrm{\_}in\mathrm{\_}projector$ was observed, which variables could have their probabilities changed?
Kahneman [2011, p. 166] gives the following example.
A cab was involved in a hit-and-run accident at night. Two cab companies, Green and Blue, operate in the city. You are given the following data:
85% of the cabs in the city are Green and 15% are Blue.
A witness identified the cab as Blue. The court tested the reliability of the witness in the circumstances that existed on the night of the accident and concluded that the witness correctly identifies each one of the two colors 80% of the time and fails 20% of the time.
What is the probability that the cab involved in the accident was Blue?
Represent this story as a belief network. Explain all variables and conditional probabilities. What is observed, and what is the answer?
Suppose there were three independent witnesses, two of whom claimed the cab was Blue and one of whom claimed the cab was Green. Show the corresponding belief network. What is the probability the cab was Blue? What if all three claimed the cab was Blue?
Suppose it was found that the two witnesses who claimed the cab was Blue were not independent, but there was a 60% chance they colluded. (What might this mean?) Show the corresponding belief network, and the relevant probabilities. What is the probability that the cab is Blue (both for the case where all three witnesses claim that the cab was Blue and the case where the other witness claimed the cab was Green)?
In a variant of this scenario, Kahneman [2011, p. 167] replaced the first condition with: “The two companies operate the same number of cabs, but Green cabs are involved in 85% of the accidents.” How can this new scenario be represented as a belief network? Your belief network should allow observations about whether there is an accident as well as the color of the cab. Show examples of inferences in your network. Make reasonable choices for anything that is not fully specified. Be explicit about any assumptions you make.
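The single-witness case in part (a) can be checked directly with Bayes' rule; a minimal sketch (the variable names are illustrative):

```python
# Bayes' rule for the single-witness case:
# prior P(Blue) = 0.15, witness reliability 0.8.
prior_blue = 0.15
prior_green = 0.85
reliability = 0.8  # P(says Blue | Blue) = P(says Green | Green)

# P(says Blue) = P(says Blue | Blue) P(Blue) + P(says Blue | Green) P(Green)
p_says_blue = reliability * prior_blue + (1 - reliability) * prior_green
p_blue_given_says_blue = reliability * prior_blue / p_says_blue
print(round(p_blue_given_says_blue, 3))  # 0.12 / 0.29 ≈ 0.414
```

Note how the answer (about 41%) stays below one half even though the witness is 80% reliable; the prior dominates, which is Kahneman's point.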
Represent the same scenario as in Exercise 5.8 using a belief network. Show the network structure. Give all of the initial factors, making reasonable assumptions about the conditional probabilities (they should follow the story given in that exercise, but allow some noise). Give a qualitative explanation of why the patient has spots and fever.
In this question, you will build a belief-network representation of the Deep Space 1 (DS1) spacecraft considered in Exercise 5.10. Figure 5.14 depicts a part of the actual DS1 engine design.
Consider the following scenario:
Valves are $open$ or $closed$.
A valve can be $ok$, in which case the gas will flow if the valve is open and not if it is closed; $broken$, in which case gas never flows; $stuck$, in which case gas flows independently of whether the valve is open or closed; or $leaking$, in which case gas flowing into the valve leaks out instead of flowing through.
There are three gas sensors that can detect whether some gas is leaking (but not which gas); the first gas sensor detects gas from the rightmost valves (${v}_{1},\mathrm{\dots},{v}_{4}$), the second sensor detects gas from the center valves (${v}_{5},\mathrm{\dots},{v}_{12}$), and the third sensor detects gas from the leftmost valves (${v}_{13},\mathrm{\dots},{v}_{16}$).
Build a belief-network representation of the valves that feed into engine ${e}_{1}$. Make sure there are appropriate probabilities.
Test your model on some non-trivial examples.
Consider the following belief network:
with Boolean variables ($A=true$ is written as $a$ and $A=false$ as $\neg a$, and similarly for the other variables) and the following conditional probabilities:
$$\begin{array}{cc}P(a)=0.9\hfill & P(d\mid b)=0.1\hfill \\ P(b)=0.2\hfill & P(d\mid \neg b)=0.8\hfill \\ P(c\mid a,b)=0.1\hfill & P(e\mid c)=0.7\hfill \\ P(c\mid a,\neg b)=0.8\hfill & P(e\mid \neg c)=0.2\hfill \\ P(c\mid \neg a,b)=0.7\hfill & P(f\mid c)=0.2\hfill \\ P(c\mid \neg a,\neg b)=0.4\hfill & P(f\mid \neg c)=0.9.\hfill \end{array}$$
Compute $P(e)$ using variable elimination (VE). You should first prune irrelevant variables. Show the factors that are created for a given elimination ordering.
Suppose you want to compute $P(e\mid \neg f)$ using VE. How much of the previous computation is reusable? Show the factors that are different from those in part (a).
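One way to check the answer to part (a): since $D$ and $F$ are unobserved leaves, they can be pruned, and $P(e)$ follows from enumerating $A$, $B$, and $C$. A brute-force sketch using the conditional probabilities above:

```python
from itertools import product

# Brute-force enumeration over the pruned joint distribution,
# useful for checking the factors produced by VE.
P_a = {True: 0.9, False: 0.1}
P_b = {True: 0.2, False: 0.8}
P_c = {(True, True): 0.1, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.4}   # P(c | A, B)
P_e = {True: 0.7, False: 0.2}                      # P(e | C)

p_e = 0.0
for a, b, c in product([True, False], repeat=3):
    pc = P_c[(a, b)] if c else 1 - P_c[(a, b)]
    p_e += P_a[a] * P_b[b] * pc * P_e[c]
print(round(p_e, 3))  # 0.52
```

Enumeration is exponential in the number of variables, so it only serves as a check on small networks; VE avoids this blowup by summing variables out one at a time.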
Sam suggested that the recursive conditioning algorithm only needs to cache answers resulting from forgetting, rather than all answers. Is Sam’s suggestion better (in terms of space or search space reduced) than the given code for a single query? What about for multiple queries that share a cache? Give evidence (either theoretical or empirical) for your results.
Explain how to extend VE to allow for more general observations and queries. In particular, answer the following:
How can the VE algorithm be extended to allow observations that are disjunctions of values for a variable (e.g., of the form $X=a\vee X=b$)?
How can the VE algorithm be extended to allow observations that are disjunctions of values for different variables (e.g., of the form $X=a\vee Y=b$)?
How can the VE algorithm be extended to allow for the probability on a set of variables (e.g., asking for $P(X,Y\mid e)$)?
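For part (a), one standard approach (a sketch, not necessarily the only answer) is to encode a disjunctive observation as a 0/1 indicator factor over the variable, multiplied into the network's factors just like ordinary evidence:

```python
# Sketch: an observation X=a ∨ X=b becomes an indicator factor f(X)
# that is 1 on the allowed values and 0 elsewhere. Names are illustrative.
def disjunction_factor(var_domain, allowed):
    """Return a factor f(x) that is 1 if x is in the allowed set, else 0."""
    return {x: (1.0 if x in allowed else 0.0) for x in var_domain}

# Example: X has domain {a, b, c}; observe X=a ∨ X=b.
f = disjunction_factor({'a', 'b', 'c'}, {'a', 'b'})
prior = {'a': 0.2, 'b': 0.3, 'c': 0.5}
unnorm = {x: prior[x] * f[x] for x in prior}
z = sum(unnorm.values())
posterior = {x: unnorm[x] / z for x in unnorm}
print(posterior)  # {'a': 0.4, 'b': 0.6, 'c': 0.0}
```

For part (b), the same idea needs an indicator factor over both $X$ and $Y$ jointly, which (unlike single-variable evidence) cannot simply set variables to fixed values.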
In a nuclear research submarine, a sensor measures the temperature of the reactor core. An alarm is triggered ($A=true$) if the sensor reading is abnormally high ($S=true$), indicating an overheating of the core ($C=true$). The alarm and/or the sensor could be defective ($S\mathrm{\_}ok=false$, $A\mathrm{\_}ok=false$), which causes them to malfunction. The alarm system is modeled by the belief network of Figure 9.39.
What are the initial factors for this network? For each factor, state what it represents and what variables it is a function of.
Show how VE can be used to compute the probability that the core is overheating, given that the alarm does not go off; that is, $P(c\mid \neg a)$. For each variable eliminated, show which variable is eliminated, which factor(s) are removed, and which factor(s) are created, including what variables each factor is a function of. Explain how the answer is derived from the final factor.
Suppose we add a second, identical sensor to the system and trigger the alarm when either of the sensors reads a high temperature. The two sensors break and fail independently. Give the corresponding extended belief network.
This exercise continues Exercise 5.14.
Explain what knowledge (about physics and about students) a belief-network model requires.
What is the main advantage of using belief networks over using abductive diagnosis or consistency-based diagnosis in this domain?
What is the main advantage of using abductive diagnosis or consistency-based diagnosis over using belief networks in this domain?
Extend Example 9.30 so that it includes the state of the animal, which is either sleeping, foraging, or agitated.
If the animal is sleeping at any time, it does not make a noise, does not move, and at the next time point it is sleeping with probability 0.8 or foraging or agitated with probability 0.1 each.
If the animal is foraging or agitated, it tends to remain in the same state of composure (with probability 0.8), move to the other state of composure with probability 0.1, or go to sleep with probability 0.1.
If the animal is foraging in a corner, it will be detected by the microphone at that corner with probability 0.5, and if the animal is agitated in a corner, it will be detected by the microphone at that corner with probability 0.9. If the animal is foraging in the middle, it will be detected by each of the microphones with probability 0.2. If it is agitated in the middle, it will be detected by each of the microphones with probability 0.6. Otherwise, the microphones have a false positive rate of 0.05.
Represent this as a two-stage dynamic belief network. Draw the network, give the domains of the variables and the conditional probabilities.
What independence assumptions are embedded in the network?
Implement either variable elimination or particle filtering for this problem.
Does being able to hypothesize the internal state of the agent (whether it is sleeping, foraging, or agitated) help localization? Explain why.
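As a starting point for parts (a) and (c), the state dynamics described above can be written as a transition matrix; a minimal sketch using only the probabilities given in the story:

```python
# Transition probabilities for State in {sleeping, foraging, agitated},
# taken directly from the story (rows: current state, entries: next state).
states = ['sleeping', 'foraging', 'agitated']
transition = {
    'sleeping': {'sleeping': 0.8, 'foraging': 0.1, 'agitated': 0.1},
    'foraging': {'sleeping': 0.1, 'foraging': 0.8, 'agitated': 0.1},
    'agitated': {'sleeping': 0.1, 'foraging': 0.1, 'agitated': 0.8},
}
# Each row must be a proper distribution.
for s in states:
    assert abs(sum(transition[s].values()) - 1.0) < 1e-9

# One prediction step: if the animal is known to be sleeping now,
# the belief over the next state is a weighted sum over transitions.
belief = {'sleeping': 1.0, 'foraging': 0.0, 'agitated': 0.0}
belief = {s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
          for s2 in states}
print(belief)  # {'sleeping': 0.8, 'foraging': 0.1, 'agitated': 0.1}
```

The observation model (microphone detection probabilities) would be a separate conditional table over the state and location variables.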
Suppose Sam built a robot with five sensors and wanted to keep track of the location of the robot, and built a hidden Markov model (HMM) with the following structure (which repeats to the right):
What probabilities does Sam need to provide? You should label a copy of the diagram, if that helps explain your answer.
What independence assumptions are made in this model?
Sam discovered that the HMM with five sensors did not work as well as a version that only used two sensors. Explain why this may have occurred.
Consider the problem of filtering in HMMs.
Give a formula for the probability of some variable ${X}_{j}$ given future and past observations. You can base this on Equation 9.6. This should involve obtaining a factor from the previous state and a factor from the next state and combining them to determine the posterior probability of ${X}_{j}$. [Hint: Consider how VE, eliminating from the leftmost variable and eliminating from the rightmost variable, can be used to compute the posterior distribution for ${X}_{j}$.]
Computing the probability of all of the variables can be done in time linear in the number of variables by not recomputing values that were already computed for other variables. Give an algorithm for this.
Suppose you have computed the probability distribution for each state ${S}_{1},$ …, ${S}_{k}$, and then you get an observation for time $k+1$. How can the posterior probability of each variable be updated in time linear in $k$? [Hint: You may need to store more than just the distribution over each ${S}_{i}$.]
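The forward factor and backward factor described in the hint can be sketched for a toy two-state HMM (all numbers here are illustrative, not from the book):

```python
# Smoothing in an HMM: forward messages from the left, backward
# messages from the right, combined at each X_t.
states = [0, 1]
T = [[0.7, 0.3], [0.3, 0.7]]   # T[i][j] = P(X_{t+1}=j | X_t=i)
O = [[0.9, 0.1], [0.2, 0.8]]   # O[i][o] = P(obs=o | X=i)
prior = [0.5, 0.5]
obs = [0, 0, 1, 0]             # one observation per time step

# Forward: f_t(x) ∝ P(X_t=x, o_1..t)
f = [prior[i] * O[i][obs[0]] for i in states]
forward = [f]
for t in range(1, len(obs)):
    f = [sum(forward[-1][i] * T[i][j] for i in states) * O[j][obs[t]]
         for j in states]
    forward.append(f)

# Backward: b_t(x) = P(o_{t+1..n} | X_t=x)
backward = [[1.0, 1.0]]
for t in range(len(obs) - 2, -1, -1):
    b = [sum(T[i][j] * O[j][obs[t + 1]] * backward[0][j] for j in states)
         for i in states]
    backward.insert(0, b)

# Posterior P(X_t | o_1..n) ∝ forward[t] * backward[t]
def smooth(t):
    unnorm = [forward[t][i] * backward[t][i] for i in states]
    z = sum(unnorm)
    return [u / z for u in unnorm]

print([round(p, 3) for p in smooth(1)])
```

Each message depends only on its neighbor, so all forward and backward messages together cost time linear in the number of time steps, which is the key to parts (b) and (c).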
Which of the following algorithms suffers from underflow (real numbers that are too small to be represented using double-precision floats): rejection sampling, importance sampling, particle filtering? Explain why. How could underflow be avoided?
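The phenomenon can be demonstrated concretely; a sketch showing a product of many small likelihoods (as arises in a particle's weight) underflowing, and the usual log-space remedy:

```python
import math

# A long run of low-probability observations drives a weight
# (a product of likelihoods) below the smallest representable double.
likelihoods = [1e-3] * 200
weight = 1.0
for p in likelihoods:
    weight *= p
print(weight)  # underflows to 0.0

# Working in log space keeps the quantity representable.
log_weight = sum(math.log(p) for p in likelihoods)
print(log_weight)  # ≈ -1381.55
```

When log weights must be normalized or resampled, the log-sum-exp trick (subtracting the maximum log weight before exponentiating) avoids the same underflow in the normalizing constant.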
What are the independence assumptions made in the naive Bayes classifier for the help system of Example 9.36?
Are these independence assumptions reasonable? Explain why or why not.
Suppose we have a topic-model network like the one in Figure 9.29, but where all of the topics are parents of all of the words. What are all of the independencies of this model?
Give an example where the topics would not be independent.
How well does particle filtering work for Example 9.48? Try to construct an example where Gibbs sampling works much better than particle filtering. [Hint: Consider unlikely observations after a sequence of variable assignments.]