The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).
Bickel et al. [1975] report on gender biases for graduate admissions at UC Berkeley. This example is based on that case, but the numbers are fictional.
There are two departments, which we will call $dept\mathrm{\#}1$ and $dept\mathrm{\#}2$ (so $Dept$ is a random variable with values $dept\mathrm{\#}1$ and $dept\mathrm{\#}2$), to which students can apply. Assume students apply to one, but not both. Students have a gender (male or female), and are either admitted or not. Consider the table in Figure 8.33, which gives the percentage of students in each category.
Dept | Gender | Admitted | Percent |
---|---|---|---|
$dept\mathrm{\#}1$ | $male$ | $true$ | 32 |
$dept\mathrm{\#}1$ | $male$ | $false$ | 18 |
$dept\mathrm{\#}1$ | $female$ | $true$ | 7 |
$dept\mathrm{\#}1$ | $female$ | $false$ | 3 |
$dept\mathrm{\#}2$ | $male$ | $true$ | 5 |
$dept\mathrm{\#}2$ | $male$ | $false$ | 14 |
$dept\mathrm{\#}2$ | $female$ | $true$ | 7 |
$dept\mathrm{\#}2$ | $female$ | $false$ | 14 |
In the semantics of possible worlds, we will treat the students as possible worlds, each with the same measure.
What is $P(Admitted=true\mid Gender=male)$?
What is $P(Admitted=true\mid Gender=female)$?
Which gender is more likely to be admitted?
What is $P(Admitted=true\mid Gender=male,Dept=dept\mathrm{\#}1)$?
What is $P(Admitted=true\mid Gender=female,Dept=dept\mathrm{\#}1)$?
Which gender is more likely to be admitted to $dept\mathrm{\#}1$?
What is $P(Admitted=true\mid Gender=male,Dept=dept\mathrm{\#}2)$?
What is $P(Admitted=true\mid Gender=female,Dept=dept\mathrm{\#}2)$?
Which gender is more likely to be admitted to $dept\mathrm{\#}2$?
This is an instance of Simpson’s paradox. Why is it a paradox? Explain why it happened in this case.
Give another scenario where Simpson’s paradox occurs.
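The possible-worlds reading above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the exercise: the names `WORLDS`, `measure`, and `cond_prob` are hypothetical, and the percentages are copied from the table of Figure 8.33. A conditional probability is the measure of the worlds satisfying both the query and the condition, divided by the measure of the worlds satisfying the condition.

```python
# Each entry is one group of students (possible worlds) from Figure 8.33;
# "pct" is the percentage of all students in that group.
WORLDS = [
    {"Dept": "dept#1", "Gender": "male", "Admitted": True, "pct": 32},
    {"Dept": "dept#1", "Gender": "male", "Admitted": False, "pct": 18},
    {"Dept": "dept#1", "Gender": "female", "Admitted": True, "pct": 7},
    {"Dept": "dept#1", "Gender": "female", "Admitted": False, "pct": 3},
    {"Dept": "dept#2", "Gender": "male", "Admitted": True, "pct": 5},
    {"Dept": "dept#2", "Gender": "male", "Admitted": False, "pct": 14},
    {"Dept": "dept#2", "Gender": "female", "Admitted": True, "pct": 7},
    {"Dept": "dept#2", "Gender": "female", "Admitted": False, "pct": 14},
]


def measure(conditions):
    """Total percentage of the worlds in which every condition holds."""
    return sum(
        w["pct"]
        for w in WORLDS
        if all(w[var] == val for var, val in conditions.items())
    )


def cond_prob(query, given):
    """P(query | given), where both are dicts mapping variable name to value."""
    return measure({**given, **query}) / measure(given)


# Example query that is not one of the parts above:
# P(Dept = dept#1 | Gender = female) = (7 + 3) / (7 + 3 + 7 + 14) = 10/31
p = cond_prob({"Dept": "dept#1"}, {"Gender": "female"})
```

The same `cond_prob` call, with different arguments, can be used to check hand-computed answers to the parts of this exercise.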
Prove Proposition 8.1 for finitely many worlds, namely that the axioms of probability (Section 8.1.2) are sound and complete with respect to the semantics of probability. [Hint: For soundness, show that each of the axioms is true based on the semantics. For completeness, construct a probability measure from the axioms.]
Using only the axioms of probability and the definition of conditional independence, prove Proposition 8.5.
Consider the belief network of Figure 8.34. This is the “Simple diagnostic example” in the AIspace belief network tool at http://www.aispace.org/bayes/. For each of the following, first predict the answer based on your intuition, then run the belief network to check it. Explain the result you found by carrying out the inference.
The posterior probabilities of which variables change when $Smokes$ is observed to be true? That is, give the variables $X$ such that $P(X\mid Smokes=true)\ne P(X)$.
Starting from the original network, the posterior probabilities of which variables change when $Fever$ is observed to be true? That is, specify the $X$ where $P(X\mid Fever=true)\ne P(X)$.
Does the probability of $Fever$ change when $Wheezing$ is observed to be true? That is, is $P(Fever\mid Wheezing=true)\ne P(Fever)$? Explain why (in terms of the domain, in language that could be understood by someone who did not know about belief networks).
Suppose $Wheezing$ is observed to be true. Does observing $Fever$ change the probability of $Smokes$? That is, is $P(Smokes\mid Wheezing)\ne P(Smokes\mid Wheezing,Fever)$? Explain why (in terms that could be understood by someone who did not know about belief networks).
What could be observed so that subsequently observing $Wheezing$ does not change the probability of $SoreThroat$? That is, specify a variable or variables $X$ such that $P(SoreThroat\mid X)=P(SoreThroat\mid X,Wheezing)$, or state that there are none. Explain why.
Suppose $Allergies$ could be another explanation of $SoreThroat$. Change the network so that $Allergies$ also affects $SoreThroat$ but is independent of the other variables in the network. Give reasonable probabilities.
What could be observed so that observing $Wheezing$ changes the probability of $Allergies$? Explain why.
What could be observed so that observing $Smokes$ changes the probability of $Allergies$? Explain why.
Note that parts (a), (b), and (c) only involve observing a single variable.
Consider the belief network of Figure 8.35, which extends the electrical domain to include an overhead projector.
Answer the following questions about how knowledge of the values of some variables would affect the probability of another variable.
Can knowledge of the value of $Projector\mathrm{\_}plugged\mathrm{\_}in$ affect your belief in the value of $Sam\mathrm{\_}reading\mathrm{\_}book$? Explain.
Can knowledge of $Screen\mathrm{\_}lit\mathrm{\_}up$ affect your belief in $Sam\mathrm{\_}reading\mathrm{\_}book$? Explain.
Can knowledge of $Projector\mathrm{\_}plugged\mathrm{\_}in$ affect your belief in $Sam\mathrm{\_}reading\mathrm{\_}book$ given that you have observed a value for $Screen\mathrm{\_}lit\mathrm{\_}up$? Explain.
Which variables could have their probabilities changed if just $Lamp\mathrm{\_}works$ was observed?
Which variables could have their probabilities changed if just $Power\mathrm{\_}in\mathrm{\_}projector$ was observed?
Kahneman [2011, p. 166] gives the following example.
A cab was involved in a hit-and-run accident at night. Two cab companies, Green and Blue, operate in the city. You are given the following data:
85% of the cabs in the city are Green and 15% are Blue.
A witness identified the cab as Blue. The court tested the reliability of the witness in the circumstances that existed on the night of the accident and concluded that the witness correctly identifies each one of the two colors 80% of the time and fails 20% of the time.
What is the probability that the cab involved in the accident was Blue?
Represent this story as a belief network. Explain all variables and conditional probabilities. What is observed, what is the answer?
Suppose there were three independent witnesses, two of whom claimed the cab was Blue and one of whom claimed the cab was Green. Show the corresponding belief network. What is the probability the cab was Blue? What if all three claimed the cab was Blue?
Suppose it was found that the two witnesses who claimed the cab was Blue were not independent, but there was a 60% chance they colluded. (What might this mean?) Show the corresponding belief network, and the relevant probabilities. What is the probability that the cab is Blue (both for the case where all three witnesses claim that the cab was Blue and the case where the other witness claimed the cab was Green)?
In a variant of this scenario, Kahneman [2011, p. 167] replaced the first condition with: “The two companies operate the same number of cabs, but Green cabs are involved in 85% of the accidents.” How can this new scenario be represented as a belief network? Your belief network should allow observations about whether there is an accident as well as the color of the taxi. Show examples of inferences in your network. Make reasonable choices for anything that is not fully specified. Be explicit about any assumptions you make.
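For the original story (85% Green, 15% Blue, witness accuracy 80%), the inference is an application of Bayes' rule. The sketch below is one way to check a hand calculation. The function name `posterior_blue` and its parameters are hypothetical, and it assumes that witness reports are conditionally independent given the true color of the cab, so it does not cover the collusion case.

```python
def posterior_blue(reports, prior_blue=0.15, accuracy=0.8):
    """P(cab is Blue | reports), where reports is a list of "blue"/"green".

    Assumption: each witness is correct with probability `accuracy`,
    independently of the other witnesses given the true color.
    """
    like_blue = 1.0   # P(reports | cab is Blue)
    like_green = 1.0  # P(reports | cab is Green)
    for report in reports:
        like_blue *= accuracy if report == "blue" else 1 - accuracy
        like_green *= accuracy if report == "green" else 1 - accuracy
    joint_blue = prior_blue * like_blue
    joint_green = (1 - prior_blue) * like_green
    # Normalize over the two possible colors.
    return joint_blue / (joint_blue + joint_green)


# One witness says Blue: 0.15 * 0.8 / (0.15 * 0.8 + 0.85 * 0.2) = 0.12 / 0.29
single_witness = posterior_blue(["blue"])
```

With an empty list of reports the function returns the prior, which is a quick sanity check that the normalization is right.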
Represent the same scenario as in Exercise 8 using a belief network. Show the network structure. Give all of the initial factors, making reasonable assumptions about the conditional probabilities (they should follow the story given in that exercise, but allow some noise). Give a qualitative explanation of why the patient has spots and fever.
Suppose you want to diagnose the errors school students make when adding multidigit binary numbers. Consider adding two two-digit numbers to form a three-digit number.
That is, the problem is of the form:
$$\begin{array}{ccc} & A_{1} & A_{0} \\ + & B_{1} & B_{0} \\ C_{2} & C_{1} & C_{0} \end{array}$$
where ${A}_{i}$, ${B}_{i}$, and ${C}_{i}$ are all binary digits.
Suppose you want to model whether students know binary addition and whether they know how to carry. If students know how, they usually get the correct answer, but sometimes make mistakes. Students who do not know how to do the appropriate task simply guess.
What variables are necessary to model binary addition and the errors students could make? You must specify, in words, what each of the variables represents. Give a DAG that specifies the dependence of these variables.
What are reasonable conditional probabilities for this domain?
Implement this, perhaps by using the AIspace.org belief-network tool. Test your representation on a number of different cases.
You must give the graph, explain what each variable means, give the probability tables, and show how it works on a number of examples.
In this question, you will build a belief network representation of the Deep Space 1 (DS1) spacecraft considered in Exercise 10. Figure 5.14 depicts a part of the actual DS1 engine design.
Consider the following scenario.
Valves are $open$ or $closed$.
A valve can be $ok$, in which case the gas will flow if the valve is open and not if it is closed; $broken$, in which case gas never flows; $stuck$, in which case gas flows independently of whether the valve is open or closed; or $leaking$, in which case gas flowing into the valve leaks out instead of flowing through.
There are three gas sensors that can detect whether gas is leaking (but not which gas); the first gas sensor detects gas from the rightmost valves (${v}_{1}\mathrm{\dots}{v}_{4}$), the second gas sensor detects gas from the center valves (${v}_{5}\mathrm{\dots}{v}_{12}$), and the third gas sensor detects gas from the leftmost valves (${v}_{13}\mathrm{\dots}{v}_{16}$).
Build a belief-network representation of the domain. You need only consider the topmost valves (those that feed into engine ${e}_{1}$). Make sure there are appropriate probabilities.
Test your model on some non-trivial examples.
Consider the following belief network:
with Boolean variables (we write $A=true$ as $a$ and $A=false$ as $\mathrm{\neg}a$) and the following conditional probabilities:
$P(a)=0.9$
$P(b)=0.2$
$P(c\mid a,b)=0.1$
$P(c\mid a,\mathrm{\neg}b)=0.8$
$P(c\mid \mathrm{\neg}a,b)=0.7$
$P(c\mid \mathrm{\neg}a,\mathrm{\neg}b)=0.4$
$P(d\mid b)=0.1$
$P(d\mid \mathrm{\neg}b)=0.8$
$P(e\mid c)=0.7$
$P(e\mid \mathrm{\neg}c)=0.2$
$P(f\mid c)=0.2$
$P(f\mid \mathrm{\neg}c)=0.9$
Compute $P(e)$ using variable elimination (VE). You should first prune irrelevant variables. Show the factors that are created for a given elimination ordering.
Suppose you want to compute $P(e\mid \mathrm{\neg}f)$ using VE. How much of the previous computation is reusable? Show the factors that are different from those in part (a).
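A hand-worked VE trace can be checked against a small implementation. The following is a minimal sketch of variable elimination over Boolean factors, not a reference implementation. The class and function names (`Factor`, `cpt`, `multiply`, `sum_out`, `eliminate`) are hypothetical. The network structure is an assumption read off the conditional probabilities listed above ($A$ and $B$ are the parents of $C$; $C$ is the parent of $E$), and $D$ and $F$ are pruned because they are irrelevant to the query $P(E)$.

```python
from itertools import product


class Factor:
    """A table over Boolean variables: maps a tuple of values to a number."""

    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)


def cpt(child, parents, p_true):
    """Factor for P(child | parents); p_true maps parent values to P(child=true)."""
    table = {}
    for vals in product([True, False], repeat=len(parents)):
        table[vals + (True,)] = p_true[vals]
        table[vals + (False,)] = 1 - p_true[vals]
    return Factor(tuple(parents) + (child,), table)


def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for vals in product([True, False], repeat=len(variables)):
        asg = dict(zip(variables, vals))
        fv = f.table[tuple(asg[v] for v in f.variables)]
        gv = g.table[tuple(asg[v] for v in g.variables)]
        table[vals] = fv * gv
    return Factor(variables, table)


def sum_out(var, f):
    """Sum a variable out of a factor."""
    i = f.variables.index(var)
    table = {}
    for vals, p in f.table.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.variables[:i] + f.variables[i + 1:], table)


def eliminate(factors, order):
    """Eliminate the variables in `order`, then multiply what remains."""
    for var in order:
        touching = [f for f in factors if var in f.variables]
        if not touching:
            continue
        rest = [f for f in factors if var not in f.variables]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(var, prod)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Factors that remain after pruning D and F (assumed structure, see above).
factors = [
    cpt("A", (), {(): 0.9}),
    cpt("B", (), {(): 0.2}),
    cpt("C", ("A", "B"), {(True, True): 0.1, (True, False): 0.8,
                          (False, True): 0.7, (False, False): 0.4}),
    cpt("E", ("C",), {(True,): 0.7, (False,): 0.2}),
]
p_e = eliminate(factors, ["A", "B", "C"])  # a factor on E alone
```

Here `p_e.table[(True,)]` holds the value of $P(e)$ and `p_e.table[(False,)]` holds $P(\neg e)$. Changing the list passed as `order` makes it possible to compare the intermediate factors created by different elimination orderings.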
Explain how to extend VE to allow for more general observations and queries. In particular, answer the following.
How can the VE algorithm be extended to allow observations that are disjunctions of values for a variable (e.g., of the form $X=a\vee X=b$)?
How can the VE algorithm be extended to allow observations that are disjunctions of values for different variables (e.g., of the form $X=a\vee Y=b$)?
How can the VE algorithm be extended to allow queries for the probability of a set of variables (e.g., asking for $P(X,Y\mid e)$)?
In a nuclear research submarine, a sensor measures the temperature of the reactor core. An alarm is triggered ($A=true$) if the sensor reading is abnormally high ($S=true$), indicating an overheating of the core ($C=true$). The alarm and/or the sensor could be defective ($S\mathrm{\_}ok=false$, $A\mathrm{\_}ok=false$), which causes them to malfunction. The alarm system is modeled by the belief network of Figure 8.36.
What are the initial factors for this network? For each factor, state what it represents and what variables it is a function of.
Show how VE can be used to compute the probability that the core is overheating, given that the alarm does not go off; that is, $P(c\mid \mathrm{\neg}a)$. For each variable eliminated, show which variable is eliminated, which factor(s) are removed, and which factor(s) are created, including what variables each factor is a function of. Explain how the answer is derived from the final factor.
Suppose we add a second, identical sensor to the system and trigger the alarm when either of the sensors reads a high temperature. The two sensors break and fail independently. Give the corresponding extended belief network.
In this exercise, we continue Exercise 14.
Explain what knowledge (about physics and about students) a belief-network model requires.
What is the main advantage of using belief networks over using abductive diagnosis or consistency-based diagnosis in this domain?
What is the main advantage of using abductive diagnosis or consistency-based diagnosis over using belief networks in this domain?
Suppose Kim has a camper van (a mobile home) and likes to keep it at a comfortable temperature and noticed that the energy use depended on the elevation. Kim knows that the elevation affects the outside temperature. Kim likes the camper warmer at higher elevation. Note that not all of the variables directly affect electrical usage.
Show how this can be represented as a causal network, using the variables “Elevation”, “Electrical Usage”, “Outside Temperature”, “Thermostat Setting”.
Give an example where intervening has an effect different from conditioning for this network.
The aim of this exercise is to extend Example 8.29. Suppose the animal is either sleeping, foraging or agitated.
If the animal is sleeping at any time, it does not make a noise, does not move and at the next time point it is sleeping with probability 0.8 or foraging or agitated with probability 0.1 each.
If the animal is foraging or agitated, it tends to remain in the same state of composure (with probability 0.8), move to the other state of composure with probability 0.1, or go to sleep with probability 0.1.
If the animal is foraging in a corner, it will be detected by the microphone at that corner with probability 0.5, and if the animal is agitated in a corner it will be detected by the microphone at that corner with probability 0.9. If the animal is foraging in the middle, it will be detected by each of the microphones with probability 0.2. If it is agitated in the middle it will be detected by each of the microphones with probability 0.6. Otherwise the microphones have a false positive rate of 0.05.
Represent this as a two-stage dynamic belief network. Draw the network, give the domains of the variables and the conditional probabilities.
What independence assumptions are embedded in the network?
Implement either variable elimination or particle filtering for this problem.
Does being able to hypothesize the internal state of the agent (whether it is sleeping, foraging, or agitated) help localization? Explain why.
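As a starting point for the implementation part, the composure dynamics described above can be written down as a transition table. This sketch covers only the sleeping/foraging/agitated variable; the location and microphone models from Example 8.29 are not included, and the names `STATES`, `TRANSITION`, and `predict` are hypothetical.

```python
STATES = ("sleeping", "foraging", "agitated")

# TRANSITION[s1][s2] = P(state at next time = s2 | state now = s1),
# taken from the description of the animal's behavior.
TRANSITION = {
    "sleeping": {"sleeping": 0.8, "foraging": 0.1, "agitated": 0.1},
    "foraging": {"sleeping": 0.1, "foraging": 0.8, "agitated": 0.1},
    "agitated": {"sleeping": 0.1, "foraging": 0.1, "agitated": 0.8},
}


def predict(belief):
    """One prediction step: push a distribution over states through TRANSITION."""
    return {
        s2: sum(belief[s1] * TRANSITION[s1][s2] for s1 in STATES)
        for s2 in STATES
    }


# If the animal is known to be sleeping now, the next-step distribution is
# simply the "sleeping" row of the table.
next_belief = predict({"sleeping": 1.0, "foraging": 0.0, "agitated": 0.0})
```

A full solution would combine this with the location variable and the observation model, either as factors for variable elimination or as the sampling and weighting steps of a particle filter.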
Suppose Sam built a robot with five sensors and wanted to keep track of the location of the robot, and built a hidden Markov model (HMM) with the following structure (which repeats to the right):
What probabilities does Sam need to provide? You should label a copy of the diagram, if that helps explain your answer.
What independence assumptions are made in this model?
Sam discovered that the HMM with five sensors did not work as well as a version that only used two sensors. Explain why this may have occurred.
Consider the problem of filtering in HMMs.
Give a formula for the probability of some variable ${X}_{j}$ given future and past observations. You can base this on Equation 8.2. This should involve obtaining a factor from the previous state and a factor from the next state and combining them to determine the posterior probability of ${X}_{j}$. [Hint: Consider how VE, eliminating from the leftmost variable and eliminating from the rightmost variable, can be used to compute the posterior distribution for ${X}_{j}$.]
Computing the probability of all of the variables can be done in time linear in the number of variables by not recomputing values that were already computed for other variables. Give an algorithm for this.
Suppose you have computed the probability distribution for each state ${S}_{1},\dots,{S}_{k}$, and then you get an observation for time $k+1$. How can the posterior probability of each variable be updated in time linear in $k$? [Hint: You may need to store more than just the distribution over each ${S}_{i}$.]
Which of the following algorithms suffers from underflow (real numbers that are too small to be represented using double precision floats): rejection sampling, importance sampling, particle filtering? Explain why. How could underflow be avoided?
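Whichever of these algorithms turn out to be affected, the effect itself is easy to reproduce: a product of many small probabilities becomes exactly zero in double precision, while the sum of their logarithms stays representable. The sketch below illustrates this and one common remedy, normalizing weights in log space with the log-sum-exp trick. The names `log_sum_exp` and `normalize_log_weights` are hypothetical helpers, and the likelihood values are made up for illustration.

```python
import math


def log_sum_exp(log_values):
    """log(sum(exp(v))) computed stably by factoring out the largest term."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def normalize_log_weights(log_weights):
    """Turn unnormalized log weights into probabilities that sum to 1."""
    z = log_sum_exp(log_weights)
    return [math.exp(lw - z) for lw in log_weights]


# 1000 observations, each with likelihood 0.01 (illustrative numbers only).
likelihoods = [0.01] * 1000

naive_weight = 1.0
for p in likelihoods:
    naive_weight *= p  # underflows to 0.0 long before the loop ends

log_weight = sum(math.log(p) for p in likelihoods)  # about -4605, no underflow

# Two samples whose weights differ by a factor of 3 can still be compared.
probs = normalize_log_weights([log_weight, log_weight + math.log(3)])
```

Here `naive_weight` ends up as `0.0`, whereas `probs` is approximately `[0.25, 0.75]`, so the relative weights survive even though neither weight is representable directly.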
What are the independence assumptions made in the naive Bayes classifier for the help system of Example 8.35?
Are these independence assumptions reasonable? Explain why or why not.
Suppose we have a topic-model network like the one of Figure 8.26, but where all of the topics are parents of all of the words. What are all of the independencies of this model?
Give an example where the topics would not be independent.
Suppose you get a job where the boss is interested in localization of a robot that is carrying a camera around a factory. The boss has heard of variable elimination, rejection sampling, and particle filtering and wants to know which would be most suitable for this task. You must write a report for your boss (using proper English sentences), explaining which one of these technologies would be most suitable. For the two technologies that are not the most suitable, explain why you rejected them. For the one that is most suitable, explain what information is required by that technology to use it for localization:
VE (i.e., exact inference as used in HMMs),
rejection sampling, or
particle filtering.
How well does particle filtering work for Example 8.46? Try to construct an example where Gibbs sampling works much better than particle filtering. [Hint: Consider unlikely observations after a sequence of variable assignments.]