9.10 References and Further Reading

Introductions to probability from an AI perspective, and to belief (Bayesian) networks, include Pearl [1988], Koller and Friedman [2009], Darwiche [2009], and Murphy [2023]. Halpern [2003] overviews the foundations of probability. van de Meent et al. [2018] overview probabilistic programming.

Recursive conditioning is due to Darwiche [2001]. Variable elimination for belief networks, also called bucket elimination, is presented in Zhang and Poole [1994] and Dechter [1996]. Darwiche [2009] and Dechter [2019] compare these and other methods. Bodlaender [1993] discusses treewidth. Choi et al. [2020] overview probabilistic circuits.

For comprehensive reviews of information theory, see Cover and Thomas [2006], MacKay [2003], and Grünwald [2007].

Brémaud [1999] describes the theory and applications of Markov chains. HMMs are described by Rabiner [1989]. Dynamic Bayesian networks were introduced by Dean and Kanazawa [1989]. Markov localization and other issues concerning the relationship between probability and robotics are described by Thrun et al. [2005]. The use of particle filtering for localization is due to Dellaert et al. [1999].

Shannon and Weaver [1949] pioneered probabilistic models of natural language and forecast many future developments. Manning and Schütze [1999] and Jurafsky and Martin [2023] present probabilistic and statistical methods for natural language. The topic model of Example 9.38 is based on Google’s Rephil, described in the supplementary material of Murphy [2023].

For introductions to stochastic simulation, see Rubinstein [1981] and Andrieu et al. [2003]. Likelihood weighting in belief networks is based on Henrion [1988]. Importance sampling in belief networks is based on Cheng and Druzdzel [2000], who also consider how to learn the proposal distribution. Doucet et al. [2001] collect articles on particle filtering.

The annual Conference on Uncertainty in Artificial Intelligence, and the general AI conferences, provide up-to-date research results.