10.7 References and Further Reading

Bayesian learning is overviewed by Jaynes [2003], MacKay [2003], Howson and Urbach [2006], and Ghahramani [2015]. See also books on Bayesian statistics such as Gelman et al. [2020], McElreath [2020], or, for more rigor, Gelman et al. [2013]. Murphy [2022, 2023] provides comprehensive coverage of the topics of this chapter.

Bayes classifiers are discussed by Duda et al. [2001] and Langley et al. [1992]. TAN networks are described by Friedman et al. [1997], who also discuss how the naive Bayes classifier can be generalized to allow for more appropriate independence assumptions. Latent tree models are described by Zhang [2004]. Bayesian learning of decision trees is described in Buntine [1992]. Grünwald [2007] discusses the MDL principle.

The k-means algorithm was invented in 1957 by Lloyd [1982]. Schubert [2022] discusses how to choose the number of clusters. EM is due to Dempster et al. [1977]. Unsupervised learning is discussed by Cheeseman et al. [1988].

For an overview of learning belief networks, see Heckerman [1999], Darwiche [2009], and Koller and Friedman [2009]. The Bayesian information criterion is due to Schwarz [1978]. Our definition differs slightly; Schwarz's definition is justified by a more complex Bayesian argument.