# 10.6 References and Further Reading

Bayes classifiers are discussed by Duda et al. [2001] and Langley et al. [1992]. Friedman and Goldszmidt [1996a] discuss how the naive Bayes classifier can be generalized to allow for more appropriate independence assumptions. TAN networks are described by Friedman et al. [1997]. Latent tree models are described by Zhang [2004].

EM is due to Dempster et al. [1977]. Unsupervised learning is discussed by Cheeseman et al. [1988].

Bayesian learning is overviewed by Loredo [1990], Jaynes [2003], MacKay [2003], and Howson and Urbach [2006]. See also books on Bayesian statistics such as Gelman et al. [2004] and Bernardo and Smith [1994]. Bayesian learning of decision trees is described in Buntine [1992]. Grünwald [2007] discusses the MDL principle. Ghahramani [2015] reviews how Bayesian probability is used in AI.

For an overview of learning belief networks, see Heckerman [1999], Darwiche [2009], and Koller and Friedman [2009]. Structure learning using decision trees is based on Friedman and Goldszmidt [1996b]. The Bayesian information criterion is due to Schwarz [1978]. Note that our definition differs slightly from Schwarz's, which is justified by a more complex Bayesian argument. Modeling missing data is discussed by Marlin et al. [2011] and Mohan and Pearl [2014].