7.10 References and Further Reading

For good overviews of machine learning, see Mitchell (1997), Duda et al. (2001), Bishop (2008), and Hastie et al. (2009).

The collection of papers by Shavlik and Dietterich (1990) contains many classic learning papers. Michie et al. (1994) give an empirical evaluation of many learning algorithms on many different problems. Briscoe and Caelli (1996) discuss many different machine learning algorithms. Weiss and Kulikowski (1991) give an overview of techniques for classification learning. Davis and Goadrich (2006) discuss precision, recall, and ROC curves.

The approach to combining expert knowledge and data was proposed by Spiegelhalter et al. (1990).

Decision-tree learning is discussed by Quinlan (1986). For an overview of a mature decision-tree learning tool, see Quinlan (1993). The Gini index [Exercise 7.10] is the splitting criterion used in CART [Breiman et al. (1984)].

TAN networks are described by Friedman et al. (1997). Latent tree models are described by Zhang (2004).

For overviews of neural networks see Bishop (1995), Hertz et al. (1991), and Jordan and Bishop (1996). Back-propagation is introduced in Rumelhart et al. (1986). Minsky and Papert (1988) analyze the limitations of neural networks.

For a review of ensemble learning, see Dietterich (2002). Boosting is described in Schapire (2002) and Meir and Rätsch (2003).

For reviews of case-based reasoning, see Aamodt and Plaza (1994), Kolodner and Leake (1996), and Lopez De Mantaras et al. (2005). For reviews of nearest-neighbor algorithms, see Duda et al. (2001) and Dasarathy (1991). The nearest-neighbor algorithm that learns dimension weights is from Lowe (1995). For a classical review of case-based reasoning, see Riesbeck and Schank (1989), and for a more recent review, see Aha et al. (2005).

Version spaces were defined by Mitchell (1977). PAC learning was introduced by Valiant (1984). The analysis here is due to Haussler (1988). Kearns and Vazirani (1994) give a good introduction to computational learning theory and PAC learning. For more details on version spaces and PAC learning, see Mitchell (1997).

For overviews of Bayesian learning, see Jaynes (2003), Loredo (1990), Howson and Urbach (2006), and Cheeseman (1990). See also books on Bayesian statistics such as Gelman et al. (2004) and Bernardo and Smith (1994). Bayesian learning of decision trees is described in Buntine (1992). Grünwald (2007) discusses the MDL principle.

For research results on machine learning, see the journals Journal of Machine Learning Research (JMLR) and Machine Learning, the proceedings of the annual International Conference on Machine Learning (ICML) and of the conference on Neural Information Processing Systems (NIPS), general AI journals such as Artificial Intelligence and the Journal of Artificial Intelligence Research, and many specialized conferences and journals.