foundations of computational agents
For good overviews of machine learning see Briscoe and Caelli, Mitchell, Duda et al., Bishop, Hastie et al., and Murphy. Halevy et al. discuss big data. Domingos overviews issues in machine learning. The UCI machine learning repository [Lichman, 2013] is a collection of classic machine learning data sets.
The collection of papers by Shavlik and Dietterich contains many classic learning papers. Michie et al. give an empirical evaluation of many learning algorithms on multiple problems. Davis and Goadrich discuss precision, recall, and ROC curves. Settles overviews active learning.
The approach to combining expert knowledge and data was proposed by Spiegelhalter et al.
Ng compares L1 and L2 regularization for logistic regression.
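The difference between the two penalties can be seen in a minimal sketch (this is an illustration, not Ng's experimental setup): logistic regression trained by gradient descent, where the L2 penalty adds 2λw to the gradient and the L1 penalty adds a subgradient λ·sign(w).

```python
# Minimal sketch of L1- vs. L2-regularized logistic regression.
# All names and hyperparameters here are illustrative choices.
import numpy as np

def train_logreg(X, y, penalty="l2", lam=0.1, lr=0.1, steps=500):
    """Gradient descent on the regularized mean logistic loss.

    penalty="l2" adds the gradient of lam * ||w||_2^2;
    penalty="l1" adds a subgradient of lam * ||w||_1.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / n           # gradient of mean log loss
        if penalty == "l2":
            grad += 2 * lam * w
        else:                              # l1: subgradient sign(w)
            grad += lam * np.sign(w)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(float)  # only 2 of 5 features matter
w_l1 = train_logreg(X, y, penalty="l1")
w_l2 = train_logreg(X, y, penalty="l2")
```

The L1 penalty tends to push the weights of irrelevant features toward zero, which is why it is often preferred when many features are uninformative.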
Goodfellow et al. provide a modern overview of neural networks and deep learning. For classic overviews of neural networks see Hertz et al. and Bishop. McCulloch and Pitts define a formal neuron, and Minsky showed how such representations can be learned from data. Rosenblatt introduced the perceptron. Back-propagation is introduced in Rumelhart et al. LeCun et al. describe how to effectively implement back-propagation. Minsky and Papert analyze the limitations of neural networks. LeCun et al. review how multilayer neural networks have been used for deep learning in many applications. Hinton et al. review neural networks for speech recognition, Goldberg for natural language processing, and Krizhevsky et al. for vision. Rectified linear units are discussed by Glorot et al. Nocedal and Wright provide practical advice on gradient descent and related methods. Karimi et al. analyze how many iterations of stochastic gradient descent are needed.
Random forests were introduced by Breiman, and are compared by Dietterich [2000a] and Denil et al. For reviews of ensemble learning see Dietterich. Boosting is described by Schapire and by Meir and Rätsch.
For overviews of case-based reasoning see Kolodner and Leake and López. For a review of nearest-neighbor algorithms, see Duda et al. and Dasarathy. The dimension-weight learning nearest-neighbor algorithm is from Lowe.
Version spaces were defined by Mitchell. PAC learning was introduced by Valiant. The analysis here is due to Haussler. Kearns and Vazirani give a good introduction to computational learning theory and PAC learning. For more details on version spaces and PAC learning, see Mitchell.
For research results on machine learning, see the journals Journal of Machine Learning Research (JMLR) and Machine Learning, the annual International Conference on Machine Learning (ICML), the proceedings of the Conference on Neural Information Processing Systems (NeurIPS, formerly NIPS), general AI journals such as Artificial Intelligence and the Journal of Artificial Intelligence Research, and many specialized conferences and journals.