Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

11.1 Clustering

Chapter 7 considered supervised learning, where the target features that must be predicted from input features are observed in the training data. In clustering or unsupervised learning, the target features are not given in the training examples. The aim is to construct a natural classification that can be used to cluster the data.

The general idea behind clustering is to partition the examples into clusters or classes. Each class predicts feature values for the examples in the class. Each clustering has a prediction error, which is the error incurred when the feature values of every example are predicted from its class. The best clustering is the one that minimizes this error.
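The idea of scoring a clustering by its prediction error can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text does not fix an error measure here, so the sketch assumes real-valued features, a class prediction equal to the mean of the class's examples, and sum-of-squares error. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def class_means(examples, assignment):
    """Assumed predictor: each class predicts the per-feature mean of its members."""
    members = defaultdict(list)
    for example, cls in zip(examples, assignment):
        members[cls].append(example)
    return {
        cls: [sum(column) / len(rows) for column in zip(*rows)]
        for cls, rows in members.items()
    }

def clustering_error(examples, assignment):
    """Assumed error measure: sum of squared differences between each
    feature value and the value predicted by the example's class."""
    predictions = class_means(examples, assignment)
    return sum(
        (value - predicted) ** 2
        for example, cls in zip(examples, assignment)
        for value, predicted in zip(example, predictions[cls])
    )

# Four toy examples with two features each (made-up data).
examples = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]

# Two candidate partitions of the same examples into two classes.
nearby_together = [0, 0, 1, 1]
distant_together = [0, 1, 0, 1]

error_nearby = clustering_error(examples, nearby_together)
error_distant = clustering_error(examples, distant_together)
print(error_nearby, error_distant)
```

Under these assumptions, the partition that groups nearby examples has the lower error, so it is the better of the two clusterings.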

Example 11.1: A diagnostic assistant may want to group treatments into classes that predict the desirable and undesirable effects of each treatment. The assistant may not want to give a patient a drug if similar drugs have had disastrous effects on similar patients.

An intelligent tutoring system may want to cluster students' learning behavior so that strategies that work for one member of a class may work for other members.

In hard clustering, each example is placed definitively in a class. The class is then used to predict the feature values of the example. The alternative to hard clustering is soft clustering, in which each example has a probability distribution over the classes. The prediction of the value of each feature of an example is the weighted average of the predictions of the classes, where each class's prediction is weighted by the probability that the example is in that class.
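The difference between the two kinds of prediction can be sketched as follows. This is a minimal illustration, assuming real-valued features and that each class's predictions are already known. The class names, probabilities, and function names are hypothetical.

```python
def hard_prediction(predictions, cls):
    """Hard clustering: the example's single class supplies the prediction."""
    return predictions[cls]

def soft_prediction(predictions, class_probs):
    """Soft clustering: average the class predictions for each feature,
    weighting each class by the probability the example belongs to it."""
    n_features = len(next(iter(predictions.values())))
    return [
        sum(prob * predictions[cls][i] for cls, prob in class_probs.items())
        for i in range(n_features)
    ]

# Assumed per-class predictions for two features (made-up values).
predictions = {"a": [1.0, 2.0], "b": [9.0, 10.0]}

# Hard clustering: the example is definitively in class "a".
hard = hard_prediction(predictions, "a")

# Soft clustering: the example is in "a" with probability 0.75, "b" with 0.25.
soft = soft_prediction(predictions, {"a": 0.75, "b": 0.25})
print(hard, soft)
```

Hard clustering is the special case of the soft prediction in which one class has probability 1.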