7 Supervised Machine Learning


7.2 Supervised Learning

One learning task is supervised learning, where there is a set of examples, and a set of features, partitioned into input features and target features. The aim is to predict the values of the target features from the input features.

A feature is a function from examples to values. If e is an example, and F is a feature, F(e) is the value of feature F for example e. The domain of a feature is the set of values it can return. Note that, in mathematical terms, this set is the range of the function, but it is traditionally called the domain of the feature.
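
The definition above can be sketched in code. This is only an illustration, assuming examples are represented as Python dictionaries; that representation, the helper names `make_feature` and `observed_domain`, and the two toy examples are hypothetical and not taken from the text.

```python
from typing import Callable, Dict, Iterable, Set

Example = Dict[str, str]             # assumed representation of an example
Feature = Callable[[Example], str]   # a feature maps an example to a value

def make_feature(name: str) -> Feature:
    """Return a feature: a function giving an example's value for `name`."""
    def feature(e: Example) -> str:
        return e[name]
    return feature

def observed_domain(feature: Feature, examples: Iterable[Example]) -> Set[str]:
    """Values the feature returns on the given examples.

    This is a subset of the feature's domain: only values that actually
    occur in the supplied examples are collected.
    """
    return {feature(e) for e in examples}

# Two toy examples, made up for this sketch
e_a = {"Colour": "red", "Size": "small"}
e_b = {"Colour": "blue", "Size": "small"}

Colour = make_feature("Colour")
print(Colour(e_a))                                   # red
print(sorted(observed_domain(Colour, [e_a, e_b])))   # ['blue', 'red']
```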

In a supervised learning task, the learner is given

  • a set of input features, X1, …, Xn

  • a set of target features, Y1, …, Yk

  • a set of training examples, where the values for the input features and the target features are given for each example, and

  • a set of test examples, where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and as-yet-unseen examples.

Example  Author   Thread    Length  Where_read  User_action
e1       known    new       long    home        skips
e2       unknown  new       short   work        reads
e3       unknown  followup  long    work        skips
e4       known    followup  long    home        skips
e5       known    new       short   home        reads
e6       known    followup  long    work        skips
e7       unknown  followup  short   work        skips
e8       unknown  new       short   work        reads
e9       known    followup  long    home        skips
e10      known    new       long    work        skips
e11      unknown  followup  short   home        skips
e12      known    new       long    work        skips
e13      known    followup  short   home        reads
e14      known    new       short   work        reads
e15      known    new       short   home        reads
e16      known    followup  short   work        reads
e17      known    new       short   home        reads
e18      unknown  new       short   work        reads
e19      unknown  new       long    work        ?
e20      unknown  followup  short   home        ?
Figure 7.1: Examples of a user’s preferences. These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion website depending on whether the author is known or not, whether the article started a new thread or was a follow-up, the length of the article, and whether it is read at home or at work. e1, …, e18 are the training examples. The aim is to make a prediction for the user action on e19, e20, and other, currently unseen, examples.
Example 7.1.

Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a threaded discussion website given properties of the article. The input features are Author, Thread, Length, and Where_read. There is one target feature, User_action. The domain of Author is {known,unknown}, the domain of Thread is {new,followup}, and so on.

There are eighteen training examples, each of which has a value for all of the features. In this data set, Author(e11)=unknown, Thread(e11)=followup, and User_action(e11)=skips.

There are two test examples, e19 and e20, where the user action is unknown.
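
The data of Figure 7.1 can be written down directly. The following is a minimal sketch, assuming each example is stored as a dictionary keyed by feature name; the variable names (`examples`, `training`, `test`) and the helper `domain` are choices made for this illustration, not part of the text.

```python
FEATURES = ["Author", "Thread", "Length", "Where_read", "User_action"]
INPUT_FEATURES = FEATURES[:-1]
TARGET = "User_action"

# Rows copied from Figure 7.1; "?" marks a target value that is not given.
ROWS = """\
e1 known new long home skips
e2 unknown new short work reads
e3 unknown followup long work skips
e4 known followup long home skips
e5 known new short home reads
e6 known followup long work skips
e7 unknown followup short work skips
e8 unknown new short work reads
e9 known followup long home skips
e10 known new long work skips
e11 unknown followup short home skips
e12 known new long work skips
e13 known followup short home reads
e14 known new short work reads
e15 known new short home reads
e16 known followup short work reads
e17 known new short home reads
e18 unknown new short work reads
e19 unknown new long work ?
e20 unknown followup short home ?
"""

# Map each example name to a dictionary of feature values.
examples = {}
for line in ROWS.splitlines():
    name, *values = line.split()
    examples[name] = dict(zip(FEATURES, values))

# Training examples have a known target; test examples do not.
training = {n: e for n, e in examples.items() if e[TARGET] != "?"}
test = {n: e for n, e in examples.items() if e[TARGET] == "?"}

def domain(feature):
    """Values of `feature` that occur in the training examples."""
    return {e[feature] for e in training.values()}

print(len(training), len(test))        # 18 2
print(examples["e11"]["Author"])       # unknown
print(sorted(domain("Thread")))        # ['followup', 'new']
```

Here the domain is computed from the values observed in the training examples, which for this data set coincides with the domains stated in the example.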

Example 7.2.
Example  X    Y
e1       0.7  1.7
e2       1.1  2.4
e3       1.3  2.5
e4       1.9  1.7
e5       2.6  2.1
e6       3.1  2.3
e7       3.9  7
e8       2.9  ?
e9       5.0  ?
Figure 7.2: Training and test examples for a regression task

Figure 7.2 shows some data for a regression task, where the aim is to predict the value of feature Y on examples for which the value of feature X is provided. This is a regression task because Y is a real-valued feature. Predicting a value of Y for example e8 is an interpolation problem, because its X value, 2.9, lies between the smallest and largest X values of the training examples (0.7 and 3.9). Predicting a value of Y for example e9 is an extrapolation problem, because its X value, 5.0, is outside the range of X values of the training examples.
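
The distinction between interpolation and extrapolation depends only on where a query's X value lies relative to the training X values. A small sketch of that check follows; the function name `query_kind` is hypothetical, and the sketch assumes a single real-valued input feature as in Figure 7.2.

```python
# X values of the training examples e1 to e7 in Figure 7.2
train_x = [0.7, 1.1, 1.3, 1.9, 2.6, 3.1, 3.9]

def query_kind(x, train_x):
    """Classify a query by whether x lies within the range of training X values."""
    lo, hi = min(train_x), max(train_x)
    return "interpolation" if lo <= x <= hi else "extrapolation"

print(query_kind(2.9, train_x))   # e8: interpolation
print(query_kind(5.0, train_x))   # e9: extrapolation
```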