# 7.2 Supervised Learning

One learning task is supervised learning, where there is a set of examples and a set of features, and the features are partitioned into input features and target features. The aim is to predict the values of the target features from the values of the input features.

A feature is a function from examples to values. If $e$ is an example, and $F$ is a feature, ${F}({e})$ is the value of feature $F$ for example $e$. The domain of a feature is the set of values it can return. Strictly speaking, this set is the range of the function, but it is traditionally called the domain of the feature.
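The idea of a feature as a function can be sketched in Python. This is a minimal illustration, not a prescribed representation: modelling an example as a dictionary and the helper `make_feature` are assumptions made here. The feature names, the domains, and the values for $e_{11}$ are the ones quoted in Example 7.1; features of $e_{11}$ whose values are not quoted in the text are left out.

```python
from typing import Callable, Dict

# Assumption: an example is modelled as a dict from feature names to values.
Example = Dict[str, str]
Feature = Callable[[Example], str]


def make_feature(name: str) -> Feature:
    """Build a feature: a function from an example to its value for `name`."""
    def feature(e: Example) -> str:
        return e[name]
    return feature


Author = make_feature("Author")
Thread = make_feature("Thread")

# Domains as stated in Example 7.1.
DOMAIN = {
    "Author": {"known", "unknown"},
    "Thread": {"new", "followup"},
}

# Only the values of e11 that the text quotes; other features are omitted.
e11 = {"Author": "unknown", "Thread": "followup", "User_action": "skips"}

author_value = Author(e11)   # the value of feature Author for example e11
thread_value = Thread(e11)
print(author_value, thread_value)
print(author_value in DOMAIN["Author"])  # every returned value lies in the domain
```

Writing `Author(e11)` mirrors the notation ${F}({e})$: the feature is applied to the example, rather than the example being indexed by the feature.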

In a supervised learning task, the learner is given

• a set of input features, $X_{1},\dots,X_{n}$,

• a set of target features, $Y_{1},\dots,Y_{k}$,

• a set of training examples, where the values for the input features and the target features are given for each example, and

• a set of test examples, where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and as-yet-unseen examples.
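The four components of a supervised learning task can be gathered into one structure, as in the sketch below. The class `SupervisedTask`, the function `most_common_predictor`, and the tiny data set are all hypothetical and invented for illustration. The learner shown is a deliberately naive baseline that ignores the input features and predicts the most frequent training value of each target feature; it is not a method proposed in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

# Assumption: an example is a dict from feature names to values.
Example = Dict[str, Hashable]


@dataclass
class SupervisedTask:
    input_features: List[str]
    target_features: List[str]
    training: List[Example]  # values given for input and target features
    test: List[Example]      # values given for input features only


def most_common_predictor(task: SupervisedTask) -> Callable[[Example], Example]:
    """Naive baseline: predict each target's most frequent training value."""
    modes = {
        y: Counter(e[y] for e in task.training).most_common(1)[0][0]
        for y in task.target_features
    }

    def predict(example: Example) -> Example:
        # The input features of `example` are ignored by this baseline.
        return dict(modes)

    return predict


# Hypothetical data, not taken from any figure in the text.
task = SupervisedTask(
    input_features=["X1", "X2"],
    target_features=["Y1"],
    training=[
        {"X1": "a", "X2": "p", "Y1": "yes"},
        {"X1": "a", "X2": "q", "Y1": "no"},
        {"X1": "b", "X2": "p", "Y1": "yes"},
    ],
    test=[{"X1": "b", "X2": "q"}],
)

predict = most_common_predictor(task)
predictions = [predict(e) for e in task.test]
print(predictions)  # [{'Y1': 'yes'}]
```

A learner that actually uses the input features would have the same interface: it takes the task and returns a function from an example to predicted target values, which can then be applied to the test examples and to examples not yet seen.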

###### Example 7.1.

Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a threaded discussion website given properties of the article. The input features are $Author$, $Thread$, $Length$, and $Where\_read$. There is one target feature, $User\_action$. The domain of $Author$ is $\{known,unknown\}$, the domain of $Thread$ is $\{new,followup\}$, and so on.

There are eighteen training examples, each of which has a value for every feature. In this data set, ${Author}({e_{11}}){=}unknown$, ${Thread}({e_{11}}){=}followup$, and ${User\_action}({e_{11}}){=}skips$.

There are two test examples, $e_{19}$ and $e_{20}$, where the user action is unknown.

###### Example 7.2.

Figure 7.2 shows some data for a regression task, where the aim is to predict the value of feature $Y$ on examples for which the value of feature $X$ is provided. This is a regression task because $Y$ is a real-valued feature. Predicting a value of $Y$ for example $e_{8}$ is an interpolation problem, because its $X$ value lies between the $X$ values of the training examples. Predicting a value of $Y$ for example $e_{9}$ is an extrapolation problem, because its $X$ value is outside the range of the $X$ values of the training examples.
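For a single real-valued input feature, the distinction between interpolation and extrapolation reduces to a range check, as the following sketch shows. The function name `query_kind` and the numbers are hypothetical; they are not the values from Figure 7.2.

```python
from typing import Sequence


def query_kind(train_x: Sequence[float], x: float) -> str:
    """Classify a query by where x falls relative to the training inputs."""
    lo, hi = min(train_x), max(train_x)
    # Inside the closed interval [lo, hi]: interpolation; otherwise extrapolation.
    return "interpolation" if lo <= x <= hi else "extrapolation"


# Hypothetical training inputs and queries, chosen only for illustration.
train_x = [1.0, 2.0, 3.0, 4.0]
inside = query_kind(train_x, 2.5)
outside = query_kind(train_x, 6.0)
print(inside)   # interpolation
print(outside)  # extrapolation
```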