7.2 Supervised Learning

One learning task is supervised learning, where there is a set of examples and a set of features, and the features are partitioned into input features and target features. The aim is to predict the values of the target features from the values of the input features.

A feature is a function from examples into a value. If $e$ is an example, and $F$ is a feature, $F(e)$ is the value of feature $F$ for example $e$. The domain of a feature is the set of values it can return. Note that, viewed as a function, this set is the range of the feature, but it is traditionally called the domain.
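The idea of a feature as a function can be made concrete with a minimal Python sketch. The dictionary representation of examples and the helper names `make_feature` and `observed_values` are assumptions made for illustration; they are not notation from the text. The values observed on a finite set of examples are only a subset of a feature's domain.

```python
# Each example is represented as a dict from feature names to values.
# This representation is an assumption made for illustration.
e1 = {"Author": "known", "Thread": "new", "Length": "long"}
e2 = {"Author": "unknown", "Thread": "new", "Length": "short"}


def make_feature(name):
    """Return a feature: a function that maps an example to a value."""
    def feature(example):
        return example[name]
    return feature


def observed_values(feature, examples):
    """Values the feature takes on the given examples (a subset of its domain)."""
    return {feature(e) for e in examples}


Author = make_feature("Author")

print(Author(e1))                         # value of feature Author for example e1
print(observed_values(Author, [e1, e2]))  # values seen so far
```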

In a supervised learning task, the learner is given

• a set of input features, $X_{1},\dots,X_{n}$
• a set of target features, $Y_{1},\dots,Y_{k}$
• a set of training examples, where the values for the input features and the target features are given for each example, and
• a set of test examples, where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and as-yet-unseen examples.
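The four components a learner is given can be collected in one structure. The following is a rough sketch, assuming examples are dictionaries keyed by feature name; the class name `SupervisedTask`, the method `is_well_formed`, and the feature names and values in the usage example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SupervisedTask:
    """Hypothetical container for the components of a supervised learning task."""
    input_features: list   # names of X_1, ..., X_n
    target_features: list  # names of Y_1, ..., Y_k
    training: list         # examples (dicts) with input and target values
    test: list             # examples (dicts) with input values only

    def is_well_formed(self):
        """Check that each example carries the values the task requires."""
        all_features = self.input_features + self.target_features
        training_ok = all(
            all(f in e for f in all_features) for e in self.training
        )
        test_ok = all(
            all(f in e for f in self.input_features)
            and not any(f in e for f in self.target_features)
            for e in self.test
        )
        return training_ok and test_ok


# Placeholder data, made up only to exercise the structure.
task = SupervisedTask(
    input_features=["X1", "X2"],
    target_features=["Y1"],
    training=[{"X1": 0, "X2": 1, "Y1": "a"}, {"X1": 1, "X2": 1, "Y1": "b"}],
    test=[{"X1": 1, "X2": 0}],
)
print(task.is_well_formed())
```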

Example	$Author$	$Thread$	$Length$	$Where\_read$	$User\_action$
$e_{1}$	$known$	$new$	$long$	$home$	$skips$
$e_{2}$	$unknown$	$new$	$short$	$work$	$reads$
$e_{3}$	$unknown$	$followup$	$long$	$work$	$skips$
$e_{4}$	$known$	$followup$	$long$	$home$	$skips$
$e_{5}$	$known$	$new$	$short$	$home$	$reads$
$e_{6}$	$known$	$followup$	$long$	$work$	$skips$
$e_{7}$	$unknown$	$followup$	$short$	$work$	$skips$
$e_{8}$	$unknown$	$new$	$short$	$work$	$reads$
$e_{9}$	$known$	$followup$	$long$	$home$	$skips$
$e_{10}$	$known$	$new$	$long$	$work$	$skips$
$e_{11}$	$unknown$	$followup$	$short$	$home$	$skips$
$e_{12}$	$known$	$new$	$long$	$work$	$skips$
$e_{13}$	$known$	$followup$	$short$	$home$	$reads$
$e_{14}$	$known$	$new$	$short$	$work$	$reads$
$e_{15}$	$known$	$new$	$short$	$home$	$reads$
$e_{16}$	$known$	$followup$	$short$	$work$	$reads$
$e_{17}$	$known$	$new$	$short$	$home$	$reads$
$e_{18}$	$unknown$	$new$	$short$	$work$	$reads$
$e_{19}$	$unknown$	$new$	$long$	$work$	$?$
$e_{20}$	$unknown$	$followup$	$short$	$home$	$?$

Figure 7.1: Examples of a user’s preferences. These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion website depending on whether the author is known or not, whether the article started a new thread or was a follow-up, the length of the article, and whether it is read at home or at work. $e_{1},\dots,e_{18}$ are the training examples. The aim is to make a prediction for the user action on $e_{19}$, $e_{20}$, and other, currently unseen, examples.

Example 7.1.

Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a threaded discussion website given properties of the article. The input features are $Author$, $Thread$, $Length$, and $Where\_read$. There is one target feature, $User\_action$. The domain of $Author$ is $\{known,unknown\}$, the domain of $Thread$ is $\{new,followup\}$, and so on.

There are eighteen training examples, each of which has a value for all of the features. In this data set, $Author(e_{11})=unknown$, $Thread(e_{11})=followup$, and $User\_action(e_{11})=skips$.

There are two test examples, $e_{19}$ and $e_{20}$, where the user action is unknown.
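The data of Figure 7.1 can be written down and checked against the statements in this example. The sketch below transcribes the rows of the figure; the parsing approach and the names `ROWS`, `examples`, `training`, `test`, and `feature` are illustrative assumptions, with `"?"` standing for an unknown target value.

```python
FEATURES = ["Author", "Thread", "Length", "Where_read", "User_action"]
TARGET_FEATURE = "User_action"

# Rows of Figure 7.1; "?" marks an unknown target value.
ROWS = """
e1 known new long home skips
e2 unknown new short work reads
e3 unknown followup long work skips
e4 known followup long home skips
e5 known new short home reads
e6 known followup long work skips
e7 unknown followup short work skips
e8 unknown new short work reads
e9 known followup long home skips
e10 known new long work skips
e11 unknown followup short home skips
e12 known new long work skips
e13 known followup short home reads
e14 known new short work reads
e15 known new short home reads
e16 known followup short work reads
e17 known new short home reads
e18 unknown new short work reads
e19 unknown new long work ?
e20 unknown followup short home ?
"""

# Map each example name to a dict of its feature values.
examples = {}
for line in ROWS.strip().splitlines():
    name, *values = line.split()
    examples[name] = dict(zip(FEATURES, values))

# Training examples have a known target value; test examples do not.
training = {n: e for n, e in examples.items() if e[TARGET_FEATURE] != "?"}
test = {n: e for n, e in examples.items() if e[TARGET_FEATURE] == "?"}


def feature(name):
    """Return a function from an example name to that example's value."""
    return lambda example_name: examples[example_name][name]


Author = feature("Author")
Thread = feature("Thread")
User_action = feature("User_action")

print(len(training), len(test))
print(Author("e11"), Thread("e11"), User_action("e11"))
print(sorted(test))
```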

Example 7.2.

Example	$X$	$Y$
$e_{1}$	0.7	1.7
$e_{2}$	1.1	2.4
$e_{3}$	1.3	2.5
$e_{4}$	1.9	1.7
$e_{5}$	2.6	2.1
$e_{6}$	3.1	2.3
$e_{7}$	3.9	7
$e_{8}$	2.9	?
$e_{9}$	5.0	?

Figure 7.2: Training and test examples for a regression task

Figure 7.2 shows some data for a regression task, where the aim is to predict the value of feature $Y$ on examples for which the value of feature $X$ is provided. This is a regression task because $Y$ is a real-valued feature. Predicting a value of $Y$ for example $e_{8}$ is an interpolation problem, because its $X$ value lies between the $X$ values of the training examples. Predicting a value of $Y$ for the example $e_{9}$ is an extrapolation problem, because its $X$ value is outside the range of the training examples.
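The distinction between interpolation and extrapolation depends only on where a test example's $X$ value falls relative to the training $X$ values. A small sketch using the data of Figure 7.2 follows; the function name `prediction_kind` is hypothetical, and the code only classifies the test examples without predicting $Y$.

```python
# Training examples e1..e7 from Figure 7.2 as (X, Y) pairs.
training = [
    (0.7, 1.7), (1.1, 2.4), (1.3, 2.5), (1.9, 1.7),
    (2.6, 2.1), (3.1, 2.3), (3.9, 7.0),
]

# Test examples: only the X value is given.
test_x = {"e8": 2.9, "e9": 5.0}


def prediction_kind(x, training_pairs):
    """Classify a query by whether x lies within the range of training X values."""
    xs = [x_i for x_i, _ in training_pairs]
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"


for name, x in test_x.items():
    print(name, prediction_kind(x, training))
```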

7.2.1 Evaluating Predictions

7.2.2 Types of Errors

7.2.3 Point Estimates with No Input Features

Artificial Intelligence 2E
