Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

## 7.7 Learning as Refining the Hypothesis Space

So far, learning is either choosing the best representation - for example, the best decision tree or the best values for parameters in a neural network - or predicting the value of the target features of a new case from a database of previous cases. This section considers a different notion of learning, namely learning as delineating those hypotheses that are consistent with the examples. Rather than choosing a hypothesis, the aim is to find all hypotheses that are consistent. This investigation will shed light on the role of a bias and provide a mechanism for a theoretical analysis of the learning problem.

We make three assumptions:

- There is a single target feature,
*Y*, that is Boolean. This is not really a restriction for classification, because any discrete feature can be made into Boolean features using indicator variables. - The hypotheses make definitive predictions, predicting true or false for each example, rather than probabilistic prediction.
- There is no noise in the data.

Given these assumptions, it is possible to write a hypothesis in terms of a proposition, where the primitive propositions are assignments to the input features.

**Example 7.22:**The decision tree of Figure 7.4 can be seen as a representation

*reads*defined by the proposition

*pval(e,Reads)=val(e,Short)∧(val(e,New)∨ val(e,Known)) .*

For the rest of this section, we write this more simply as

*reads ↔short ∧(new ∨ known) .*

The goal is to try to find a proposition on the input features that correctly classifies the training examples.

**Example 7.23:**Consider the trading agent trying to infer which books or articles the user reads based on keywords supplied in the article. Suppose the learning agent has the following data:

article | Crime | Academic | Local | Music | Reads |

a _{1} | true | false | false | true | true |

a _{2} | true | false | false | false | true |

a _{3} | false | true | false | false | false |

a _{4} | false | false | true | false | false |

a _{5} | true | true | false | false | true |

The aim is to learn which articles the user reads.

In this example, *reads* is the target feature, and the aim is to
find a definition such as

reads ↔crime ∧(¬academic ∨ ¬music) .

This definition may be used to classify the training examples as well as future examples.

Hypothesis space learning assumes the following sets:

*I*, the**instance space**, is the set of all possible examples., the**H****hypothesis space**, is a set of Boolean functions on the input features.*E⊆I*is the set of**training examples**. Values for the input features and the target feature are given for the training example.

If *h∈ H* and

*i∈I*, we write

*h(i)*to mean the value that

*h*predicts for

*i*on the target variable

*Y*.

**Example 7.24:**In Example 7.23,

*I*is the set of the

*2*possible examples, one for each combination of values for the features.

^{5}=32The hypothesis space * H* could be all Boolean combinations of the input
features or could be more restricted, such as conjunctions or
propositions defined in terms of fewer than three features.

In Example 7.23, the training examples are
*E={a _{1},a_{2},a_{3},a_{4},a_{5}}*. The
target feature is

*Reads*. Because the table specifies some of the values of this feature, and the learner will make predictions on unseen cases, the learner requires a bias. In hypothesis space learning, the bias is imposed by the hypothesis space.

Hypothesis *h* is **consistent** with a set of training examples *E*
if *∀e ∈E*, *h* accurately predicts the target feature of
*e*. That is, *h(e)=val(e,Y)*; the predicted value is the same as the actual value for
each example.
The problem is to find the subset of * H* or just an element of

*consistent with all of the training examples.*

**H****Example 7.25:**Consider the data of Example 7.23, and suppose

*is the set of conjunctions of literals. An example hypothesis in*

**H***that is consistent with*

**H***{a*is

_{1}}*¬academic ∧music*. This hypothesis means that the person reads an article if and only if

*¬academic ∧music*is true of the article. This concept is not the target concept because it is inconsistent with

*{a*.

_{1},a_{2}}