## 7.6 Case-Based Reasoning

The previous methods tried to find a compact representation of the data
that can be used for future prediction.
In **case-based reasoning**, the training examples - the cases - are stored and
accessed to solve a new problem. To get a prediction for a new
example, those cases that are
similar, or close to, the new example are used to predict the value of
the target features of the new example. This is at one extreme of the
learning problem where, unlike decision trees and neural networks,
relatively little work must be done offline, and
virtually all of the work is performed at query time.

Case-based reasoning can be used for classification and regression. It is also applicable when the cases are complicated, such as in legal cases, where the cases are complex legal rulings, and in planning, where the cases are previous solutions to complex problems.

If the cases are simple, one algorithm that
works well is to use the ** k-nearest neighbors** for some given number

*k*. Given a new example, the

*k*training examples that have the input features closest to that example are used to predict the target value for the new example. The prediction can be the mode, average, or some interpolation between the prediction of these

*k*training examples, perhaps weighting closer examples more than distant examples.

For this method to work, a distance metric is required that measures the closeness of
two examples. First define a metric for the domain of each
feature, in which the values of the features are converted to a
numerical scale that can be used to compare values. Suppose *val(e,X _{i})* is a numerical
representation of the value of feature

*X*for the example

_{i}*e*. Then

*(val(e*is the difference between example

_{1},X_{i}) - val(e_{2},X_{i}))*e*and

_{1}*e*on the dimension defined by feature

_{2}*X*. The

_{i}**Euclidean distance**, the square root of the sum of the squares of the dimension differences, can be used as the distance between two examples. One important issue is the relative scales of different dimensions; increasing the scale of one dimension increases the importance of that feature. Let

*w*be a non-negative real-valued parameter that specifies the weight of feature

_{i}*X*. The distance between examples

_{i}*e*and

_{1}*e*is then

_{2}d(e_{1},e_{2}) =sqrt(∑_{i}w_{i}×(val(e_{1},X_{i}) - val(e_{2},X_{i}))^{2}) .

The feature weights can be provided as input. It is also possible to
learn these weights. The learning agent can try to find a parameter setting that minimizes the
error in predicting the value of each element of the
training set, based on every other instance in the training set. This
is called the **leave-one-out
cross-validation** error measure.

**Example 7.20:**Consider using case-based reasoning on the data of Figure 7.1. Rather than converting the data to a secondary representation as in decision-tree or neural-network learning, case-based reasoning uses the examples directly to predict the value for the user action in a new case.

Suppose a learning agent wants to classify example *e _{20}*, for which the author is unknown,
the thread is a follow-up, the length is short, and it is read at home.
First the learner tries to find similar cases. There is an exact match in
example

*e*, so it may want to predict that the user does the same action as for example

_{11}*e*and thus skips the article. It could also include other close examples.

_{11}Consider classifying example *e _{19}*, where the author is unknown, the thread is new,
the length is long, and it was read at work. In this case there are
no exact matches. Consider the close matches. Examples

*e*,

_{2}*e*, and

_{8}*e*agree on the features

_{18}*Author*,

*Thread*, and

*Where Read*. Examples

*e*and

_{10}*e*agree on

_{12}*Thread*,

*Length*, and

*Where Read*. Example

*e*agrees on

_{3}*Author*,

*Length*, and

*Where Read*. Examples

*e*,

_{2}*e*, and

_{8}*e*predict

_{18}*Reads*, but the other examples predict

*Skips*. So what should be predicted? The decision-tree algorithm says that

*Length*is the best predictor, and so

*e*,

_{2}*e*, and

_{8}*e*should be ignored. For the sigmoid linear learning algorithm, the parameter values in Example 7.10 similarly predict that the reader skips the article. A case-based reasoning algorithm to predict whether the user will or will not read this article must determine the relative importance of the dimensions.

_{18}One of the problems in case-based reasoning is accessing the relevant
cases. A
** kd-tree** is a way to index the training examples so that
training examples that are close to a given example can be found
quickly. Like a decision tree, a

*kd*-tree splits on input features, but at the leaves are subsets of the training examples. In building a

*kd*-tree from a set of examples, the learner tries to find an input feature that partitions the examples into set of approximately equal size and then builds

*kd*-trees for the examples in each partition. This division stops when all of the examples at a leaf are the same. A new example can be filtered down the tree, as in a decision tree. The exact matches will be at the leaf found. However, the examples at the leaves of the

*kd*-tree could possibly be quite distant from the example to be classified; they agree on the values down the branch of the tree but could disagree on the values of all other features. The same tree can be used to search for those examples that have one feature different from the ones tested in the tree. See Exercise 7.16.

Case-based reasoning is also applicable when the cases are more complicated, for example, when they are legal cases or previous solutions to planning problems. In this scenario, the cases can be carefully chosen and edited to be useful. Case-based reasoning can be seen as a cycle of the following four tasks.

- Retrieve:
- Given a new case, retrieve similar cases from the case base.
- Reuse:
- Adapt the retrieved cases to fit to the new case.
- Revise:
- Evaluate the solution and revise it based on how well it works.
- Retain:
- Decide whether to retain this new case in the case base.

The revision can involve other reasoning techniques, such as using the proposed solution as a starting point to search for a solution, or a human could do the adaptation in an interactive system. Retaining can then save the new case together with the solution found.

**Example 7.21:**A common example of a case-based reasoning system is a helpdesk that users call with problems to be solved. For example, case-based reasoning could be used by the diagnostic assistant to help users diagnose problems on their computer systems. When a user gives a description of their problem, the closest cases in the case base are retrieved. The diagnostic assistant can recommend some of these to the user, adapting each case to the user's particular situation. An example of adaptation is to change the recommendation based on what software the user has, what method they use to connect to the Internet, and the brand of printer. If one of the cases suggested works, that can be recorded in the case base to make that case be more important when another user asks a similar question. If none of the cases found works, some other problem solving can be done to solve the problem, perhaps by adapting other cases or having a human help diagnose the problem. When the problem is finally fixed, what worked in that case can be added to the case base.