foundations of computational agents
The following components are part of any learning problem:
The behavior or task that is being improved
The experiences that are used to improve performance in the task, usually in the form of a sequence of examples
How the improvement is measured – for example, new skills that were not present initially, increasing accuracy in prediction, or improved speed
Consider the agent internals of Figure 2.9. The problem of learning is to take in prior knowledge and data (e.g., about the experiences of the agent) and to create an internal representation (the knowledge base) that is used by the agent as it acts.
Learning techniques face the following issues:
Virtually any task for which an agent can get data or experiences can be learned. The most commonly studied learning task is supervised learning: given some input features, some target features, and a set of training examples where the input features and the target features are specified, predict the value of target features for new examples given their values on the input features. This is called classification when the target features are discrete and regression when the target features are continuous.
Other learning tasks include learning classifications when the examples do not have targets defined (unsupervised learning), learning what to do based on rewards and punishments (reinforcement learning), learning to reason faster (analytic learning), and learning richer representations such as logic programs (inductive logic programming).
Learning tasks can be characterized by the feedback given to the learner. In supervised learning, what has to be learned is specified for each training example. Supervised classification occurs when a trainer provides the classification for each example. Supervised learning of actions occurs when the agent is given immediate feedback about the value of an action in the current situation. Unsupervised learning occurs when no classifications are given and the learner must discover categories and regularities in the data. Feedback often falls between these extremes, such as in reinforcement learning, where the feedback in terms of rewards and punishments occurs after a sequence of actions. This leads to the credit assignment problem of determining which actions were responsible for the rewards or punishments. For example, a user could give rewards to the delivery robot without telling it exactly what it is being rewarded for. The robot then must either learn what it is being rewarded for or learn which actions are preferred in which situations. It is possible that it can learn which actions to perform without actually determining which consequences of the actions are responsible for rewards.
For an agent to use its experiences, the experiences must affect the agent’s internal representation. This internal representation could be the raw experiences themselves, but it is typically a compact representation that generalizes the data.
The problem of inferring an internal representation based on examples is called induction in contrast to deduction, which is deriving consequences of a knowledge base, and abduction, which is hypothesizing what may be true about a particular case.
There are two principles that are at odds in choosing a representation:
The richer the representation, the more useful it is for subsequent problem solving. For an agent to learn a way to solve a task, the representation must be rich enough to express a way to solve the task.
The richer the representation, the more difficult it is to learn. A very rich representation is difficult to learn because it requires a great deal of data, and often many different hypotheses are consistent with the data.
The representations required for intelligence are a compromise between many desiderata. The ability to learn the representation is one of them, but it is not the only one.
Much of machine learning is studied in the context of particular representations (e.g., decision trees, neural networks, or case bases). This chapter presents some standard representations to show the common features behind learning.
In offline learning, all of the training examples are available to an agent before it needs to act. In online learning, training examples arrive as the agent is acting. An agent that learns online requires some representation of its previously seen examples before it has seen all of its examples. As new examples are observed, the agent must update its representation. Typically, an agent never sees all of the examples it could possibly see. Active learning is a form of online learning in which the agent acts to acquire useful examples from which to learn. In active learning, the agent reasons about which examples would be useful to learn from and acts to collect those examples.
Learning is defined in terms of improving performance based on some measure. To know whether an agent has learned, we must define a measure of success. The measure is usually not how well the agent performs on the training data, but how well the agent performs for new data.
In classification, being able to correctly classify all training examples is not the goal. For example, consider predicting a Boolean (true/false) feature based on a set of examples. Suppose that there were two agents and . Agent claims that all of the negative examples seen are the only negative examples and that every other instance is positive. Agent claims that the positive examples in the training set are the only positive examples and that every other instance is negative. Both agents correctly classify every example in the training set but disagree on every other example. Success in learning should not be judged on correctly classifying the training set but on being able to correctly classify unseen examples. Thus, the learner must generalize: go beyond the specific given examples to classify unseen examples.
A standard way to evaluate a learning procedure is to divide the examples into training examples and test examples. A representation is built using the training examples, and the predictive accuracy is measured on the test examples. To properly evaluate the method, the test cases should not be known while the training is occurring. Of course, using a test set is only an approximation of what is wanted; the real measure is its performance on some future task.
The tendency to prefer one hypothesis over another is called a bias. Consider the agents and defined earlier. Saying that a hypothesis is better than ’s or ’s hypothesis is not something that is obtained from the data – both and accurately predict all of the data given – but is something external to the data. Without a bias, an agent will not be able to make any predictions on unseen examples. The hypotheses adopted by and disagree on all further examples, and, if a learning agent cannot choose some hypotheses as better, the agent will not be able to resolve this disagreement. To have any inductive process make predictions on unseen data, an agent requires a bias. What constitutes a good bias is an empirical question about which biases work best in practice; we do not imagine that either ’s or ’s biases work well in practice.
Given a representation and a bias, the problem of learning can be reduced to one of search. Learning is a search through the space of possible representations, trying to find the representation or representations that best fit the data given the bias. Unfortunately, the search spaces are typically prohibitively large for systematic search, except for the simplest of examples. Nearly all of the search techniques used in machine learning can be seen as forms of local search through a space of representations. The definition of the learning algorithm then becomes one of defining the search space, the evaluation function, and the search method.
In most real-world situations, the data are not perfect. There can be noise where the observed features are not adequate to predict the classification, missing data where the observations of some of the features for some or all of the examples are missing, and errors where some of the features have been assigned wrong values. One of the important properties of a learning algorithm is its ability to handle noisy data in all of its forms.
For domains with a natural interpretation of “between,” such as where the features are about time or space, interpolation involves making a prediction between cases for which there are data. Extrapolation involves making a prediction that goes beyond the seen examples. Extrapolation is usually much less accurate than interpolation. For example, in ancient astronomy, the Ptolemaic system and the heliocentric system of Copernicus made detailed models of the movement of solar system in terms of epicycles (cycles within cycles). The parameters for the models could be made to fit the data very well and they were very good at interpolation; however, the models were very poor at extrapolation. As another example, it is often easy to predict a stock price on a certain day given data about the prices on the days before and the days after that day. It is very difficult to predict the price that a stock will be tomorrow, and it would be very profitable to be able to do so. An agent must be careful if its test cases mostly involve interpolating between data points, but the learned model is used for extrapolation.