foundations of computational agents
What has AI research achieved? Where do the current frontier issues lie? To get a systematic sense of the big picture, the agent design space for AI systems was described in terms of ten dimensions. It is instructive to see how representations presented in the book can be positioned in that space.
[Figure 16.1: table omitted; its rows are the representations covered (e.g., the independent choice logic) and its columns are the dimensions of complexity (e.g., the number of agents).]
Figure 16.1 reviews the dimensions of complexity and classifies, in terms of the values for each dimension, some of the representations we have covered.
Hierarchical control allows for hierarchical reasoning. As presented, it did not involve planning or goals, but it can be combined with other techniques. For example, it is possible to do reinforcement learning at multiple levels of a hierarchy, or even to learn the hierarchy.
State-space search, as presented in Chapter 3, allows for an indefinite horizon but otherwise takes the simplest value in each of the other dimensions. Regression planning, using either the STRIPS representation or the feature-based representation, extends state-space search to reason in terms of features. Constraint satisfaction problem (CSP) planning allows the search space to be pruned based on both the initial situation and the goal, but at the cost of planning for only a finite stage.
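To make the simplest point in this space concrete, a generic state-space searcher needs only a start state, a goal test, and a successor function. The following sketch uses breadth-first search on a toy numeric domain; the function names and the domain are illustrative assumptions, not taken from the book:

```python
from collections import deque

def state_space_search(start, goal_test, successors):
    """Breadth-first search over an explicit state space.
    Returns a path (list of states) from start to a goal, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if goal_test(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Toy domain: states are integers; actions add 1 or double, capped at 10.
path = state_space_search(
    1,
    lambda s: s == 10,
    lambda s: [x for x in (s + 1, s * 2) if x <= 10])
```

Because the search is breadth-first, the returned path is shortest in the number of actions; here it reaches 10 from 1 in four steps.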
Decision networks represent features, stochastic effects, partial observability, and complex preferences in terms of utilities. However, as with CSP planning, these networks only reason for a finite stage planning horizon.
Markov decision processes (MDPs) allow for indefinite and infinite stage problems with stochastic actions and complex preferences; however, they are state-based representations that assume the state is fully observable. Dynamic decision networks (dynamic DN) extend MDPs to allow feature-based representation of states. They extend decision networks to allow for indefinite or infinite horizons, but they do not model sensing uncertainty. Partially observable MDPs (POMDPs) allow for a partially observable state but are much more difficult to solve.
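To make the MDP case concrete, here is a minimal value-iteration sketch for an explicitly enumerated MDP. The two-state domain and all names are illustrative assumptions, not the book's code:

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-8):
    """P[s][a]: list of (probability, next_state); R[s][a]: immediate reward.
    Returns an (approximately) optimal value function and a greedy policy."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma *
                     sum(p * V[s2] for p, s2 in P[s][a]))
              for s in states}
    return V, policy

# Toy fully observable MDP: staying in state 'A' yields reward 1.
states, actions = ['A', 'B'], ['stay', 'go']
P = {s: {'stay': [(1.0, s)],
         'go': [(1.0, 'B' if s == 'A' else 'A')]} for s in states}
R = {'A': {'stay': 1.0, 'go': 0.0}, 'B': {'stay': 0.0, 'go': 0.0}}
V, policy = value_iteration(states, actions, P, R)
```

With discount 0.9, the values converge to V(A) = 1/(1 − 0.9) = 10 and V(B) = 0.9 × 10 = 9, with the policy that stays in A and moves out of B.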
The extensive form of a game extends state-space search to include multiple agents. It can handle partially observable domains through the use of information sets. Multiagent decision networks extend decision networks to allow for multiple agents.
Q-learning extends MDPs to allow for learning, but only deals with states. SARSA-LFA, SARSA with linear function approximation of the Q-function, does reinforcement learning with features.
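The core of tabular Q-learning is a single update toward the temporal-difference target. The sketch below runs it on an assumed two-state environment; the environment, names, and parameters are illustrative, not from the book:

```python
import random

def q_learning(step, actions, start, n_steps=5000,
               alpha=0.2, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over one long run.
    step(s, a) -> (reward, next_state) is the environment."""
    rng = random.Random(seed)
    Q = {}
    s = start
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = rng.choice(actions)                              # explore
        else:
            a = max(actions, key=lambda b: Q.get((s, b), 0.0))   # exploit
        r, s2 = step(s, a)
        target = r + gamma * max(Q.get((s2, b), 0.0) for b in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
        s = s2
    return Q

# Toy environment (an assumption for illustration): staying in 'A' pays 1.
def demo_step(s, a):
    if a == 'stay':
        return (1.0 if s == 'A' else 0.0), s
    return 0.0, ('B' if s == 'A' else 'A')

Q = q_learning(demo_step, ['stay', 'go'], start='A')
```

After enough steps the learned Q-values approach the optimal values of the corresponding MDP (about 10 for staying in A and 9 for leaving B), without the agent ever being given the transition model.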
The policy hill climbing (PHC) algorithm allows for learning with multiple agents, but it only allows a single state and a planning horizon of 1 (it plays a repeated single-step game); the only uncertainty is in the actions of the other agents. It can be seen as a reinforcement learning algorithm with a single state but with multiple agents.
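A sketch of the idea (not the exact published algorithm): the agent keeps Q-estimates for its actions in the single state and a mixed policy that is nudged, by a small step delta, toward the currently best-valued action. The payoff matrix, opponent, and parameters below are illustrative assumptions:

```python
import random

def policy_hill_climbing(payoff, actions, opponent, n_rounds=3000,
                         alpha=0.2, delta=0.01, seed=0):
    """PHC for one agent in a repeated single-state game (a sketch).
    payoff(a, b): this agent's reward when it plays a and the other plays b.
    opponent(): samples the other agent's (possibly stochastic) action."""
    rng = random.Random(seed)
    Q = {a: 0.0 for a in actions}
    pi = {a: 1.0 / len(actions) for a in actions}
    for _ in range(n_rounds):
        # Sample own action from the mixed policy pi.
        x, a = rng.random(), actions[-1]
        for act in actions:
            x -= pi[act]
            if x < 0:
                a = act
                break
        r = payoff(a, opponent())
        Q[a] += alpha * (r - Q[a])        # horizon 1: no future value term
        # Move the mixed policy a small step toward the greedy action.
        best = max(actions, key=Q.get)
        for act in actions:
            if act == best:
                pi[act] = min(1.0, pi[act] + delta)
            else:
                pi[act] = max(0.0, pi[act] - delta / (len(actions) - 1))
        total = sum(pi.values())
        pi = {act: p / total for act, p in pi.items()}
    return pi, Q

# Matching pennies against a (hypothetical) opponent that always plays heads:
pi, Q = policy_hill_climbing(
    lambda a, b: 1.0 if a == b else -1.0,
    ['heads', 'tails'],
    lambda: 'heads')
```

Against this fixed opponent the mixed policy converges to the best response, playing heads with probability approaching 1.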
The independent choice logic is a relational representation that can represent uncertainty in the effects of actions, sensing uncertainty, and utilities; however, because it is the most general framework, reasoning in it may be less efficient than in more specialized formalisms such as POMDPs.
None of the planning representations presented in the table handles hierarchical decomposition. However, there are large bodies of work on hierarchical planning and hierarchical reinforcement learning that are not presented in this book.
All of the representations can handle finite stage planning, for example by treating the features at different times as different variables. The planning systems based on (discounted or average) rewards can handle infinite stage planning, where the agents go on forever collecting rewards. It does not make sense for goal-oriented systems to go on forever.
All of the representations can handle states as the degenerate case. Reasoning in terms of features is the main design choice for many of the representations. Only the last two representations in Figure 16.1 allow for relational models, although many of the algorithms could potentially be made relational.
Bounded rationality underlies many of the approximation methods used for applications; however, making the explicit trade-off between thinking and acting, in which the agent reasons about whether it should act immediately or think more, is still relatively rare.
Figure 16.1 only shows three learning algorithms, although it is possible to learn the models for the others, for example, learning the conditional probabilities or the structure of probabilistic models (as model-based reinforcement learning learns probabilities for an MDP), or learning the structure of relational models.
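For instance, the conditional probabilities of a transition model can be estimated from experience by counting, as a model-based reinforcement learner might do for an MDP. This is a minimal sketch with illustrative names and data:

```python
from collections import Counter, defaultdict

def estimate_transition_model(experience):
    """Estimate P(s' | s, a) from (s, a, s') triples by counting."""
    counts = defaultdict(Counter)
    for s, a, s2 in experience:
        counts[(s, a)][s2] += 1
    return {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
            for sa, c in counts.items()}

# Hypothetical experience: 'go' from 'A' usually, but not always, reaches 'B'.
model = estimate_transition_model(
    [('A', 'go', 'B')] * 3 + [('A', 'go', 'A')])
```

Here the maximum-likelihood estimate gives P(B | A, go) = 0.75; a real learner would typically add pseudocounts to avoid zero probabilities for unseen outcomes.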
The dimension that adds the most difficulty to the task of building an agent is sensing uncertainty. There are many ways to represent the function from the history of the agent to its actions. Ways to extend planning with sensing uncertainty and indefinite and infinite horizon problems are discussed in the context of POMDPs. How to handle sensing in all of its forms is one of the most active areas of current AI research.
The models that can use stochastic actions can also handle deterministic actions (deterministic being a special case of stochastic). Some of them, such as MDPs and the reinforcement learning algorithms, work very well in deterministic domains. Policy hill climbing (PHC) is not listed as working with deterministic actions because it models the other agents as stochastic, even if they eventually converge to deterministic policies.
The models that can handle complex cardinal preferences can also handle goals, by giving a reward for achieving the goal (and perhaps a negative reward at each step at which the goal has not yet been achieved, although a preference for not wasting time is also handled by discounting).
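One way to see this reduction: wrap a Boolean goal as a reward function that an MDP-style planner can consume, with an optional per-step cost standing in for the preference against wasting time. All names here are illustrative assumptions:

```python
def goal_to_reward(goal_test, goal_reward=1.0, step_cost=0.0):
    """Wrap a Boolean goal as a reward function for an MDP-style planner.
    A per-step cost (or, alternatively, discounting) penalizes wasted time."""
    def reward(state):
        return goal_reward if goal_test(state) else -step_cost
    return reward

# Hypothetical use: reward 1 at the goal, cost 0.1 per non-goal step.
r = goal_to_reward(lambda s: s == 'goal', step_cost=0.1)
```

A planner maximizing this reward then prefers shorter paths to the goal, mirroring what a goal-oriented planner would do.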
Dealing with multiple agents is much more difficult than planning for a single agent. Multiple agents can be cooperative or competitive, or more often somewhere in between, where they can compete in some aspects and cooperate in others. Sometimes, not being (or not appearing to be) rational is advantageous to an agent (see Example 11.14).
We have considered interaction as a single dimension, whereas real agents have to make quick online decisions as well as longer-term decisions. Agents need to reason about multiple time scales; what appears to be offline relative to a decision that must be made within a second can be seen as online at the scale of days. No agent has the luxury of unlimited offline computation without risking never getting started.
As can be seen, this book has presented a small part of the design space of AI. The current frontier of research goes well beyond what is covered in this textbook. There is much active research in all areas of AI. There have been, and continue to be, impressive advances in planning, learning, perception, natural language understanding, robotics, and other subareas of AI. Most of this work considers multiple dimensions and how they interact. There is growing interest in considering all of the dimensions and multiple tasks simultaneously (for example, under the rubric of artificial general intelligence), but doing everything well is difficult.
The decomposition of AI into subareas is not surprising. The design space is too big to explore all at once. Once a researcher has decided to handle, say, relational domains and reasoning about the existence of objects, it is difficult to add sensing uncertainty. If a researcher starts with learning under infinite horizons, it is difficult to add hierarchical reasoning, let alone to combine learning, infinite horizons, and relations with hierarchies.
Some particular points in design space that have been at the frontier of research over the past few years include the following:
• hierarchical reinforcement learning, where the agent learns at multiple levels of abstraction at once
• multiagent reinforcement learning
• relational probabilistic learning
• natural language understanding that takes into account ambiguity, context, and pragmatics to give appropriate answers
• robot cars that can drive through uncertain environments
• intelligent tutoring systems that take into account noisy sensors of students’ emotions
• supervised deep learning for perception, where deep learning is used to learn low-level, intermediate, and higher-level features that work together to make predictions.
As AI practitioners, we still do not know how to build an agent that acts rationally in infinite-stage, partially observable domains consisting of individuals and relations in which there are multiple agents acting autonomously. Arguably humans do this, perhaps by reasoning hierarchically and approximately. Although we may not yet be able to build an intelligent artificial agent with human-level performance, we may have the building blocks to develop one. The main challenge is handling the complexity of the real world. However, so far there seem to be no intrinsic obstacles to building computational embodied agents capable of human-level performance.