19.2 Agent Design Space Revisited

The agent design space provides a way to understand the frontier of knowledge about AI. It is instructive to see how representations presented in the book can be positioned in that space.


[Figure omitted: a grid rating the agent models (hierarchical control, deterministic planning, decision networks, dynamic decision networks, extensive-form games, Q-learning, deep RL, stochastic policy iteration) on the dimensions planning horizon, computational limits, sensing uncertainty (fully vs. partially observable), effect uncertainty, and number of agents.]

Figure 19.2: Some agent models rated by dimensions of complexity

Figure 19.2 reviews the dimensions of complexity and classifies some of the agent models covered in the book in terms of their values on each dimension. These agent models were selected because they differ in their values on the dimensions.

Agent Models

The following describes the agent models that form the columns of Figure 19.2.

  • Hier. Control, hierarchical control, means reasoning at multiple levels of abstraction. As presented, it involved no planning and took neither goals nor utility into account.

  • State-space search, as presented in Chapter 3, allows for an indefinite horizon but otherwise gives the simplest value in all the other dimensions. Det. planning, deterministic planning (Chapter 6), either regression planning, forward planning, or CSP planning, extends state-space search to reason in terms of features.

  • Decision Net., decision networks, extend belief networks to include decision and utility nodes, and can represent features, stochastic effects, partial observability, and complex preferences in terms of utilities. However, these networks only model a finite-stage planning horizon. Markov decision processes (MDPs) allow for indefinite and infinite-stage problems with stochastic actions and complex preferences; however, they are state-based representations that assume the state is fully observable. Dynamic decision networks (dynamic DN) extend MDPs to allow feature-based representation of states. Partially observable MDPs (POMDPs) allow for partially observable states but are much more difficult to solve.

  • Game tree search for the extensive form of a game extends state-space search to include multiple agents and utility. It can handle partially observable domains through the use of information sets.

  • Q-learning extends MDPs to allow for online learning, but only deals with states. Deep reinforcement learning (Deep RL), and other methods that use function approximation in reinforcement learning, do reinforcement learning with features. These work for single agents or adversaries, but not for arbitrary multiagent domains.

  • Stochastic PI, stochastic policy iteration, allows for learning with multiple agents but needs to play the same game repeatedly with the same agents to coordinate.
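
The tabular Q-learning update mentioned above can be sketched concretely. The following is a minimal illustration on a hypothetical two-state chain; the state names, rewards, and learning parameters are invented for this example and are not from the book:

```python
import random

# Hypothetical deterministic chain: moving "right" from s0 reaches the
# goal s1 (reward 10); every other action keeps the agent in s0 (reward -1).
states = ["s0", "s1"]
actions = ["left", "right"]

def step(s, a):
    """Return (next_state, reward) for the assumed toy environment."""
    if s == "s0" and a == "right":
        return "s1", 10
    return "s0", -1

Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

random.seed(0)
s = "s0"
for _ in range(200):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: Q[(s, x)])
    s2, r = step(s, a)
    # the Q-learning temporal-difference update
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
    s = "s0" if s2 == "s1" else s2  # reaching the goal ends the episode; restart
```

After enough experience, the learned Q-values prefer "right" in s0, without the agent ever being given the transition or reward model.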

Dimensions Revisited

None of the planning representations in Figure 19.2 handle hierarchical control. In hierarchical control, low-level controllers act faster than high-level deliberation. While deep reinforcement learning uses hierarchies in its representation of the state, it plans at only a single level of abstraction. Hierarchical planning and hierarchical reinforcement learning are not presented in this book, although there is much research on both. Hierarchical reasoning does not need to work as a monolith; different techniques can be used at the low levels than at the high levels. There is evidence that humans have quite different systems for high-level deliberative reasoning than for low-level perception and reaction.

Some of the representations (such as decision networks) model each decision separately and are only applicable for a finite sequence of decisions. Some allow for indefinitely many decisions, but then the policies are typically stationary (not dependent on time, unless time is part of the state space). The planning systems based on (discounted or average) rewards can handle infinite-stage planning, where the agents go on forever collecting rewards. It does not make sense for goal-oriented systems to go on forever.
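
Discounting is what makes an infinite stream of rewards finite, so that infinite-stage planning is well defined. A minimal sketch, with a hypothetical constant reward and discount factor:

```python
# An agent collecting a constant reward r forever has discounted value
#   sum_{t=0}^{inf} gamma^t * r  =  r / (1 - gamma)
# Here gamma and r are illustrative values, not from the book.
gamma, r = 0.9, 1.0

value = sum(gamma**t * r for t in range(1000))  # truncated geometric series
closed_form = r / (1 - gamma)                   # = 10.0 for these values
```

The truncated sum is already indistinguishable from the closed form, which is why a stationary policy can be evaluated over an infinite horizon.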

All of the representations can handle states as the degenerate case of having a single feature. Reasoning in terms of features, whether engineered or learned, is the main design choice for many of the representations. Reasoning in terms of features can be much more efficient than reasoning in terms of states, as the number of states is exponential in the number of features. None of the representations in Figure 19.2 allow for relational models, although many of the algorithms can be made relational.
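
The exponential gap between features and states can be seen directly: every assignment of values to the features is a distinct state. A small illustrative calculation (the feature count is arbitrary):

```python
from itertools import product

# Each state is one assignment of values to all features.
n_features = 20
domain_size = 2                        # binary features
n_states = domain_size ** n_features   # exponential in the number of features

# Enumerating states is feasible only for tiny n; for n = 3:
small_states = list(product([False, True], repeat=3))
```

Describing a domain with 20 binary features takes 20 feature definitions, but over a million states.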

Bounded rationality underlies many of the approximation methods used for applications; however, making the explicit trade-off between thinking and acting, in which the agent reasons about whether it should act immediately or think more, is still relatively rare.

Figure 19.2 shows only three learning algorithms, although it is possible to learn the models for the others, such as learning the conditional probabilities or the structure of probabilistic models; for example, model-based reinforcement learning learns the transition probabilities and rewards of an MDP.
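
Learning the model of an MDP from experience, as in model-based reinforcement learning, can be as simple as maximum-likelihood counting. A minimal sketch over invented experiences (the state and action names are hypothetical):

```python
from collections import Counter

# Hypothetical observed (state, action, next_state, reward) experiences.
experiences = [("s0", "a", "s1", 1), ("s0", "a", "s0", 0),
               ("s0", "a", "s1", 1), ("s0", "a", "s1", 1)]

counts = Counter((s, a, s2) for s, a, s2, _ in experiences)
totals = Counter((s, a) for s, a, _, _ in experiences)

def P(s2, s, a):
    """Maximum-likelihood estimate of the transition probability P(s' | s, a)."""
    return counts[(s, a, s2)] / totals[(s, a)]
```

With the estimated model in hand, any MDP planning method (such as value iteration) can be applied to the learned transition probabilities and average rewards.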

The dimension that adds the most difficulty to the task of building an agent is sensing uncertainty. With partial observability, there are many possible states the world could be in. The outcome of an action by the agent can depend on the actual state of the world. All the agent has access to is its history of past percepts and actions. There are many ways to represent the function from the history of the agent to its actions. Ways to extend planning with sensing uncertainty and indefinite and infinite-horizon problems are discussed in the context of POMDPs. How to handle sensing in all of its forms is one of the most active areas of current AI research.
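
One standard way to summarize the history of percepts and actions is a belief state, a probability distribution over states updated by Bayes' rule after each action and observation. A minimal sketch; the two-state transition and observation models below are hypothetical, loosely in the style of the tiger problem:

```python
def belief_update(b, a, o, T, O):
    """One POMDP belief update: b'(s') is proportional to
    O(o | s') * sum_s T(s' | s, a) * b(s)."""
    new_b = {s2: O[s2][o] * sum(T[s][a][s2] * b[s] for s in b) for s2 in b}
    norm = sum(new_b.values())
    return {s: p / norm for s, p in new_b.items()}

# Hypothetical models: "listen" does not change the state; hearing is noisy.
T = {"left":  {"listen": {"left": 1.0, "right": 0.0}},
     "right": {"listen": {"left": 0.0, "right": 1.0}}}
O = {"left":  {"hear_left": 0.85, "hear_right": 0.15},
     "right": {"hear_left": 0.15, "hear_right": 0.85}}

b = {"left": 0.5, "right": 0.5}
b = belief_update(b, "listen", "hear_left", T, O)
# starting from a uniform belief, hearing "hear_left" raises P(left) to 0.85
```

The belief state is a sufficient statistic of the history, which is what lets a POMDP policy be a function of the belief rather than the whole percept sequence.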

The models that can use stochastic actions can also handle deterministic actions (as deterministic is a special case of stochastic). Some of them, such as MDPs and the reinforcement learning algorithms, work well in deterministic domains.

Preferences are specified either in terms of goals or in terms of utilities; Proposition 12.3 proved that complex preferences, under very mild assumptions, can be represented in terms of utility. The models that can handle complex cardinal preferences can also handle goals, by giving a reward for goal achievement. A preference for the shortest path to a goal can be achieved by negative rewards for actions that do not reach the goal, or by discounting. In general, utilities are more expressive than goals.
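
The translation from goals to rewards can be sketched with value iteration on a hypothetical four-state corridor (the states and costs are invented for illustration): a reward of -1 per action makes the optimal policy the shortest path to the goal.

```python
# States 0..3 in a line; actions move left (-1) or right (+1), deterministically;
# state 3 is the goal (absorbing, value 0); each non-goal action costs 1.
states = [0, 1, 2, 3]

def next_state(s, a):
    return min(max(s + a, 0), 3)

V = {s: 0.0 for s in states}
for _ in range(50):  # undiscounted value iteration with reward -1 per step
    V = {s: 0.0 if s == 3 else max(-1 + V[next_state(s, a)] for a in (-1, 1))
         for s in states}
```

The converged value of each state is minus its shortest-path distance to the goal, so acting greedily with respect to V follows the shortest path.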

Dealing with multiple agents is much more difficult than planning for a single agent. The case of an agent with a single adversary agent is simpler than the more general cases. Multiple agents can be cooperative or competitive, or more often somewhere in between, where they can compete in some aspects and cooperate in others. Communication between agents is a standard way to achieve cooperation in society. Another way to achieve cooperation between self-interested agents is in the design of mechanisms for making society work, including money and legislation. This book has hardly scratched the surface of what can be done.

Interactivity is a single dimension, whereas real agents have to make quick online decisions as well as longer-term decisions. Agents need to reason about multiple time scales: computation that is offline relative to a decision that must be made within a second may be online relative to a decision made over days. Unlimited offline computation risks the agent never actually acting.

This book has presented the details of a small part of the design space of AI. The current frontier of research goes beyond what is covered in this textbook. There is much active research in all areas of AI. There have been and continue to be impressive advances in planning, learning, perception, natural language understanding, robotics, and other subareas of AI. Most of this work considers multiple dimensions and how they interact. There is growing interest in considering all of the dimensions and multiple tasks simultaneously (for example, under the rubric of artificial general intelligence), but doing everything well is difficult.

The decomposition of AI into subareas is not surprising. The design space is too big to explore all at once. Once a researcher has decided to handle, say, relational domains and reasoning about the existence of objects, it is difficult to add sensing uncertainty. If a researcher starts with learning with infinite horizons, it is difficult to add hierarchical reasoning, let alone learning with infinite horizons and relations together with hierarchies.

As AI practitioners, we still do not know how to build an agent that acts rationally in infinite-stage, partially observable domains consisting of individuals and relations in which there are multiple agents acting autonomously. Arguably humans do this, perhaps by reasoning hierarchically and approximately. Although we may not yet be able to build an intelligent artificial agent with human-level performance, we may have the building blocks to develop one. The main challenge is handling the complexity of the real world. However, so far there seem to be no intrinsic obstacles to building computational embodied agents capable of human-level performance or beyond.