3.2 State Spaces

One general formulation of intelligent action is in terms of state space. A state contains all of the information necessary to predict the effects of an action and to determine if it is a goal state. State-space searching assumes that

  • the agent has perfect knowledge of the state space and can observe what state it is in (i.e., there is full observability);
  • the agent has a set of actions that have known deterministic effects;
  • some states are goal states, the agent wants to reach one of these goal states, and the agent can recognize a goal state; and
  • a solution is a sequence of actions that will get the agent from its current state to a goal state.

Figure 3.1: The delivery robot domain with interesting locations labeled

Example 3.1: Consider the robot delivery domain and the task of finding a path from one location to another in Figure 3.1. This can be modeled as a state-space search problem, where the states are locations. Assume that the agent can use a lower-level controller to carry out the high-level action of getting from one location to a neighboring location. Thus, at this level of abstraction, the actions can involve deterministic traveling between neighboring locations.

An example problem is where the robot is outside room r103, at position o103, and the goal is to get to room r123. A solution is a sequence of actions that will get the robot to room r123.

Example 3.2: In a more complicated example, the delivery robot may have a number of parcels to deliver to various locations. In this case, the state may consist of the location of the robot, the parcels the robot is carrying, and the locations of the other parcels. The possible actions may be for the robot to move, to pick up parcels that are at the same location as the robot, or to put down whatever parcels it is carrying. A goal state may be one in which some specified parcels are at their desired locations. There may be many goal states because we may not care where the robot is or where some of the other parcels are.

Notice that this representation has ignored many details, for example, how the robot is carrying the parcels (which may affect whether it can carry other parcels), the battery level of the robot, whether the parcels are fragile or damaged, and the color of the floor. By not having these as part of the state space, we assume that these details are not relevant to the problem at hand.

Example 3.3: In a tutoring system, a state may consist of the set of topics that the student knows. The action may be teaching a particular lesson, and the result of a teaching action may be that the student knows the topic of the lesson as long as the student knows the topics that are prerequisites for the lesson being taught. The aim is for the student to know some particular set of topics.

If the effect of teaching also depends on the aptitude of the student, this detail must be part of the state space, too. We do not have to model what the student is carrying if that does not affect the result of actions or whether the goal is achieved.

A state-space problem consists of

  • a set of states;
  • a distinguished set of states called the start states;
  • a set of actions available to the agent in each state;
  • an action function that, given a state and an action, returns a new state;
  • a set of goal states, often specified as a Boolean function, goal(s), that is true when s is a goal state; and
  • a criterion that specifies the quality of an acceptable solution. For example, any sequence of actions that gets the agent to the goal state may be acceptable, or there may be costs associated with actions and the agent may be required to find a sequence that has minimal total cost. This is called an optimal solution. Alternatively, it may be satisfied with any solution that is within 10% of optimal.

This framework is extended in subsequent chapters to include cases where an agent can exploit the internal features of the states, where the state is not fully observable (e.g., the robot does not know where the parcels are, or the teacher does not know the aptitude of the student), where the actions are stochastic (e.g., the robot may overshoot, or the student perhaps does not learn a topic that is taught), and where complex preferences exist in terms of rewards and punishments, not just goal states.