6 Planning with Certainty

6.4 Planning as a CSP

Dimensions: flat, features, finite horizon, fully observable, deterministic, goal directed, non-learning, single agent, offline, perfect rationality

In forward planning, the search is constrained by the initial state and only uses the goal as a stopping criterion and as a source for heuristics. In regression planning, the search is constrained by the goal and only uses the start state as a stopping criterion and as a source for heuristics. By converting the problem to a constraint satisfaction problem (CSP), the initial state can be used to prune what is not reachable and the goal to prune what is not useful. The CSP will be defined for a finite number of steps; the number of steps can be adjusted to find the shortest plan. One of the CSP methods from Chapter 4 can then be used to solve the CSP and thus find a plan.

To construct a CSP from a planning problem, first choose a fixed planning horizon, which is the number of time steps over which to plan. Suppose the horizon is k. The CSP has the following variables:

  • a state variable for each feature and each time from 0 to k. If there are n features for a horizon of k, there are n*(k+1) state variables. The domain of the state variable is the domain of the corresponding feature.

  • an action variable, Actiont, for each time t in the range 0 to k-1. The domain of Actiont is the set of all possible actions. The value of Actiont represents the action that takes the agent from the state at time t to the state at time t+1.

There are several types of constraints:

  • A precondition constraint between a state variable at time t and the variable Actiont constrains what actions are legal at time t.

  • An effect constraint between Actiont and a state variable at time t+1 constrains the values of a state variable that is a direct effect of the action.

  • A frame constraint among a state variable at time t, the variable Actiont, and the corresponding state variable at time t+1 specifies when the variable that does not change as a result of an action has the same value before and after the action.

  • An initial-state constraint constrains a variable on the initial state (at time 0). The initial state is represented as a set of domain constraints on the state variables at time 0.

  • A goal constraint constrains the final state to be a state that satisfies the achievement goal. These are domain constraints on the variables that appear in the goal.

  • A state constraint is a constraint among variables at the same time step. These can include physical constraints on the state or can ensure that states that violate maintenance goals are forbidden. This is extra knowledge beyond the power of the feature-based or STRIPS representations of the action.

The STRIPS representation gives precondition, effect and frame constraints for each time t as follows:

  • For each Var=v in the precondition of action A, there is a precondition constraint

    Vart=vActiont=A

    that specifies that if the action is to be A, Vart must have value v immediately before. This constraint is violated when Actiont=A and Vartv, and thus is is equivalent to ¬(VartvActiont=A).

  • For each Var=v in the effect of action A, there is an effect constraint

    Vart+1=vActiont=A

    which is violated when Vart+1vActiont=A, and thus is equivalent to ¬(Vart+1vActiont=A).

  • For each Var, there is a frame constraint, where As is the set of actions that include Var in the effect of the action:

    Vart+1=VartActiontAs

    which specifies that the feature Var has the same value before and after any action that does not affect Var.

RLoci – Rob’s location RHCi – Rob has coffee SWCi – Sam wants coffee MWi – Mail is waiting RHMi – Rob has mail Actioni – Rob’s action
Figure 6.4: The delivery robot CSP planner for a planning horizon of k=2
Example 6.13.

Figure 6.4 shows a CSP representation of the delivery robot example, with a planning horizon of k=2. There are three copies of the state variables: one at time 0, the initial state; one at time 1; and one at time 2, the final state. There are action variables for times 0 and 1.

Precondition constraints: The constraints to the left of the action variable for each time are the precondition constraints. There is a separate constraint for each element of the precondition of the action.

The precondition for the action deliver coffee, dc, is {RLoc=off,rhc}; the robot has to be in the office and it must have coffee. Thus there are two precondition constraints for delivers coffee:

RLoct=officeActiont=dc
RHCt=trueActiont=dc

Effect constraints: The effect of delivering coffee (dc) is {¬rhc,¬swc}. Therefore there are two effect constraints

RHCt+1=falseActiont=dc
SWCt+1=officeActiont=dc

Frame constraints: Rob has mail (rhm) is not one of the effects of delivering coffee (dc). Thus there is a frame constraint

RHMt+1=RHMtActt=dc

which is violated when RHMt+1RHMtActt=dc.

Example 6.14.

Consider finding a plan to get Sam coffee, where initially, Sam wants coffee but the robot does not have coffee. This can be represented as initial-state constraints: SWC0=true and RHC0=false.

With a planning horizon of 2, the goal is represented as the domain constraint SWC2=false, and there is no solution.

With a planning horizon of 3, the goal is represented as the domain constraint SWC3=false. This has many solutions, all with RLoc0=cs (the robot has to start in the coffee shop), Action0=puc (the robot has to pick up coffee initially), Action1=mc (the robot has to move to the office), and Action2=dc (the robot has to deliver coffee at time 2).

The CSP representation assumes a fixed planning horizon (i.e., a fixed number of steps). To find a plan over any number of steps, the algorithm can be run for a horizon of k=0, 1, 2,…until a solution is found. For the stochastic local search algorithm, it is possible to search multiple horizons at once, searching for all horizons, k from 0 to n, and allowing n to vary slowly. When solving the CSP using arc consistency and domain splitting, it is sometimes possible to determine that trying a longer plan will not help. That is, by analyzing why no solution exists for a horizon of n steps, it may be possible to show that there can be no plan for any length greater than n. This will enable the planner to halt when there is no plan. See Exercise 11.