foundations of computational agents
In forward planning, the search is constrained by the initial state and only uses the goal as a stopping criterion and as a source for heuristics. In regression planning, the search is constrained by the goal and only uses the start state as a stopping criterion and as a source for heuristics. By converting the problem to a constraint satisfaction problem (CSP), the initial state can be used to prune what is not reachable and the goal to prune what is not useful. The CSP will be defined for a finite number of steps; the number of steps can be adjusted to find the shortest plan. Any of the CSP methods from Chapter 4 can then be used to solve the CSP and thus find a plan.
To construct a CSP from a planning problem, first choose a fixed planning horizon, which is the number of time steps over which to plan. Suppose the horizon is $k$. The CSP has the following variables:
A state variable for each feature and each time from 0 to $k$. If there are $n$ features for a horizon of $k$, there are $n\ast (k+1)$ state variables. The domain of the state variable is the domain of the corresponding feature.
An action variable, $Actio{n}_{t}$, for each time $t$ in the range 0 to $k-1$. The domain of $Actio{n}_{t}$ is the set of all possible actions. The value of $Actio{n}_{t}$ represents the action that takes the agent from the state at time $t$ to the state at time $t+1$.
There are several types of constraints:
A precondition constraint between a state variable at time $t$ and the variable $Actio{n}_{t}$ constrains what actions are legal at time $t$.
An effect constraint between $Actio{n}_{t}$ and a state variable at time $t+1$ constrains the values of a state variable that is a direct effect of the action.
A frame constraint among a state variable at time $t$, the variable $Actio{n}_{t}$, and the corresponding state variable at time $t+1$ specifies when the variable that does not change as a result of an action has the same value before and after the action.
An initial-state constraint constrains a variable on the initial state (at time 0). The initial state is represented as a set of domain constraints on the state variables at time $0$.
A goal constraint constrains the final state to be a state that satisfies the achievement goal. These are domain constraints on the variables that appear in the goal.
A state constraint is a constraint among variables at the same time step. These can include physical constraints on the state or can ensure that states that violate maintenance goals are forbidden. This is extra knowledge beyond the power of the feature-based or STRIPS representations of the action.
The STRIPS representation gives precondition, effect, and frame constraints for each time $t$ as follows:
For each $Var=v$ in the precondition of action $A$, there is a precondition constraint
$$Va{r}_{t}=v\leftarrow Actio{n}_{t}=A$$ |
that specifies that if the action is to be $A$, $Va{r}_{t}$ must have value $v$ immediately before. This constraint is violated when $Actio{n}_{t}=A$ and $Va{r}_{t}\ne v$, and thus is equivalent to $\neg (Va{r}_{t}\ne v\wedge Actio{n}_{t}=A)$.
For each $Var=v$ in the effect of action $A$, there is an effect constraint
$$Va{r}_{t+1}=v\leftarrow Actio{n}_{t}=A$$ |
which is violated when $Va{r}_{t+1}\ne v\wedge Actio{n}_{t}=A$, and thus is equivalent to $\neg (Va{r}_{t+1}\ne v\wedge Actio{n}_{t}=A)$.
For each $Var$, there is a frame constraint, where $As$ is the set of actions that include $Var$ in the effect of the action:
$$Va{r}_{t+1}=Va{r}_{t}\leftarrow Actio{n}_{t}\notin As$$ |
which specifies that the feature $Var$ has the same value before and after any action that does not affect $Var$.
$RLo{c}_{i}$ – Rob’s location $RH{C}_{i}$ – Rob has coffee $SW{C}_{i}$ – Sam wants coffee $M{W}_{i}$ – Mail is waiting $RH{M}_{i}$ – Rob has mail $Actio{n}_{i}$ – Rob’s action |
Figure 6.4 shows a CSP representation of the delivery robot example, with a planning horizon of $k=2$. There are three copies of the state variables: one at time 0, the initial state; one at time 1; and one at time 2, the final state. There are action variables for times 0 and 1.
Precondition constraints: The constraints to the left of the action variable for each time are the precondition constraints. There is a separate constraint for each element of the precondition of the action.
The precondition for the action deliver coffee, $dc$, is $\{RLoc=off,rhc\}$; the robot has to be in the office and it must have coffee. Thus there are two precondition constraints for delivers coffee:
$RLo{c}_{t}=office\leftarrow Actio{n}_{t}=dc$ | ||
$RH{C}_{t}=true\leftarrow Actio{n}_{t}=dc.$ |
Effect constraints: The effect of delivering coffee ($dc$) is $\{\neg rhc,\neg swc\}$. Therefore there are two effect constraints:
$RH{C}_{t+1}=false\leftarrow Actio{n}_{t}=dc$ | ||
$SW{C}_{t+1}=false\leftarrow Actio{n}_{t}=dc.$ |
Frame constraints: Rob has mail ($rhm$) is not one of the effects of delivering coffee ($dc$). Thus there is a frame constraint:
$RH{M}_{t+1}=RH{M}_{t}\leftarrow Ac{t}_{t}=dc$ |
which is violated when $RH{M}_{t+1}\ne RH{M}_{t}\wedge Ac{t}_{t}=dc$.
Consider finding a plan to get Sam coffee, where initially, Sam wants coffee but the robot does not have coffee. This can be represented as initial-state constraints: $SW{C}_{0}=true$ and $RH{C}_{0}=false$.
With a planning horizon of 2, the goal is represented as the domain constraint $SW{C}_{2}=false$, and there is no solution.
With a planning horizon of 3, the goal is represented as the domain constraint $SW{C}_{3}=false$. This has many solutions, all with $RLo{c}_{0}=cs$ (the robot has to start in the coffee shop), $Actio{n}_{0}=puc$ (the robot has to pick up coffee initially), $Actio{n}_{1}=mc$ (the robot has to move to the office), and $Actio{n}_{2}=dc$ (the robot has to deliver coffee at time 2).
The CSP representation assumes a fixed planning horizon (i.e., a fixed number of steps). To find a plan over any number of steps, the algorithm can be run for a horizon of $k=0$, $1$, $2$, … until a solution is found. For the stochastic local search algorithm, it is possible to search multiple horizons at once, searching for all horizons, $k$ from 0 to $n$, and allowing $n$ to vary slowly. When solving the CSP using arc consistency and domain splitting, it is sometimes possible to determine that trying a longer plan will not help. That is, by analyzing why no solution exists for a horizon of $n$ steps, it may be possible to show that there can be no plan for any length greater than $n$. This will enable the planner to halt when there is no plan. See Exercise 6.10.
So far we have assumed that actions are atomic and that an agent can only do one action at any time. For the CSP representation, it can be useful to describe the actions in terms of features – to have a factored representation of actions as well as a factored representation of states. The features representing actions are called action features and the features representing states are called state features. The action features can be considered as actions in themselves that are carried out in the same time step.
In this case, there can be an extra set of constraints called action constraints to specify which action features cannot co-occur. These are sometimes called mutual exclusion or mutex constraints.
Another way to model the actions of Example 6.1 is that, at each step, Rob gets to choose
whether it will pick up coffee – let $PUC$ be a Boolean variable that is true when Rob picks up coffee
whether it will deliver coffee – let $DelC$ be a Boolean variable that is true when Rob delivers coffee
whether it will pick up mail – let $PUM$ be a Boolean variable that is true when Rob picks up mail
whether it will deliver mail – let $DelM$ be a Boolean variable that is true when Rob delivers mail
whether it moves. Let $Move$ be a variable with domain $\{mc,mcc,nm\}$ that specifies whether Rob moves clockwise, moves counterclockwise, or does not move ($nm$ means “not move”).
Thus the agent can be seen as doing more than one action in a single stage. For some of the actions at the same stage, the robot can do them in any order, such as delivering coffee and delivering mail. Some of the actions at the same stage need to be carried out in a particular order, for example, the agent must move after the other actions.
$RLo{c}_{i}$ – Rob’s location $RH{C}_{i}$ – Rob has coffee $SW{C}_{i}$ – Sam wants coffee $M{W}_{i}$ – Mail is waiting $RH{M}_{i}$ – Rob has mail $Mov{e}_{i}$ – Rob’s move action $PU{C}_{i}$ – Rob picks up coffee $DelC$ – Rob delivers coffee $PU{M}_{i}$ – Rob picks up mail $Del{M}_{i}$ – Rob delivers mail |
Consider finding a plan to get Sam coffee, where initially, Sam wants coffee but the robot does not have coffee. The initial state can be represented as two domain constraints: $SW{C}_{0}=true$ and $RH{C}_{0}=false$. The goal is that Sam no longer wants coffee, $SW{C}_{k}=false$.
With a planning horizon of 2, the CSP is shown in Figure 6.5. The goal is represented as the domain constraint $SW{C}_{2}=false$, there is a solution $RLo{c}_{0}=cs$ (the robot has to start in the coffee shop), $PU{C}_{0}=true$ (the robot has to pick up coffee initially), $Mov{e}_{0}=mc$ (the robot has to move to the office), and $D{C}_{1}=true$ (the robot has to deliver coffee at time 1).
Note that in the representation without factored actions, the problem cannot be solved with a horizon of 2; it requires a horizon of 3, as there are no concurrent actions.