foundations of computational agents
Dimensions: flat, features, finite horizon, fully observable, deterministic, goal directed, non-learning, single agent, offline, perfect rationality
In forward planning, the search is constrained by the initial state and only uses the goal as a stopping criterion and as a source for heuristics. In regression planning, the search is constrained by the goal and only uses the start state as a stopping criterion and as a source for heuristics. By converting the problem to a constraint satisfaction problem (CSP), the initial state can be used to prune what is not reachable and the goal to prune what is not useful. The CSP will be defined for a finite number of steps; the number of steps can be adjusted to find the shortest plan. One of the CSP methods from Chapter 4 can then be used to solve the CSP and thus find a plan.
To construct a CSP from a planning problem, first choose a fixed planning horizon, which is the number of time steps over which to plan. Suppose the horizon is $k$. The CSP has the following variables:
a state variable for each feature and each time from 0 to $k$. If there are $n$ features for a horizon of $k$, there are $n\cdot(k+1)$ state variables. The domain of the state variable is the domain of the corresponding feature.
an action variable, $\mathit{Action}_t$, for each time $t$ in the range 0 to $k-1$. The domain of $\mathit{Action}_t$ is the set of all possible actions. The value of $\mathit{Action}_t$ represents the action that takes the agent from the state at time $t$ to the state at time $t+1$.
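This construction can be sketched in a few lines of Python. The function and variable names below are illustrative assumptions, not part of any particular CSP library; the variable set is represented as a dictionary mapping each variable name to its domain:

```python
def csp_variables(features, actions, k):
    """Build the variables of a horizon-k planning CSP.

    features: dict mapping each feature name to its domain (a list of values)
    actions:  list of all possible actions
    Returns a dict mapping variable name -> domain.
    """
    variables = {}
    # One state variable per feature per time step 0..k.
    for t in range(k + 1):
        for f, dom in features.items():
            variables[f"{f}_{t}"] = list(dom)
    # One action variable per time step 0..k-1.
    for t in range(k):
        variables[f"Action_{t}"] = list(actions)
    return variables

# Two of the delivery robot's features, horizon k = 2:
features = {"RLoc": ["cs", "off", "lab", "mr"], "RHC": [True, False]}
vars_k2 = csp_variables(features, ["mc", "mcc", "puc", "dc"], k=2)
# n features over horizon k give n*(k+1) state variables plus k action variables,
# here 2*3 + 2 = 8 variables in total.
```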
There are several types of constraints:
A precondition constraint between a state variable at time $t$ and the variable $\mathit{Action}_t$ constrains what actions are legal at time $t$.
An effect constraint between $\mathit{Action}_t$ and a state variable at time $t+1$ constrains the value of a state variable that is a direct effect of the action.
A frame constraint among a state variable at time $t$, the variable $\mathit{Action}_t$, and the corresponding state variable at time $t+1$ specifies that a state variable that does not change as a result of an action has the same value before and after the action.
An initial-state constraint constrains a variable on the initial state (at time 0). The initial state is represented as a set of domain constraints on the state variables at time $0$.
A goal constraint constrains the final state to be a state that satisfies the achievement goal. These are domain constraints on the variables that appear in the goal.
A state constraint is a constraint among variables at the same time step. These can include physical constraints on the state or can ensure that states that violate maintenance goals are forbidden. This is extra knowledge beyond the power of the feature-based or STRIPS representations of the action.
The STRIPS representation gives precondition, effect and frame constraints for each time $t$ as follows:
For each $Var=v$ in the precondition of action $A$, there is a precondition constraint
$$\mathit{Var}_t = v \leftarrow \mathit{Action}_t = A$$
that specifies that if the action is to be $A$, $\mathit{Var}_t$ must have value $v$ immediately before. This constraint is violated when $\mathit{Action}_t = A$ and $\mathit{Var}_t \ne v$, and thus is equivalent to $\neg(\mathit{Var}_t \ne v \wedge \mathit{Action}_t = A)$.
For each $Var=v$ in the effect of action $A$, there is an effect constraint
$$\mathit{Var}_{t+1} = v \leftarrow \mathit{Action}_t = A$$
which is violated when $\mathit{Var}_{t+1} \ne v \wedge \mathit{Action}_t = A$, and thus is equivalent to $\neg(\mathit{Var}_{t+1} \ne v \wedge \mathit{Action}_t = A)$.
For each $Var$, there is a frame constraint, where $As$ is the set of actions that include $Var$ in the effect of the action:
$$\mathit{Var}_{t+1} = \mathit{Var}_t \leftarrow \mathit{Action}_t \notin \mathit{As}$$
which specifies that the feature $Var$ has the same value before and after any action that does not affect $Var$.
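This compilation from STRIPS actions to constraints is mechanical, and can be sketched in Python. The encoding of a constraint as a (scope, predicate) pair and all names below are illustrative assumptions, not the API of a particular CSP library; each predicate returns true exactly when the constraint is satisfied:

```python
def strips_constraints(action_specs, k):
    """Compile STRIPS actions into precondition, effect, and frame
    constraints for a horizon-k CSP.

    action_specs: dict mapping action name -> (precondition dict, effect dict),
                  each dict mapping feature -> required/resulting value.
    Returns a list of (scope, predicate) pairs, where scope is a tuple of
    variable names and predicate tests an assignment to those variables.
    """
    constraints = []
    features = {f for pre, eff in action_specs.values() for f in {**pre, **eff}}
    for t in range(k):
        for a, (pre, eff) in action_specs.items():
            for var, v in pre.items():
                # Precondition: Var_t = v  <-  Action_t = a
                constraints.append(((f"{var}_{t}", f"Action_{t}"),
                    lambda x, act, v=v, a=a: act != a or x == v))
            for var, v in eff.items():
                # Effect: Var_{t+1} = v  <-  Action_t = a
                constraints.append(((f"{var}_{t+1}", f"Action_{t}"),
                    lambda x, act, v=v, a=a: act != a or x == v))
        for var in features:
            # As: the actions that include var in their effect.
            As = frozenset(a for a, (_, eff) in action_specs.items() if var in eff)
            # Frame: Var_{t+1} = Var_t  <-  Action_t not in As
            constraints.append(((f"{var}_{t}", f"{var}_{t+1}", f"Action_{t}"),
                lambda x0, x1, act, As=As: act in As or x0 == x1))
    return constraints

# The deliver-coffee action from the running example, horizon 1:
specs = {"dc": ({"RLoc": "off", "RHC": True}, {"RHC": False, "SWC": False})}
cons = strips_constraints(specs, 1)
```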
The variables in Figure 6.4 are: $\mathit{RLoc}_i$ – Rob’s location; $\mathit{RHC}_i$ – Rob has coffee; $\mathit{SWC}_i$ – Sam wants coffee; $\mathit{MW}_i$ – mail is waiting; $\mathit{RHM}_i$ – Rob has mail; $\mathit{Action}_i$ – Rob’s action.
Figure 6.4 shows a CSP representation of the delivery robot example, with a planning horizon of $k=2$. There are three copies of the state variables: one at time 0, the initial state; one at time 1; and one at time 2, the final state. There are action variables for times 0 and 1.
Precondition constraints: The constraints to the left of the action variable for each time are the precondition constraints. There is a separate constraint for each element of the precondition of the action.
The precondition for the action deliver coffee, $\mathit{dc}$, is $\{\mathit{RLoc}=\mathit{off}, \mathit{rhc}\}$; the robot has to be in the office and it must have coffee. Thus there are two precondition constraints for deliver coffee:
$$\mathit{RLoc}_t = \mathit{off} \leftarrow \mathit{Action}_t = \mathit{dc}$$
$$\mathit{RHC}_t = \mathit{true} \leftarrow \mathit{Action}_t = \mathit{dc}$$
Effect constraints: The effect of delivering coffee ($\mathit{dc}$) is $\{\neg \mathit{rhc}, \neg \mathit{swc}\}$. Therefore there are two effect constraints:
$$\mathit{RHC}_{t+1} = \mathit{false} \leftarrow \mathit{Action}_t = \mathit{dc}$$
$$\mathit{SWC}_{t+1} = \mathit{false} \leftarrow \mathit{Action}_t = \mathit{dc}$$
Frame constraints: Rob has mail ($\mathit{rhm}$) is not one of the effects of delivering coffee ($\mathit{dc}$). Thus there is a frame constraint
$$\mathit{RHM}_{t+1} = \mathit{RHM}_t \leftarrow \mathit{Action}_t = \mathit{dc}$$
which is violated when $\mathit{RHM}_{t+1} \ne \mathit{RHM}_t \wedge \mathit{Action}_t = \mathit{dc}$.
Consider finding a plan to get Sam coffee, where initially Sam wants coffee but the robot does not have coffee. This can be represented as initial-state constraints: $\mathit{SWC}_0 = \mathit{true}$ and $\mathit{RHC}_0 = \mathit{false}$.
With a planning horizon of 2, the goal is represented as the domain constraint $\mathit{SWC}_2 = \mathit{false}$, and there is no solution.
With a planning horizon of 3, the goal is represented as the domain constraint $\mathit{SWC}_3 = \mathit{false}$. This has many solutions, all with $\mathit{RLoc}_0 = \mathit{cs}$ (the robot has to start in the coffee shop), $\mathit{Action}_0 = \mathit{puc}$ (the robot has to pick up coffee initially), $\mathit{Action}_1 = \mathit{mc}$ (the robot has to move to the office), and $\mathit{Action}_2 = \mathit{dc}$ (the robot has to deliver coffee at time 2).
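This example can be checked computationally. The sketch below does not build the CSP; it enumerates action trajectories directly, which for a deterministic domain finds the same plans as solving the CSP encoding. It is a minimal fragment of the domain, covering only three actions and three features, and the full clockwise ordering of locations is an assumption for illustration (the $\mathit{cs} \to \mathit{off}$ move matches the plan in the text):

```python
from itertools import product

# Assumed clockwise ordering of locations; cs -> off matches the text's plan.
CLOCKWISE = {"cs": "off", "off": "lab", "lab": "mr", "mr": "cs"}

def step(state, action):
    """Apply a STRIPS action to (RLoc, RHC, SWC); None if precondition fails."""
    loc, rhc, swc = state
    if action == "mc":    # move clockwise: always applicable
        return (CLOCKWISE[loc], rhc, swc)
    if action == "puc":   # pick up coffee: at cs, not already holding coffee
        return (loc, True, swc) if loc == "cs" and not rhc else None
    if action == "dc":    # deliver coffee: at off, holding coffee
        return (loc, False, False) if loc == "off" and rhc else None
    return None

def plans(horizon, init, goal):
    """All action sequences of the given length whose trajectory reaches goal."""
    found = []
    for seq in product(["mc", "puc", "dc"], repeat=horizon):
        s = init
        for a in seq:
            s = step(s, a)
            if s is None:
                break
        if s is not None and goal(s):
            found.append(seq)
    return found

init = ("cs", False, True)        # RLoc_0 = cs, RHC_0 = false, SWC_0 = true
goal = lambda s: s[2] is False    # SWC = false at the final time
```

With this fragment, `plans(2, init, goal)` is empty, and the only horizon-3 plan is pick up coffee, move clockwise, deliver coffee, matching the solutions described above.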
The CSP representation assumes a fixed planning horizon (i.e., a fixed number of steps). To find a plan over any number of steps, the algorithm can be run for a horizon of $k=0$, $1$, $2$,…until a solution is found. For the stochastic local search algorithm, it is possible to search multiple horizons at once, searching for all horizons, $k$ from 0 to $n$, and allowing $n$ to vary slowly. When solving the CSP using arc consistency and domain splitting, it is sometimes possible to determine that trying a longer plan will not help. That is, by analyzing why no solution exists for a horizon of $n$ steps, it may be possible to show that there can be no plan for any length greater than $n$. This will enable the planner to halt when there is no plan. See Exercise 11.
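The run-until-success loop over increasing horizons can be sketched as follows. The names are illustrative; `solve` stands for any fixed-horizon planner, such as a CSP solver over the encoding above, and the `max_horizon` bound stands in for the harder problem of proving that no longer plan can exist:

```python
def plan_over_horizons(solve, max_horizon):
    """Run a fixed-horizon planner for k = 0, 1, 2, ... until it succeeds.

    solve(k) is assumed to return a plan for horizon k, or None if there is
    no horizon-k plan.  Returns (k, plan) for the shortest horizon with a
    solution, or None if no horizon up to max_horizon has one.
    """
    for k in range(max_horizon + 1):
        plan = solve(k)
        if plan is not None:
            return k, plan
    return None

# Hypothetical stand-in solver that only succeeds from horizon 3 upward.
result = plan_over_horizons(lambda k: ["a"] * k if k >= 3 else None, 10)
```

Because the horizons are tried in increasing order, the first success is guaranteed to be a shortest plan.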