foundations of computational agents
Regression planning involves searching backwards from a goal, asking the question: what does the agent need to do to achieve this goal, and what needs to hold to enable this action to solve the goal? What needs to hold before the action becomes a subgoal, which either holds initially or becomes a new goal to solve.
If the goal is for the robot to hold coffee, and the robot isn’t already holding coffee, then the action that achieves this is to pick up coffee. This action requires the robot to be in the coffee shop, which becomes a new subgoal.
If the goal, instead, is for the robot to be holding coffee in the office, then the actions to achieve that involve moving to the office from a neighboring location, while holding coffee. This results in the subgoal of being in a neighboring location holding coffee. Picking up coffee cannot achieve the goal of holding coffee in the office, because picking up coffee can only be done in the coffee shop, and so the result of picking up coffee is that the robot is in the coffee shop, not the office.
Regression planning is searching in the graph defined by the following:
The nodes are subgoals, where a subgoal is an assignment to some of the features.
The arcs correspond to actions. In particular, an arc from node $g$ to ${g}^{\prime}$, labeled with action $act$, means
$act$ is the last action that is carried out before subgoal $g$ is achieved
node ${g}^{\prime}$ is a subgoal that must be true immediately before $act$ so that $g$ is true immediately after $act$.
The start node is the planning goal to be achieved.
The goal condition for the search, $goal(g)$, is true if $g$ is true of the initial state.
Given a node that represents subgoal $g$, a neighbor of $g$ exists for every action $act$ such that:
$act$ is possible, it is possible for $act$ to be carried out and for $g$ to be true immediately after $act$
$act$ is useful, $act$ achieves part of $g$.
Consider the subgoal $g=\{{X}_{1}={v}_{1},\mathrm{\dots},{X}_{n}={v}_{n}\}$. In terms of the STRIPS representation, $act$ is useful for solving $g$ if, for some $i$, ${X}_{i}={v}_{i}$ is an effect of action $act$. Immediately before $act$, the preconditions of $act$, as well as any ${X}_{k}={v}_{k}$ not achieved by $act$, must hold. Thus, the neighbor of subgoal $g$ on the arc labeled $act$ is the subgoal $precondition(act)\cup (g\setminus effects(act))$, as long as $act$ is possible. Action $act$ is possible if:
for each ${X}_{j}={v}_{j}$ in $g$, there is no effect ${X}_{j}={v}_{j}$ of $act$ where ${v}_{j}^{\prime}\ne {v}_{j}$
$precondition(act)\cup (g\setminus effects(act))$ does not include two different assignments to any feature.
Suppose the goal is to ensure that Sam does not want coffee, which is $\neg swc$. Therefore, the start node is $\{\neg swc\}$. If this is true in the initial state, the planner stops. If not, it chooses an action that achieves $\neg swc$. In this case, there is only one such action: $dc$ (deliver coffee). The precondition of $dc$ is $\{off,rhc\}$. Thus, there is one arc:
$$ |
Consider the node $\{off,rhc\}$. There are two actions that can achieve $off$, namely $mc$ from $cs$ and $mcc$ from $lab$. There is one action that can achieve $rhc$, namely $puc$. However, $puc$ has as precondition $\{cs,\neg rhc\}$, but $cs$ and $off$ are inconsistent (because they involve different assignments to the variable $RLoc$). Therefore, $puc$ is not a possible last action; it is not possible that, immediately after $puc$, the condition $\{off,rhc\}$ holds.
Figure 6.3 shows the first three levels of the search space (without cycle pruning or multiple-path pruning). Note that the search space is the same no matter what the initial state is. The starting state has two roles: to serve as a stopping criterion and as a source of heuristics.
If a subgoal contains a number of assignments, regression can often determine which of the assignments to achieve last, as in the following example.
Suppose the goal is for Sam to not want coffee and for the robot to have coffee: $\{\neg swc,rhc\}$. The last action cannot be $dc$ to achieve $\neg swc$, because this achieves $\neg rhc$. The only last action must be $puc$ to achieve $rhc$. Thus, the resulting subgoal is $\{\neg swc,cs\}$. Again, the last action before this subgoal cannot be to achieve $\neg swc$ because this has as a precondition $off$, which is inconsistent with $cs$. Therefore, the second-to-last action must be a move action to achieve $cs$.
In terms of the feature-based representation of actions, an action $act$ is useful if there is a causal rule that achieves ${X}_{i}={v}_{i}$ for some $i$, using action $act$. The neighbor of this node along the arc labeled with action $act$ is the proposition
$$precondition(act)\wedge body({X}_{1}={v}_{1},act)\wedge \mathrm{\cdots}\wedge body({X}_{n}={v}_{n},act)$$ |
where $body({X}_{i}={v}_{i},act)$ is the set of assignments of variables in the body of a rule that specifies when ${X}_{i}={v}_{i}$ is true immediately after $act$. There is no such neighbor if there is no corresponding rule for some $i$, or if the proposition is inconsistent (i.e., assigns different values to a variable). Note that, if multiple rules are applicable for the same action, there will be multiple neighbors.
Search algorithms such as ${A}^{\ast}$ and branch and bound can make use of heuristic knowledge. For a regression planner, the heuristic value of a node is an estimate of the cost to achieve the subgoal represented by the node from the initial state. This form of heuristic as an estimate of the cost of achieving a subgoal from a state is the same as used in a forward planner. So, for example, the heuristic of Example 6.10 could also be used for a regression planner. However, an effective heuristic for a regression planner may not be very useful for a forward planner, and vice versa (see Exercise 6.4).
One problem that arises in regression planning is that a subgoal may not be achievable. Deciding whether a subgoal is achievable is often difficult to infer from the definitions of the actions. For example, consider the restriction that an object cannot be at two different places at the same time; sometimes this is not explicitly represented and is only implicit in the effects of an action, and the fact that the object is only in one position initially. It is possible to have domain knowledge to prune nodes that can be shown to be inconsistent.
Cycle pruning and multiple-path pruning may be incorporated into a regression planner. The regression planner does not have to visit exactly the same node to prune the search. If the subgoal represented by a node $n$ is a superset of a subgoal on the path to $n$, node $n$ can be pruned. Similarly for multiple-path pruning, see Exercise 6.9.