foundations of computational agents
Dimensions: flat, features, indefinite horizon, fully observable, deterministic, goal directed, non-learning, single agent, offline, perfect rationality
The forward and regression planners enforce a total ordering on actions at all stages of the planning process. The CSP planner commits to the particular time that the action will be carried out. This means that those planners have to commit to an ordering of actions when adding them to a partial plan, even if there is no particular reason to put one action before another.
A partial-order planner maintains a partial ordering between actions and only commits to an ordering between actions when forced. This is sometimes also called a non-linear planner, which is a misnomer because such planners often produce a linear plan.
Because the same action may be used a number of times in the same plan, for example, the robot may need to move clockwise a number of times, the partial ordering will be between action instances, where an action instance is just a pair of an action and an integer, which we will write as $act\mathrm{\#}i$. By the preconditions and effects of the action instance, we mean the precondition and the effects of the action.
A partial ordering is a binary relation that is transitive and asymmetric. A partial-order plan is a set of action instances together with a partial ordering between them, representing a “before” relation on action instances. Write $$ if action instance $ac{t}_{0}$ is before action instance $ac{t}_{1}$ in the partial order. This means that the action of $ac{t}_{0}$ must occur before the action of $ac{t}_{1}$. The aim of the planner is to produce a partial ordering of the action instances so that any total ordering consistent with the partial ordering will solve the goal from the initial state.
There are two special action instances, $start$, that achieves the relations that are true in the initial state, and $finish$, whose precondition is the goal to be solved. Every other action instance is after $start$ and before $finish$ in the partial ordering. The use of these as action instances means that the algorithm does not require special cases for the initial situation and for the goals. When the preconditions of $finish$ are achieved, the goal is solved.
Any action instance, other than $start$ or $finish$, will be in a partial-order plan to achieve a precondition of an action instance in the plan. Each precondition $P$ of an action instance $ac{t}_{1}$ in the plan is either true in the initial state, and so achieved by $start$, or there will be an action instance $ac{t}_{0}$ in the plan that achieves $P$. The action instance $ac{t}_{0}$ that achieves $P$ must be before $ac{t}_{1}$; that is, $$. To be correct, the algorithm must also ensure that nothing makes $P$ false in between $ac{t}_{0}$ and $ac{t}_{1}$.
A causal link is a triple $$, where $ac{t}_{0}$ and $ac{t}_{1}$ are action instances and $P$ is a $Var=val$ assignment that is in the precondition of $ac{t}_{1}$, and in the effect of $ac{t}_{0}$. This means that $ac{t}_{0}$ makes $P$ hold for $ac{t}_{1}$. With this causal link, any other action instance that makes $P$ false must either be before $ac{t}_{0}$ or after $ac{t}_{1}$.
Informally, a partial-order planner works as follows. Begin with the action instances $start$ and $finish$ and the partial order $$. The planner maintains an agenda that is a set of $$ pairs, where $A$ is an action instance in the plan and $P$ is a variable-value assignment that is a precondition of $A$ that remains to be achieved. Initially, the agenda contains pairs $$, where $G$ is an assignment that must be true in the goal state.
At each stage in the planning process, a pair $$ is chosen from the agenda, where $P$ is in the precondition for action instance $ac{t}_{1}$. Then an action instance, $ac{t}_{0}$, is chosen to achieve $P$. That action instance is either already in the plan – it could be the $start$ action, for example – or it is a new action instance that is added to the plan. Action instance $ac{t}_{0}$ must happen before $ac{t}_{1}$ in the partial order. The planner adds a causal link that records that $ac{t}_{0}$ achieves $P$ for action $ac{t}_{1}$. Any action in the plan that makes $P$ false must happen either before $ac{t}_{0}$ or after $ac{t}_{1}$. If $ac{t}_{0}$ is a new action, its preconditions are added to the agenda, and the process continues until the agenda is empty.
The algorithm $Partial\mathrm{\_}order\mathrm{\_}planner$ is given in Figure 6.6. This is a non-deterministic procedure. The “choose” and the “either …or …” form choices that must be searched over. There are two choices that require search:
which action is chosen to achieve $P$ and
whether an action instance that deletes $P$ happens before $ac{t}_{0}$ or after $ac{t}_{1}$.
The function $$ returns the constraints formed by adding the constraint $$ to $Constraints$, and it fails if $$ is incompatible with $Constraints$. There are many ways this function can be implemented. See Exercise 12.
The function $$ checks whether $A\ne {A}_{0}$, $A\ne {A}_{1}$ and the effect of $A$ is inconsistent with $G$. If so, either $$ is added to the set of constraints or $$ is added to the set of constraints. This is a non-deterministic choice that is searched over.
Consider the goal of Sam not wanting coffee and no mail waiting (i.e., ${\mathrm{\neg}}{\mathit{}}{s}{\mathit{}}{w}{\mathit{}}{c}{\mathrm{\wedge}}{\mathrm{\neg}}{\mathit{}}{m}{\mathit{}}{w}$), where in the initial state Rob is in the lab, Sam wants coffee, Rob does not have coffee, there is mail waiting and Rob does not have mail, i.e., ${R}{\mathit{}}{L}{\mathit{}}{o}{\mathit{}}{c}{\mathrm{=}}{l}{\mathit{}}{a}{\mathit{}}{b}$, ${s}{\mathit{}}{w}{\mathit{}}{c}$, ${\mathrm{\neg}}{\mathit{}}{r}{\mathit{}}{h}{\mathit{}}{c}$, ${m}{\mathit{}}{w}$, ${\mathrm{\neg}}{\mathit{}}{r}{\mathit{}}{h}{\mathit{}}{m}$.
We will write instances of action ${A}{\mathit{}}{c}{\mathit{}}{t}$ as ${A}{\mathit{}}{c}{\mathit{}}{t}{\mathit{}}{\mathrm{\#}}{\mathit{}}{n}$ where ${n}$ is a unique integer.
Initially the agenda is
$$ |
Suppose $$ is chosen and removed from the agenda. One action can achieve ${\mathrm{\neg}}{\mathit{}}{s}{\mathit{}}{w}{\mathit{}}{c}$, namely deliver coffee, ${d}{\mathit{}}{c}$, with preconditions ${o}{\mathit{}}{f}{\mathit{}}{f}$ and ${r}{\mathit{}}{h}{\mathit{}}{c}$. So it inserts an instance, say ${d}{\mathit{}}{c}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{6}}$, into the plan. After the first time through the repeat loop, ${A}{\mathit{}}{g}{\mathit{}}{e}{\mathit{}}{n}{\mathit{}}{d}{\mathit{}}{a}$ contains
$$ |
At this stage, the value of ${C}{\mathit{}}{o}{\mathit{}}{n}{\mathit{}}{s}{\mathit{}}{t}{\mathit{}}{r}{\mathit{}}{a}{\mathit{}}{i}{\mathit{}}{n}{\mathit{}}{t}{\mathit{}}{s}$ is $$. There is one causal link, $$. This causal link means that no action that undoes ${\mathrm{\neg}}{\mathit{}}{s}{\mathit{}}{w}{\mathit{}}{c}$ is allowed to happen between ${d}{\mathit{}}{c}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{6}}$ and ${f}{\mathit{}}{i}{\mathit{}}{n}{\mathit{}}{i}{\mathit{}}{s}{\mathit{}}{h}$.
Suppose $$ is chosen from the agenda. One action can achieve this, ${p}{\mathit{}}{u}{\mathit{}}{m}$, with precondition ${\mathrm{\{}}{m}{\mathit{}}{w}{\mathrm{,}}{R}{\mathit{}}{L}{\mathit{}}{o}{\mathit{}}{c}{\mathrm{=}}{m}{\mathit{}}{r}{\mathrm{\}}}$. The algorithm constructs a new action instance, say ${p}{\mathit{}}{u}{\mathit{}}{m}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{7}}$. The causal link $$ is added to the set of causal links; $$ and $$ are added to the agenda.
Suppose $$ is chosen from the agenda. The action ${s}{\mathit{}}{t}{\mathit{}}{a}{\mathit{}}{r}{\mathit{}}{t}$ achieves ${m}{\mathit{}}{w}$, because ${m}{\mathit{}}{w}$ is true initially. The causal link $$ is added to the set of causal links. Nothing is added to the agenda.
At this stage, there is no ordering imposed between ${d}{\mathit{}}{c}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{6}}$ and ${p}{\mathit{}}{u}{\mathit{}}{m}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{7}}$.
Suppose $$ is removed from the agenda. There are two actions that can achieve ${o}{\mathit{}}{f}{\mathit{}}{f}$: ${m}{\mathit{}}{c}{\mathit{}}{\mathrm{\_}}{\mathit{}}{c}{\mathit{}}{s}$ with preconditions ${c}{\mathit{}}{s}$, and ${m}{\mathit{}}{c}{\mathit{}}{c}{\mathit{}}{\mathrm{\_}}{\mathit{}}{l}{\mathit{}}{a}{\mathit{}}{b}$ with preconditions ${l}{\mathit{}}{a}{\mathit{}}{b}$. The algorithm searches over these choices. Suppose it chooses the action instance ${m}{\mathit{}}{c}{\mathit{}}{\mathrm{\_}}{\mathit{}}{c}{\mathit{}}{s}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{9}}$. The causal link $$ is added.
The first violation of a causal link occurs when a move action is used to achieve $$. This action violates the causal link $$, and so must happen after ${d}{\mathit{}}{c}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{6}}$ (the robot goes to the mail room after delivering coffee) or before ${m}{\mathit{}}{c}{\mathit{}}{\mathrm{\_}}{\mathit{}}{c}{\mathit{}}{s}{\mathit{}}{\mathrm{\#}}{\mathit{}}{\mathrm{9}}$.
Eventually, it finds a plan of action instances, such as:
$${s}{}{t}{}{a}{}{r}{}{t}{;}{m}{}{c}{}{\mathrm{\_}}{}{l}{}{a}{}{b}{}{\mathrm{\#}}{}{15}{;}{p}{}{u}{}{m}{}{\mathrm{\#}}{}{7}{;}{m}{}{c}{}{\mathrm{\_}}{}{m}{}{r}{}{\mathrm{\#}}{}{40}{;}{p}{}{u}{}{c}{}{\mathrm{\#}}{}{11}{;}{m}{}{c}{}{\mathrm{\_}}{}{c}{}{s}{}{\mathrm{\#}}{}{9}{;}{d}{}{c}{}{\mathrm{\#}}{}{6}{;}{f}{}{i}{}{n}{}{i}{}{s}{}{h}$$ |
This is the only total ordering consistent with the partial ordering.
A partial order planner works particularly well when no ordering of actions can achieve a goal, as it does not need to search over all permutations of the actions. It also works well when many orderings can solve the goal, in which case it can find a flexible plan for the robot.