foundations of computational agents
Dynamic programming is a general method for optimization that involves computing and storing partial solutions to problems. Solutions that have already been found can be retrieved rather than being recomputed. Dynamic programming algorithms are used throughout AI and computer science.
Dynamic programming can be used for finding paths in finite graphs by constructing a cost-to-goal function for nodes that gives the exact cost of a minimal-cost path from the node to a goal.
Let cost_to_goal(n) be the actual cost of a lowest-cost path from node n to a goal; cost_to_goal(n) can be defined as

    cost_to_goal(n) = 0                                              if is_goal(n)
    cost_to_goal(n) = min_{⟨n,m⟩ ∈ A} (cost(⟨n,m⟩) + cost_to_goal(m))  otherwise

where A is the set of arcs and cost(⟨n,m⟩) is the cost of the arc from n to m.
The general idea is to build, offline, a table of the cost_to_goal(n) value for each node n. This is done by carrying out a lowest-cost-first search, with multiple-path pruning, from the goal nodes in the inverse graph, which is the graph with all arcs reversed. Rather than having a goal to search for, the dynamic programming algorithm records the cost_to_goal value for each node found. It uses the inverse graph so that it computes the costs from each node to the goal, and not the costs from the goal to each node. In essence, dynamic programming works backward from the goal, building the lowest-cost paths to the goal from each node in the graph.
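This backward search can be sketched in Python. This is a minimal sketch, not the book's code: the graph representation (a dict of outgoing arcs), the function name, and the toy graph at the end are all assumptions for illustration.

```python
import heapq

def build_cost_to_goal(arcs, goals):
    """Lowest-cost-first search, with multiple-path pruning, from the
    goal nodes in the inverse graph (the graph with all arcs reversed).

    arcs: dict mapping each node to a list of (neighbor, arc_cost)
    pairs for its outgoing arcs; goals: iterable of goal nodes.
    Returns a dict giving cost_to_goal for every node from which a
    goal is reachable.
    """
    # Build the inverse graph: an arc n -> m of cost c becomes m -> n.
    inverse = {}
    for n, out in arcs.items():
        for m, c in out:
            inverse.setdefault(m, []).append((n, c))

    cost_to_goal = {}
    frontier = [(0, g) for g in goals]   # (cost so far, node)
    heapq.heapify(frontier)
    while frontier:
        cost, n = heapq.heappop(frontier)
        if n in cost_to_goal:            # multiple-path pruning
            continue
        cost_to_goal[n] = cost           # first pop is the lowest cost
        for m, c in inverse.get(n, []):
            # m can reach a goal through n at cost c + cost
            heapq.heappush(frontier, (cost + c, m))
    return cost_to_goal

# A toy graph (hypothetical, not Figure 3.2):
arcs = {'a': [('b', 2), ('c', 5)], 'b': [('c', 1)], 'c': []}
print(build_cost_to_goal(arcs, ['c']))   # {'c': 0, 'b': 1, 'a': 3}
```

A node from which no goal is reachable never enters the table, which matches the behavior described below: a node without a cost_to_goal value has no path to the goal.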
For the graph given in Figure 3.2, r123 is a goal, so

    cost_to_goal(r123) = 0.

Continuing with a lowest-cost-first search from r123:

    cost_to_goal(o123) = 4
    cost_to_goal(o119) = 13
    cost_to_goal(o109) = 29
    cost_to_goal(b4) = 36
    cost_to_goal(b2) = 39
    cost_to_goal(o103) = 41
    cost_to_goal(b3) = 43
    cost_to_goal(b1) = 45
At this stage the backward search halts. Notice that, if a node does not have a cost_to_goal value, there is no path to the goal from that node.
A policy is a specification of which arc to take from each node. An optimal policy is a policy such that the cost of following it is no worse than the cost of following any other policy. Given a cost_to_goal function, which is computed offline, a policy can be computed as follows: from node n, go to a neighbor m that minimizes cost(⟨n,m⟩) + cost_to_goal(m). This policy will take the agent from any node to a goal along a lowest-cost path.
Either this neighbor can be recorded for all nodes offline, so the mapping from each node to its selected neighbor is provided to the agent for online action, or the cost_to_goal function is given to the agent and the best neighbor is computed online as needed.
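Both options can be sketched as follows. This is a hypothetical sketch: it assumes the graph and the cost_to_goal table are Python dicts, and the function names are invented.

```python
def optimal_arc(n, arcs, cost_to_goal):
    """Online option: from node n, return a neighbor m minimizing
    cost(n, m) + cost_to_goal(m).  Neighbors without a cost_to_goal
    value (no path to a goal) are skipped."""
    best, best_cost = None, float('inf')
    for m, c in arcs.get(n, []):
        if m in cost_to_goal and c + cost_to_goal[m] < best_cost:
            best, best_cost = m, c + cost_to_goal[m]
    return best

def record_policy(arcs, cost_to_goal):
    """Offline option: record the optimal neighbor for every node
    that has a path to a goal."""
    return {n: optimal_arc(n, arcs, cost_to_goal)
            for n in arcs if n in cost_to_goal}

# Toy example (hypothetical graph, not Figure 3.2):
arcs = {'a': [('b', 2), ('c', 5)], 'b': [('c', 1)], 'c': []}
cost_to_goal = {'c': 0, 'b': 1, 'a': 3}
print(record_policy(arcs, cost_to_goal))  # {'a': 'b', 'b': 'c', 'c': None}
```

The goal node itself has no outgoing arcs in this toy graph, so its recorded "next node" is None; the agent stops there.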
Dynamic programming takes time and space linear in the size of the graph to build the cost_to_goal table. Once the cost_to_goal function has been built, even if the policy has not been recorded, the time to determine which arc is optimal depends only on the number of neighbors for the node.
Given the cost_to_goal values of Example 3.20 for the goal of getting to r123, if the agent is at o103, it compares 4 + 43 = 47 (the cost of going via b3) with 12 + 29 = 41 (the cost of going straight to o109) and can determine to go next to o109. It does not need to consider going to ts, as it knows there is no path from ts to r123.
Dynamic programming has the advantage that it specifies what to do for every node, and so can be used given a goal for any starting position.
In route planning (see Example 3.1), we might also want a robust solution that allows for dynamic online replanning, quickly adapting when the user deviates (intentionally or unintentionally) from the best route. It should be able to give the best route from the user's current location and driving direction.
It is even possible to adapt dynamic programming for cases where there is a known goal and an initial starting position, but where the agent can deviate from the optimal path. One way this can be done is to carry out an A* search (with multiple-path pruning) from the destination to the current location, in the inverse graph. When the user deviates from an optimal route, the other paths to the goal have often already been explored, or can be generated, to find a path from the new current location.
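This replanning variant can be sketched as follows. The sketch rests on assumptions: the function name and graph representation are invented, and h(n) is any heuristic that underestimates the cost from n to the current location, as in any A* search.

```python
import heapq

def astar_to_location(arcs, destination, location, h):
    """A* search, with multiple-path pruning, from the destination to
    the current location in the inverse graph (all arcs reversed).
    h(n) must underestimate the cost from n to `location`.  Returns
    cost_to_goal values for every node expanded; if the user deviates
    to any of those nodes, the best route from there is already known.
    """
    # Build the inverse graph.
    inverse = {}
    for n, out in arcs.items():
        for m, c in out:
            inverse.setdefault(m, []).append((n, c))

    cost_to_goal = {}
    frontier = [(h(destination), 0, destination)]  # (f-value, cost, node)
    while frontier:
        _, cost, n = heapq.heappop(frontier)
        if n in cost_to_goal:                      # multiple-path pruning
            continue
        cost_to_goal[n] = cost
        if n == location:                          # reached the user
            break
        for m, c in inverse.get(n, []):
            heapq.heappush(frontier, (cost + c + h(m), cost + c, m))
    return cost_to_goal
```

With the trivial heuristic h(n) = 0 this degenerates to the lowest-cost-first backward search; a better heuristic focuses the search toward the user's current location while still recording exact cost_to_goal values for the nodes it expands.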
Dynamic programming can be used to construct heuristic functions for A* and branch-and-bound searches. One way to build a heuristic function is to simplify the problem (e.g., by leaving out some details) until the simplified problem is small enough. Dynamic programming can be used to find the cost of an optimal path to a goal in the simplified problem. This information forms a pattern database that can then be used as a heuristic for the original problem.
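A pattern database of this kind can be sketched as follows. Everything here is a hypothetical illustration: the abstraction function `abstract`, which merges nodes to simplify the problem, and all names are assumptions.

```python
import heapq

def build_pattern_database(arcs, goals, abstract):
    """Build a heuristic from a simplified problem: map nodes through
    `abstract` (merging nodes drops detail), solve the simplified
    graph exactly by backward lowest-cost-first search, and use those
    exact simplified costs as a heuristic for the original problem."""
    # Simplified graph: keep the cheapest arc between abstract nodes.
    simple = {}
    for n, out in arcs.items():
        for m, c in out:
            a, b = abstract(n), abstract(m)
            if a != b:
                cur = simple.setdefault(a, {})
                cur[b] = min(cur.get(b, float('inf')), c)

    # Backward lowest-cost-first search from the abstract goal nodes.
    inverse = {}
    for a, out in simple.items():
        for b, c in out.items():
            inverse.setdefault(b, []).append((a, c))
    pdb = {}
    frontier = [(0, abstract(g)) for g in goals]
    heapq.heapify(frontier)
    while frontier:
        cost, a = heapq.heappop(frontier)
        if a in pdb:
            continue
        pdb[a] = cost
        for b, c in inverse.get(a, []):
            heapq.heappush(frontier, (cost + c, b))

    def h(n):
        # Every original path maps to a simplified path of no greater
        # cost, so the simplified optimal cost is a lower bound
        # (an admissible heuristic) for the original problem.
        return pdb.get(abstract(n), float('inf'))
    return h
```

With the identity abstraction the heuristic equals the exact cost_to_goal; coarser abstractions trade heuristic accuracy for a smaller table.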
Dynamic programming is useful when

- the goal nodes are explicit (the previous methods only assumed a function that recognizes goal nodes)
- a lowest-cost path is needed
- the graph is finite and small enough to be able to store the cost_to_goal value for each node
- the goal does not change very often, and
- the policy is used a number of times for each goal, so that the cost of generating the cost_to_goal values can be amortized over many instances of the problem.
The main problems with dynamic programming are that

- it only works when the graph is finite and the table can be made small enough to fit into memory
- an agent must recompute a policy for each different goal, and
- the time and space required are linear in the size of the graph, where the graph size for finite graphs is typically exponential in the path length.