### 3.3.1 Formalizing Graph Searching

A directed **graph** consists of

In this definition, a node can be anything. All this definition does is constrain arcs to be ordered pairs of nodes. There can be infinitely many nodes and arcs. We do not assume that the graph is represented explicitly; we require only a procedure that can generate nodes and arcs as needed.

The arc *⟨n _{1},n_{2}⟩* is an

**outgoing arc**from

*n*and an

_{1}**incoming arc**to

*n*.

_{2}A node *n _{2}* is a

**neighbor**of

*n*if there is an arc from

_{1}*n*to

_{1}*n*; that is, if

_{2}*⟨n*. Note that being a neighbor does not imply symmetry; just because

_{1},n_{2}⟩∈A*n*is a neighbor of

_{2}*n*does not mean that

_{1}*n*is necessarily a neighbor of

_{1}*n*. Arcs may be

_{2}**labeled**, for example, with the action that will take the agent from one state to another.

A **path** from node *s* to node *g* is a
sequence of nodes *⟨n _{0}, n_{1},..., n_{k}⟩* such that

*s=n*,

_{0}*g=n*, and

_{k}*⟨n*; that is, there is an arc from

_{i-1},n_{i}⟩∈A*n*to

_{i-1}*n*for each

_{i}*i*. Sometimes it is useful to view a path as the sequence of arcs,

*⟨n*

_{o},n_{1}⟩,*⟨n*

_{1},n_{2}⟩,...,*⟨n*, or a sequence of labels of these arcs.

_{k-1},n_{k}⟩A **cycle** is a nonempty path such that the end node is the
same as the start node - that is, a cycle is a path *⟨n _{0}, n_{1},..., n_{k}⟩* such that

*n*and

_{0}=n_{k}*k≠0*. A directed graph without any cycles is called a

**directed acyclic graph**(

**DAG**). This should probably be an

**acyclic directed graph**, because it is a directed graph that happens to be acyclic, not an acyclic graph that happens to be directed, but DAG sounds better than ADG!

A **tree** is a DAG where there is one node with no incoming arcs and
every other node has exactly one incoming arc. The node with no
incoming arcs is called the **root** of the tree and nodes with
no outgoing arcs are called **leaves**.

To encode problems as graphs, one set of nodes is referred to
as the **start nodes** and
another set is called the **goal nodes**. A **solution** is a path from a start node to a goal node.

Sometimes there is a **cost** - a
positive number - associated with arcs. We write the cost of arc
*⟨n _{i},n_{j}⟩* as

*cost(⟨n*. The costs of arcs induces a cost of paths.

_{i},n_{j}⟩)Given a path *p = ⟨n _{0},
n_{1},..., n_{k}⟩*, the cost of path

*p*is the sum of the costs of the arcs in the path:

cost(p) = cost(⟨n_{0},n_{1}⟩) + ...+ cost(⟨n_{k-1},n_{k}⟩)

An **optimal solution** is one of the least-cost solutions; that
is, it is a path *p* from a start node to a goal node such that there is
no path *p'* from a start node to a goal node where *cost(p')<cost(p)*.

**Example 3.4:**Consider the problem of the delivery robot finding a path from location

*o103*to location

*r123*in the domain depicted in Figure 3.1. In this figure, the interesting locations are named. For simplicity, we consider only the locations written in bold and we initially limit the directions that the robot can travel. Figure 3.2 shows the resulting graph where the nodes represent locations and the arcs represent possible single steps between locations. In this figure, each arc is shown with the associated cost of getting from one location to the next.

In this graph, the nodes are *N={mail,ts,o103,b3,o109,...}* and
the arcs are *A={⟨ts,mail⟩, ⟨o103,ts⟩, ⟨o103,b3⟩,
⟨o103,o109⟩, ...}*. Node *o125* has no neighbors.
Node *ts* has one neighbor, namely *mail*. Node *o103* has
three neighbors, namely *ts*, *b3*, and *o109*.

There are three paths from *o103* to *r123*:

⟨o103, o109, o119, o123, r123⟩ ⟨o103, b3, b4, o109, o119, o123, r123⟩ ⟨o103, b3, b1, b2, b4, o109, o119, o123, r123⟩

If *o103* were a start node and *r123* were a goal node, each of these three
paths would be a solution to the graph-searching problem.

In many problems the search graph is not given explicitly; it is dynamically constructed as needed. All that is required for the search algorithms that follow is a way to generate the neighbors of a node and to determine if a node is a goal node.

The **forward branching factor** of a node
is the number of arcs leaving the node. The **backward
branching factor** of a node is the number of arcs entering the node.
These factors provide measures of the complexity of graphs. When we discuss
the time and space complexity of the search algorithms, we assume that
the branching factors are bounded from above by a constant.

**Example 3.5:**In the graph of Figure 3.2, the forward branching factor of node

*o103*is three; there are three arcs coming out of node

*o103*. The backward branching factor of node

*o103*is zero; there are no arcs coming into node

*o103*. The forward branching factor of

*b3*is two and the backward branching factor of

*b3*is one.

The branching factor is important because it is a key component in
the size of the graph. If the forward branching factor for each node
is *b*, and the graph is a tree, there are *b ^{n}* nodes that are

*n*arcs away from any node.