### 2.2.1 The Agent Function

Agents are situated in time: they receive sensory data in time and act in time. The action that an agent performs at a particular time is a function of its inputs. We first consider the notion of time.

Let *T* be the set of **time** points. Assume that *T* is
totally ordered and has some metric that can be used to measure the
temporal distance between any two time points. Basically, we assume
that *T* can be mapped to some subset of the real line.

*T* is **discrete** if there exist only
a finite number of time points between any two time points; for example,
there is a time point every hundredth of a second, or every day, or
there may be time points whenever interesting events occur. *T* is
**dense** if there is always another time point between any two time
points; this implies there must be infinitely many time points between
any two points. Discrete time has the property that, for all times,
except perhaps a last time, there is always a next time. Dense time
does not have a "next time."

Initially,
we assume that time is discrete and goes on forever. Thus, for each
time there is a next time. We write *t+1* to be the next time
after time *t*; it does not mean that the time points are equally spaced.
Assume that *T* has a starting point, which we arbitrarily
call *0*.

Suppose *P* is the set of all possible
percepts.
A **percept trace**, or **percept stream**, is a
function from *T* into *P*. It specifies what is observed at each time.

Suppose *C* is the set of all commands. A
**command trace** is a function
from *T* into *C*. It specifies the command for each time point.

**Example 2.1:** Consider a household trading agent that monitors the price of some commodity (e.g., it checks online for special deals and for price increases for toilet paper) and how much the household has in stock. It must decide whether to buy more and how much to buy. The percepts are the price and the amount in stock. The command is the number of units the agent decides to buy (which is zero if the agent does not buy any). A percept trace specifies for each time point (e.g., each day) the price at that time and the amount in stock at that time. Percept traces are given in Figure 2.2. A command trace specifies how much the agent decides to buy at each time point. An example command trace is given in Figure 2.3.

The action of actually buying depends on the command but may be different. For example, the agent could issue a command to buy 12 rolls of toilet paper at a particular price. This does not mean that the agent actually buys 12 rolls because there could be communication problems, the store could have run out of toilet paper, or the price could change between deciding to buy and actually buying.
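The traces of Example 2.1 can be sketched as functions from time points to percepts or commands. This is only an illustration: the names `Percept`, `percept_trace`, and `command_trace` are hypothetical, the numbers are made up and are not the data of Figures 2.2 and 2.3, and only a finite prefix of the (infinite) traces is represented.

```python
from typing import NamedTuple


class Percept(NamedTuple):
    price: float   # price of the commodity at this time point
    in_stock: int  # units the household has in stock at this time point


# Illustrative values only (not taken from Figures 2.2 and 2.3).
_PERCEPTS = [Percept(2.10, 30), Percept(1.80, 24), Percept(2.40, 18)]
_COMMANDS = [0, 48, 0]  # units the agent decides to buy at times 0, 1, 2


def percept_trace(t: int) -> Percept:
    """Percept observed at time t: a function from T into P."""
    return _PERCEPTS[t]


def command_trace(t: int) -> int:
    """Command issued at time t: a function from T into C."""
    return _COMMANDS[t]
```

Here the time points are the integers 0, 1, 2, which fits the discrete-time assumption but does not imply that they are equally spaced.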

A percept trace for an agent is thus the sequence of all past, present, and
future percepts received by the controller. A command trace is the sequence of all past, present, and
future commands issued by the controller. The commands can be a
function of the history of percepts. This gives rise to the concept of
a **transduction**,
a function that maps percept
traces into command traces.

Because all agents are
situated in time, an agent cannot actually observe full percept traces; at any
time it has only experienced the part of the trace up to *now*. It can
only observe the value of the trace at time *t∈T* when it gets to
time *t*. Its command can only depend on what it has experienced.

A transduction is **causal** if, for all times *t*, the command
at time *t* depends only on percepts up to and including time *t*. The causality
restriction is needed because agents are situated in time; their
command at time *t* cannot depend on percepts after time *t*.
A **controller** is an implementation of a
causal transduction.

The **history** of an agent at time *t* is the
percept trace of the agent for all times before or at time *t* and the command trace of the agent before time *t*.

Thus, a **causal transduction** specifies a function from the agent's history at
time *t* into the command at time *t*. It can
be seen as the most general specification of an agent.

**Example 2.2:** Continuing Example 2.1, a causal transduction specifies, for each time, how much of the commodity the agent should buy depending on the price history, the history of how much of the commodity is in stock (including the current price and amount in stock), and the past history of buying.

An example of a causal transduction is as follows: buy four dozen rolls if there are fewer than five dozen in stock and the price is less than 90% of the average price over the last 20 days; buy a dozen more rolls if there are fewer than a dozen in stock; otherwise, do not buy any.
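The rule above can be sketched as a function of the percept history. This is a minimal sketch under assumptions the text leaves open: the name `buy_command` is hypothetical, the three clauses are read as being tried in order, the 20-day average includes the current day's price, and when fewer than 20 prices are available the average is taken over those that exist. The past history of buying is part of the agent's history but happens not to be used by this particular rule.

```python
from statistics import mean
from typing import Sequence

DOZEN = 12


def buy_command(prices: Sequence[float], stocks: Sequence[int]) -> int:
    """Command at time t, given the price and stock history up to and including t.

    prices[-1] and stocks[-1] are the current percepts; earlier entries are
    past percepts. No future percepts are consulted, so the mapping is causal.
    """
    price_now, stock_now = prices[-1], stocks[-1]
    # Assumption: average over the last 20 prices, or fewer if history is short.
    average = mean(prices[-20:])
    if stock_now < 5 * DOZEN and price_now < 0.9 * average:
        return 4 * DOZEN
    if stock_now < DOZEN:
        return DOZEN
    return 0
```

Note that the function needs the whole recent price history as an argument, which is exactly what an agent does not have direct access to unless it has remembered it.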

Although a causal transduction is a function of an agent's history, it cannot be directly implemented because an agent does not have direct access to its entire history. It has access only to its current percepts and what it has remembered.

The **belief state** of an agent at
time *t* is all of the information the agent has remembered from the
previous times. An agent has access only to the part of its history that it has
encoded in its belief state. Thus, the belief state encapsulates all of the information
about its history that the agent can use for current and future commands.
At any time, an agent has access to its belief state and its percepts.

The belief state can contain any information, subject to the agent's memory and processing limitations. This is a very general notion of belief; sometimes we use a more specific notion of belief, such as the agent's belief about what is true in the world, the agent's beliefs about the dynamics of the environment, or the agent's belief about what it will do in the future.

Some instances of belief state include the following:

- The belief state for an agent that is following a fixed sequence of instructions may be a program counter that records its current position in the sequence.
- The belief state can contain specific facts that are useful, for example, where the delivery robot left the parcel in order to go and get the key, or where it has already checked for the key. It may be useful for the agent to remember anything that is reasonably stable and that cannot be immediately observed.
- The belief state could encode a model or a partial model of the state of the world. An agent could maintain its best guess about the current state of the world or could have a probability distribution over possible world states; see Section 5.6 and Chapter 6.
- The belief state could be a representation of the dynamics of the world and the meaning of its percepts, and the agent could use its perception to determine what is true in the world.
- The belief state could encode what the agent **desires**, the **goals** it still has to achieve, its **beliefs** about the state of the world, and its **intentions**, or the steps it intends to take to achieve its goals. These can be maintained as the agent acts and observes the world, for example, removing achieved goals and replacing intentions when more appropriate steps are found.

A controller must maintain the agent's belief state and determine what command to issue at each time. The information it has available when it must do this includes its belief state and its current percepts.

A **belief state transition function** for discrete time is
a function

$$\mathit{remember} : S \times P \rightarrow S$$

where *S* is the set of
belief states and *P* is the set of possible percepts;
$s_{t+1} = \mathit{remember}(s_t, p_t)$ means that $s_{t+1}$ is the belief state following belief state $s_t$ when $p_t$ is observed.

A **command function** is a function

$$\mathit{do} : S \times P \rightarrow C$$

where *S* is the set of belief states, *P* is
the set of possible percepts, and *C* is the set of possible commands;
$c_t = \mathit{do}(s_t, p_t)$ means that the controller issues command $c_t$ when the belief state is $s_t$ and when $p_t$ is observed.

The belief-state transition function and the command function together specify a causal transduction for the agent. Note that a causal transduction is a function of the agent's history, which the agent doesn't necessarily have access to, but a command function is a function of the agent's belief state and percepts, which it does have access to.
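One way to see how the two functions together determine a causal transduction is to run them in a loop over the percepts as they arrive. The following is a minimal sketch, not a prescribed implementation: `run_controller` and `initial_state` are hypothetical names, and the running-sum example at the end is purely illustrative.

```python
from typing import Callable, Iterable, Iterator, TypeVar

S = TypeVar("S")  # belief states
P = TypeVar("P")  # percepts
C = TypeVar("C")  # commands


def run_controller(
    remember: Callable[[S, P], S],
    do: Callable[[S, P], C],
    initial_state: S,
    percepts: Iterable[P],
) -> Iterator[C]:
    """Yield the command for each percept, in time order."""
    state = initial_state
    for percept in percepts:
        # The command at time t uses only the belief state and percept at time t.
        yield do(state, percept)
        # The next belief state is computed from the same two inputs.
        state = remember(state, percept)


# Illustrative use: the belief state is the sum of past percepts, and the
# command is the sum including the current percept.
commands = list(run_controller(lambda s, p: s + p, lambda s, p: s + p, 0, [1, 2, 3]))
print(commands)  # [1, 3, 6]
```

Because each command is produced before the next percept is read, the resulting mapping from percept traces to command traces cannot depend on future percepts.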

**Example 2.3:** To implement the causal transduction of Example 2.2, a controller must keep track of the rolling history of the prices for the previous 20 days. By keeping track of the average (*ave*), it can update the average using

$$\mathit{ave} \leftarrow \mathit{ave} + \frac{\mathit{new} - \mathit{old}}{20}$$

where *new* is the new price and *old* is the oldest price
remembered. It can then discard *old*. It must do something
special for the first 20 days.
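A rolling-history belief state of this kind might be sketched as follows. The class name `RollingAverage` and the `window` parameter are hypothetical, and the treatment of the first 20 days (an incremental mean over the prices seen so far) is one possible choice, not something the text specifies.

```python
from collections import deque


class RollingAverage:
    """Belief state holding the last `window` prices and their average."""

    def __init__(self, window: int = 20) -> None:
        self.window = window
        self.prices = deque()  # remembered prices, oldest first
        self.ave = 0.0

    def update(self, new: float) -> float:
        if len(self.prices) < self.window:
            # Assumed special case for the first `window` prices:
            # incremental mean of everything seen so far.
            self.prices.append(new)
            self.ave += (new - self.ave) / len(self.prices)
        else:
            old = self.prices.popleft()  # oldest price, discarded after use
            self.prices.append(new)
            self.ave += (new - old) / self.window
        return self.ave
```

With `window=3`, feeding the prices 3, 6, 9, 12 gives averages 3, 4.5, 6, and then 9, the last being the mean of 6, 9, and 12 once the oldest price has been dropped.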

A simpler controller could, instead of remembering a rolling history
in order to maintain the average, remember just the average and use
the average as a surrogate for the oldest item. The belief state can
then contain one real number (*ave*). The state transition function to update
the average could be

$$\mathit{ave} \leftarrow \mathit{ave} + \frac{\mathit{new} - \mathit{ave}}{20}$$

This controller is much easier to implement and is not sensitive to what happened 20 time units ago. This way of maintaining estimates of averages is the basis for temporal differences in reinforcement learning.
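Under this simpler scheme, the belief state transition function and a command function for the trading agent could look like the sketch below. The `Percept` type is hypothetical, and two choices are assumptions rather than part of the text: the clauses of the buying rule are tried in order, and the command compares the current price with the average remembered before that price is folded in. How *ave* is initialized is also left open here.

```python
from typing import NamedTuple


class Percept(NamedTuple):
    price: float   # current price
    in_stock: int  # current units in stock


def remember(ave: float, percept: Percept) -> float:
    """Belief state transition: ave <- ave + (new - ave) / 20."""
    return ave + (percept.price - ave) / 20


def do(ave: float, percept: Percept) -> int:
    """Command function: units to buy, given the remembered average and the percept."""
    if percept.in_stock < 60 and percept.price < 0.9 * ave:
        return 48  # four dozen
    if percept.in_stock < 12:
        return 12  # one dozen
    return 0
```

The belief state is a single real number, so nothing has to be stored or discarded per day, at the cost of the average being only an approximation of the true 20-day average.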

If there are only finitely many possible belief states, the controller
is called a **finite state controller** or a **finite state machine**. A **factored
representation** is one in which the belief states, percepts, or commands are defined by
features. If there are finitely many features, and each feature
can have only finitely many possible values, the controller is a **factored
finite state machine**.
Richer controllers can be built using an unbounded number of values or
an unbounded number of features. A controller that has countably many
states can compute anything that is computable by a Turing machine.
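To illustrate why finitely many features with finite domains yield a finite state machine, the belief states of a factored representation can be enumerated as the product of the feature domains. The two features below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical features of a belief state, each with a finite set of values.
features = {
    "low_stock": (False, True),
    "price_trend": ("falling", "steady", "rising"),
}

# Each belief state assigns one value to every feature.
belief_states = [
    dict(zip(features, values)) for values in product(*features.values())
]

print(len(belief_states))  # 6
```

The number of belief states is the product of the domain sizes (here 2 × 3), so it grows exponentially with the number of features while remaining finite.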