Artificial Intelligence - foundations of computational agents -- 6.5.1 Markov Chains

Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

6.5.1 Markov Chains

A Markov chain is a special sort of belief network used to represent sequences of values, such as the sequence of states in a dynamic system or the sequence of words in a sentence.

Figure 6.13: A Markov chain as a belief network

Figure 6.13 shows a generic Markov chain as a belief network. The network does not have to stop at stage s₄, but it can be extended indefinitely. The belief network conveys the independence assumption

P(S_t+1|S₀,...,S_t)=P(S_t+1|S_t),

which is called the Markov assumption.

Often, S_t represents the state at time t. Intuitively, S_t conveys all of the information about the history that can affect the future states. At S_t, you can see that "the future is conditionally independent of the past given the present."

A Markov chain is stationary if the transition probabilities are the same for each time point [i.e., for all t>0, t'>0, P(S_t+1|S_t)=P(S_t'+1|S_t')]. To specify a stationary Markov chain, two conditional probabilities must be specified:

P(S₀) specifies initial conditions.
P(S_t+1|S_t) specifies the dynamics, which is the same for each t ≥ 0.

Stationary Markov chains are of interest because

They provide a simple model that is easy to specify.
The assumption of stationarity is often the natural model, because the dynamics of the world typically does not change in time. If the dynamics does change in time, it is usually because some other feature exists that could also be modeled.
The network can extend indefinitely. Specifying a small number of parameters can give an infinite network. You can ask queries or make observations about any arbitrary points in the future or the past.

To determine the probability distribution of state S_i, VE can be used to sum out the preceding variables. Note that the variables after S_i are irrelevant to the probability of S_i and need not be considered. This computation is normally specified as matrix multiplication, but note that matrix multiplication is a simple form of VE. Similarly, to compute P(S_i|S_k), where k>i, only the variables before S_k need to be considered.