8.5 Sequential Probability Models


8.5.1 Markov Chains

A Markov chain is a belief network with random variables in a sequence, where each variable only directly depends on its predecessor in the sequence. Markov chains are used to represent sequences of values, such as the sequence of states in a dynamic system or the sequence of words in a sentence. Each point in the sequence is called a stage.

Figure 8.11: A Markov chain as a belief network

Figure 8.11 shows a generic Markov chain as a belief network. The network shown has five stages ($S_0$ to $S_4$), but it does not have to stop at stage 4; it can extend indefinitely. The belief network conveys the independence assumption

$$P(S_{i+1} \mid S_0, \ldots, S_i) = P(S_{i+1} \mid S_i),$$

which is called the Markov assumption.

Often the sequences are in time, and $S_t$ represents the state at time $t$. Intuitively, $S_t$ conveys all of the information about the history that could affect the future states. The independence assumption of the Markov chain can be seen as saying “the future is conditionally independent of the past given the present.”

A Markov chain is a stationary model or time-homogeneous model if the variables all have the same domain and the transition probabilities are the same for each stage, i.e.,

$$\text{for all } i \ge 0, \quad P(S_{i+1} \mid S_i) = P(S_1 \mid S_0).$$

To specify a stationary Markov chain, two conditional probabilities are provided:

  • $P(S_0)$ specifies the initial conditions

  • $P(S_{i+1} \mid S_i)$ specifies the dynamics, which is the same for each $i \ge 0$.

Stationary Markov chains are of interest for the following reasons:

  • They provide a simple model that is easy to specify.

  • The assumption of stationarity is often the natural model, because the dynamics of the world typically does not change in time. If the dynamics does change in time, it is usually because of some other feature that could also be modeled.

  • The network extends indefinitely. Specifying a small number of parameters gives an infinite network. You can ask queries or make observations about any arbitrary points in the future or the past.
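To make the two-parameter specification concrete, here is a minimal sketch (a hypothetical example, not code from the book) that represents $P(S_0)$ as a vector and $P(S_{i+1} \mid S_i)$ as a row-stochastic matrix over an assumed three-value domain, and answers a query about an arbitrary stage by summing out the preceding variables:

    import numpy as np

    # Hypothetical three-state chain (an assumed example, not from the book).
    states = ["sunny", "cloudy", "rainy"]          # the domain of each S_i

    # P(S_0): the initial distribution over the states.
    p0 = np.array([0.6, 0.3, 0.1])

    # P(S_{i+1} | S_i): row k is the distribution of the next state when the
    # current state is states[k]; each row sums to 1.
    transition = np.array([[0.7, 0.2, 0.1],
                           [0.3, 0.4, 0.3],
                           [0.2, 0.4, 0.4]])

    def state_distribution(i):
        """Return P(S_i) by summing out S_0, ..., S_{i-1} in order."""
        dist = p0
        for _ in range(i):
            # P(S_{j+1}=s) = sum_{s'} P(S_{j+1}=s | S_j=s') * P(S_j=s')
            dist = dist @ transition
        return dist

    print(state_distribution(10))   # the distribution over S_10

The same two arrays define the chain at every stage, which is what makes queries about arbitrarily distant stages possible with so few parameters.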

Pagerank

Google’s initial search engine [Brin and Page, 1998] was based on Pagerank. Pagerank [Page et al., 1999] is a probability measure over web pages where the most influential web pages have the highest probability. It is based on a Markov chain of a random web surfer who starts on a random page, and with some probability d picks a random page that is linked from the current page, and otherwise (if the current page has no outgoing links or with probability 1-d) picks a page at random. The Markov chain is defined as follows:

  • The domain of Si is the set of all web pages.

  • $P(S_0)$ is the uniform distribution over web pages: $P(S_0=p_j)=1/N$ for each web page $p_j$, where $N$ is the number of web pages.

  • The transition is defined as follows:

    $$P(S_{i+1}=p_j \mid S_i=p_k) = \frac{1-d}{N} + d \cdot \begin{cases} 1/n_k & \text{if } p_k \text{ links to } p_j \\ 1/N & \text{if } p_k \text{ has no links} \\ 0 & \text{otherwise} \end{cases}$$

    where there are $N$ web pages and there are $n_k$ links on page $p_k$. The way to think about this is that $p_k$ is the current web page and $p_j$ is the next web page. If $p_k$ has no outgoing links, then $p_j$ is a page chosen at random, which is the effect of the middle case. If $p_k$ has outgoing links, then with probability $d$ the surfer picks a random page linked from $p_k$, and otherwise picks a page at random.

  • $d \approx 0.85$ is the probability someone picks a link on the current page.

This Markov chain converges to a distribution over web pages. Page et al. [1999] reported the search engine had converged to “a reasonable tolerance” for i=52 with 322 million links.
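The convergence described above can be reproduced on a toy scale. The following sketch (a hypothetical four-page web, not Google's implementation) builds the transition matrix from the definition above and iterates the chain from the uniform distribution until the change falls below a small tolerance:

    import numpy as np

    # Hypothetical toy web of N = 4 pages; links[k] lists the pages that
    # page k links to (page 3 has no outgoing links).
    links = {0: [1, 2], 1: [2], 2: [0], 3: []}
    N = 4
    d = 0.85

    # Build the transition matrix row by row from the definition above.
    P = np.zeros((N, N))
    for k in range(N):
        for j in range(N):
            if links[k]:
                follow = 1 / len(links[k]) if j in links[k] else 0.0
            else:
                follow = 1 / N                 # dangling page: jump anywhere
            P[k, j] = (1 - d) / N + d * follow

    # Start from the uniform P(S_0) and iterate until the distribution stops
    # changing (a small tolerance stands in for "a reasonable tolerance").
    rank = np.full(N, 1 / N)
    for _ in range(1000):
        new_rank = rank @ P
        if np.abs(new_rank - rank).sum() < 1e-10:
            break
        rank = new_rank

    print(rank)   # approximate Pagerank of each page

The vector it settles on approximates the stationary distribution of this chain, a notion defined formally later in this section.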

Pagerank provides a measure of influence. To get a high Pagerank, a web page should be linked from other pages with a high Pagerank. It is difficult, but not impossible, to manipulate Pagerank for selfish reasons. One could try to boost the Pagerank of a specific page artificially by creating many pages that point to it, but it is difficult for those referring pages to also have a high Pagerank.

In the initial reported version, Brin and Page [1998] used 24 million web pages and 76 million links. The web is more complex now with many pages being dynamically generated, and search engines use much more sophisticated algorithms.

To determine the probability distribution of state $S_i$, variable elimination can be used to sum out the preceding variables. The variables after $S_i$ are irrelevant to the probability of $S_i$ and need not be considered. To compute $P(S_i \mid S_k)$, where $i>k$, only the variables between $S_i$ and $S_k$ need to be considered, and if $i<k$, only the variables less than $k$ need to be considered.
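Because of the Markov assumption, computing $P(S_i \mid S_k)$ for $i > k$ reduces to propagating the dynamics forward $i-k$ steps from a point distribution at stage $k$. A minimal sketch, reusing the hypothetical three-state chain from the earlier example:

    import numpy as np

    # The hypothetical three-state chain from the sketch above.
    states = ["sunny", "cloudy", "rainy"]
    transition = np.array([[0.7, 0.2, 0.1],
                           [0.3, 0.4, 0.3],
                           [0.2, 0.4, 0.4]])

    def conditional_distribution(i, k, s):
        """P(S_i | S_k = s) for i > k: propagate the dynamics i - k steps."""
        dist = np.zeros(len(states))
        dist[states.index(s)] = 1.0        # the observation pins down S_k
        for _ in range(i - k):
            dist = dist @ transition       # sums out one intermediate stage
        return dist

    print(conditional_distribution(5, 2, "rainy"))   # P(S_5 | S_2 = rainy)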

A stationary distribution of a Markov chain is a distribution over the states such that, if it holds at one time, it also holds at the next time. Thus $P$ is a stationary distribution if, for each state $s$, $P(S_{i+1}=s)=P(S_i=s)$; that is,

$$P(S_i=s) = \sum_{s'} P(S_{i+1}=s \mid S_i=s') \cdot P(S_i=s').$$
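As a hypothetical worked example (not from the text), consider a chain with two states $a$ and $b$, where $P(S_{i+1}{=}a \mid S_i{=}a)=0.7$ and $P(S_{i+1}{=}a \mid S_i{=}b)=0.4$. Writing $x$ for the stationary probability of $a$, the equation above becomes

$$x = 0.7\,x + 0.4\,(1-x) \;\Longrightarrow\; 0.7\,x = 0.4 \;\Longrightarrow\; x = 4/7,$$

so the stationary distribution assigns probability $4/7$ to $a$ and $3/7$ to $b$.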

A Markov chain is ergodic if, for any two states $s_1$ and $s_2$ in the domain of $S_i$, there is a non-zero probability of eventually reaching $s_2$ from $s_1$. A Markov chain is periodic with period $p$ if the difference between the times when it visits the same state is always divisible by $p$. For example, consider the Markov chain whose states are the integers 0 to 9, and which at each time either adds 1 or adds 9 (modulo 10), each with probability 0.5. This Markov chain is periodic with period 2: if it starts in an even state at time 0, it will be in an even state at even times and in an odd state at odd times. If the only period of a Markov chain is 1, then the Markov chain is aperiodic.

If a Markov chain is ergodic and aperiodic, then there is a unique stationary distribution, and this is the equilibrium distribution that will be approached from any starting state. Thus for any distribution over S0, the distribution over Si will get closer and closer to the equilibrium distribution, as i gets larger.
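This convergence is easy to observe numerically. A minimal sketch, again using the hypothetical three-state chain from above (which is ergodic and aperiodic): starting from two very different distributions over $S_0$, repeated application of the dynamics drives both to the same equilibrium distribution.

    import numpy as np

    # The hypothetical three-state chain from above; it is ergodic and
    # aperiodic, so it has a unique equilibrium distribution.
    transition = np.array([[0.7, 0.2, 0.1],
                           [0.3, 0.4, 0.3],
                           [0.2, 0.4, 0.4]])

    def run(start, steps=200):
        dist = np.array(start, dtype=float)
        for _ in range(steps):
            dist = dist @ transition
        return dist

    # Two very different initial distributions over S_0 ...
    print(run([1.0, 0.0, 0.0]))   # start certain of the first state
    print(run([0.0, 0.0, 1.0]))   # start certain of the last state
    # ... yield (numerically) the same equilibrium distribution.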