Intelligence 2E

foundations of computational agents

The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).

Importance sampling enumerates the samples one at a time and, for each sample, assigns a value to each variable. It is also possible to start with all of the samples and, for each variable, generate a value for that variable for each sample. For example, for the data of Figure 8.28, the same data could be generated by generating all of the samples for $Tampering$ before generating the samples for $Fire$. The particle filtering algorithm or sequential Monte Carlo generates all the samples for one variable before moving to the next variable. It does one sweep through the variables, and for each variable it does a sweep through all of the samples. This algorithm is advantageous when variables are generated dynamically and when there are unboundedly many variables, as in the sequential models. It also allows for a new operation of resampling.

In particle filtering, the samples are called particles. A particle is a $variable$-$value$ dictionary, where a dictionary is a representation of a partial function from keys into values; here the key is a variable and the particle maps to its value. A particle has an associated weight. A set of particles is a population.

The algorithm starts with a population of $n$ empty dictionaries. The algorithm repeatedly selects a variable according to an ordering where a variable is selected after its parents. If the variable is not observed, for each particle, a value for the variable for that particle is sampled from the distribution of the variable given the assignment of the particle. If the variable is observed, each particle’s weight is updated by multiplying by the probability of the observation given the assignment of the particle.

Given a population of $n$ particles, resampling generates a new population of $n$ particles, each with the same weight. Each particle is selected with probability proportional to its weight. Resampling can be implemented in the same way that random samples for a single random variable are generated, but particles, rather than values, are selected. Some particles may be selected multiple times and others might not be selected at all.

The particle filtering algorithm is shown in Figure 8.31. Line 16 assigns $X$ its observed value. Line 17, which is used when $X$ observed, updates the weights of the particles according to the probability of the observation on $X$. Line 22 assigns $X$ a value sampled from the distribution of $X$ given the values of its parents in the particle.

This algorithm resamples after each observation. It is also possible to resample less often, for example, after a number of variables are observed.

Importance sampling is equivalent to particle filtering without resampling. The principal difference is the order in which the particles are generated. In particle filtering, each variable is sampled for all particles, whereas, in importance sampling, each particle (sample) is sampled for all variables before the next particle is considered.

Particle filtering has two main advantages over importance sampling. First, it can be used for an unbounded number of variables, as in hidden Markov models and dynamic belief networks. Second, resampling enables the particles to better cover the distribution over the variables. Whereas importance sampling will result in some particles that have very low probability, with only a few of the particles covering most of the probability mass, resampling lets many particles more uniformly cover the probability mass.

Consider using particle filtering to compute ${P}{\mathit{}}{\mathrm{(}}{t}{\mathit{}}{a}{\mathit{}}{m}{\mathit{}}{p}{\mathit{}}{e}{\mathit{}}{r}{\mathit{}}{i}{\mathit{}}{n}{\mathit{}}{g}{\mathrm{\mid}}{s}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{\wedge}}{\mathit{}}{r}{\mathit{}}{e}{\mathit{}}{p}{\mathit{}}{o}{\mathit{}}{r}{\mathit{}}{t}{\mathrm{)}}$ for the belief network of Figure 8.3. First generate the particles ${{s}}_{{\mathrm{1}}}{\mathrm{,}}{\mathrm{\dots}}{\mathrm{,}}{{s}}_{{\mathrm{1000}}}$. Suppose it first samples ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}$. Out of the 1000 particles, about 10 will have ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ and about 990 will have ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{f}{\mathit{}}{a}{\mathit{}}{l}{\mathit{}}{s}{\mathit{}}{e}$ (as ${P}{\mathit{}}{\mathrm{(}}{f}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.01}}$). It then absorbs the evidence ${S}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$. Those particles with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ will be weighted by 0.9 as ${P}{\mathit{}}{\mathrm{(}}{s}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{\mid}}{f}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.9}}$ and those particles with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{f}{\mathit{}}{a}{\mathit{}}{l}{\mathit{}}{s}{\mathit{}}{e}$ will be weighted by 0.01 as ${P}{\mathit{}}{\mathrm{(}}{s}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{\mid}}{\mathrm{\neg}}{\mathit{}}{f}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.01}}$. It then resamples; each particle is chosen in proportion to its weight. The particles with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ will be chosen in the ratio ${\mathrm{990}}{\mathrm{*}}{\mathrm{0.01}}{\mathrm{:}}{\mathrm{10}}{\mathrm{*}}{\mathrm{0.9}}$. Thus, about 524 particles will be chosen with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$, and the remainder with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{f}{\mathit{}}{a}{\mathit{}}{l}{\mathit{}}{s}{\mathit{}}{e}$. The other variables are sampled, in turn, until ${R}{\mathit{}}{e}{\mathit{}}{p}{\mathit{}}{o}{\mathit{}}{r}{\mathit{}}{t}$ is observed, and the particles are resampled. At this stage, the probability of ${T}{\mathit{}}{a}{\mathit{}}{m}{\mathit{}}{p}{\mathit{}}{e}{\mathit{}}{r}{\mathit{}}{i}{\mathit{}}{n}{\mathit{}}{g}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ will be the proportion of the samples with tampering being true.

Note that in particle filtering the particles are not independent, so Hoeffding’s inequality is not directly applicable.