Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

6.4.2.6 Particle Filtering

Importance sampling enumerates the samples one at a time and, for each sample, assigns a value to each variable. It is also possible to start with all of the samples and, for each variable, generate a value for that variable in each of the samples. For example, for the data of Figure 6.10, the same data could be generated by generating all of the values for Tampering before generating the values for Fire. The particle filtering algorithm generates the values for one variable across all of the samples before moving to the next variable. It does one sweep through the variables, and for each variable it does a sweep through all of the samples. This ordering has an advantage when variables are generated dynamically and there can be unboundedly many variables. It also allows for a new operation of resampling.
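The two generation orders can be contrasted in a short Python sketch. The prior values used here (P(tampering) = 0.02 and P(fire) = 0.01) are assumptions taken from the fire-alarm network discussed in this chapter; the data structures are illustrative, not the book's code:

```python
import random

random.seed(1)
N = 5

# Importance-sampling order: complete one sample at a time.
samples = []
for _ in range(N):
    sample = {}
    sample["Tampering"] = random.random() < 0.02  # assuming P(tampering) = 0.02
    sample["Fire"] = random.random() < 0.01       # assuming P(fire) = 0.01
    samples.append(sample)

# Particle-filtering order: one sweep per variable, over all samples.
particles = [{} for _ in range(N)]
for p in particles:                  # first all values for Tampering ...
    p["Tampering"] = random.random() < 0.02
for p in particles:                  # ... then all values for Fire
    p["Fire"] = random.random() < 0.01
```

Both loops produce samples from the same joint distribution; only the order in which values are filled in differs.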

Given a set of samples on some of the variables, resampling consists of taking n samples, each with its own weight, and generating a new set of n samples, each with the same weight. Resampling can be implemented in the same way that random samples for a single random variable are generated, but samples, rather than values, are selected. Some of the samples are selected multiple times and some are not selected at all.
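Resampling can be sketched in a few lines of Python; `random.choices` performs exactly this weighted selection with replacement. The population and weights below are made up for illustration:

```python
import random

def resample(particles, weights):
    """Return len(particles) equal-weight particles, each selected with
    probability proportional to its weight, with replacement.
    Heavy particles tend to be duplicated; light ones tend to be dropped."""
    return random.choices(particles, weights=weights, k=len(particles))

population = ["s1", "s2", "s3", "s4"]
weights = [0.90, 0.05, 0.04, 0.01]
new_population = resample(population, weights)
# new_population again has 4 particles, now all equally weighted;
# "s1" is likely to appear several times.
```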

A particle consists of an assignment of a value to a set of variables and an associated weight. The probability of a proposition, given some evidence, is estimated by the sum of the weights of the particles in which the proposition is true, as a proportion of the total weight of all of the particles. A set of particles is a population.

Particle filtering is a sampling method that starts with a population of particles, each of which assigns a value to no variables, and has a weight of 1. At each step it can

  • select a variable that has not been sampled or summed out and is not observed. For each particle, it samples the variable according to some proposal distribution. The weight of the particle is updated as in importance sampling.
  • select a piece of evidence to absorb. This evidence should not have been absorbed before. The weight of the particle is multiplied by the probability of the evidence given the values of the particle (summing out any relevant variables that have not been sampled).
  • resample the population. Resampling constructs a new population of particles, each with the same weight, by selecting particles from the population, where each particle is chosen with probability proportional to the weight of the particle. Some particles may be forgotten and some may be duplicated.
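These three operations can be sketched as follows. A particle is an (assignment, weight) pair, and `proposal_sample`, `model_prob`, and `evidence_prob` are placeholder functions that a particular belief network would supply; their names and signatures are assumptions made for this sketch:

```python
import random

def init_population(n):
    """Start with n particles that assign no variables, each with weight 1."""
    return [({}, 1.0) for _ in range(n)]

def sample_step(population, var, proposal_sample, model_prob):
    """Sample var for every particle and reweight as in importance sampling."""
    new = []
    for assignment, weight in population:
        value, q = proposal_sample(var, assignment)  # value and its proposal prob
        p = model_prob(var, value, assignment)       # P(var=value | assignment)
        a = dict(assignment)
        a[var] = value
        new.append((a, weight * p / q))
    return new

def absorb_step(population, evidence_prob):
    """Multiply each weight by P(evidence | particle's assignment)."""
    return [(a, w * evidence_prob(a)) for a, w in population]

def resample_step(population):
    """Select particles in proportion to weight; the new weights are all 1."""
    assignments = [a for a, _ in population]
    weights = [w for _, w in population]
    chosen = random.choices(assignments, weights=weights, k=len(population))
    return [(dict(a), 1.0) for a in chosen]
```

Each step is a sweep over the whole population, matching the variable-at-a-time order described above.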

Importance sampling is equivalent to particle filtering without resampling; the difference is the order in which the particles are generated. In particle filtering, each variable is sampled for all particles before the next variable is considered, whereas in importance sampling, each particle (sample) has all of its variables sampled before the next particle is considered.

Particle filtering has two main advantages over importance sampling. First, it can be used for an unbounded number of variables (as we will see later). Second, resampling lets the particles better cover the hypothesis space. Whereas importance sampling generates some particles that have very low probability, with only a few of the particles covering most of the probability mass, resampling spreads many particles more uniformly over the probability mass.

Example 6.27: Consider using particle filtering to compute P(Report|smoke) for the belief network of Figure 6.1. First generate the particles s1,...,s1000. For this example, we use the conditional probability of the variable being sampled given the particle as the proposal distribution. Suppose it first samples Fire. Out of the 1,000 particles, about 10 will have Fire=true and about 990 will have Fire=false (as P(fire)=0.01). It can then absorb the evidence Smoke=true. Those particles with Fire=true will be weighted by 0.9 [as P(smoke | fire ) = 0.9] and those particles with Fire=false will be weighted by 0.01 [as P(smoke | ¬fire ) = 0.01]. It can then resample; each particle is chosen in proportion to its weight, so particles with Fire=true and Fire=false are chosen in the ratio 10×0.9 : 990×0.01, that is, 9 : 9.9. Thus, about 476 particles will be chosen with Fire=true, and the remainder with Fire=false. The other variables can be sampled, in turn, until Report is sampled.
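The resampling arithmetic in this example can be checked directly, using the expected counts of 10 and 990 particles in each group:

```python
# Expected weight mass of each group after absorbing Smoke = true.
true_mass = 10 * 0.9      # Fire = true particles, each weighted 0.9
false_mass = 990 * 0.01   # Fire = false particles, each weighted 0.01

# Resampling selects in proportion to weight, i.e. in the ratio 9 : 9.9.
n_true = round(1000 * true_mass / (true_mass + false_mass))
print(n_true)  # 476
```

In an actual run the counts fluctuate around these expectations, since both the initial sampling of Fire and the resampling step are random.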

Note that in particle filtering the particles are not independent, so Hoeffding's inequality is not directly applicable.