
The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).

Given some evidence $e$, rejection sampling estimates $P(h\mid e)$ using the formula

$$P(h\mid e)=\frac{P(h\wedge e)}{P(e)}.$$

This is computed by considering only the samples where $e$ is true and by determining the proportion of these in which $h$ is true. The idea of rejection sampling is that samples are generated as before, but any sample where $e$ is false is rejected. The proportion of the remaining, non-rejected, samples where $h$ is true is an estimate of $P(h\mid e)$. If the evidence is a conjunction of assignments of values to variables, a sample is rejected when any variable is assigned a value different from its observed value.
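This procedure can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: the network is represented as an assumed dictionary `cpts` mapping each Boolean variable to a function that returns the probability the variable is $true$ given the values of its already-sampled parents.

```python
import random

def sample_forward(cpts, order):
    """Draw one sample by forward sampling: each variable is sampled
    given the already-sampled values of its parents."""
    sample = {}
    for var in order:                       # order must respect parents
        p_true = cpts[var](sample)          # P(var=true | parents)
        sample[var] = random.random() < p_true
    return sample

def rejection_sample(cpts, order, evidence, query, n=10000):
    """Estimate P(query=true | evidence) by generating n samples,
    rejecting those that disagree with the evidence, and returning
    the proportion of the rest in which the query is true."""
    accepted = kept_true = 0
    for _ in range(n):
        s = sample_forward(cpts, order)
        if all(s[v] == val for v, val in evidence.items()):
            accepted += 1
            kept_true += s[query]
    if accepted == 0:
        return None                         # all samples rejected
    return kept_true / accepted
```

For a toy two-variable network with $P(a)=0.5$, $P(b\mid a)=0.8$, and $P(b\mid \neg a)=0.2$, the estimate of $P(a\mid b)$ converges to the exact value $0.8$ as $n$ grows.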

| Sample | Tampering | Fire | Alarm | Smoke | Leaving | Report | |
|---|---|---|---|---|---|---|---|
| $s_1$ | $false$ | $false$ | $true$ | $false$ | ✘ | | |
| $s_2$ | $false$ | $true$ | $false$ | $true$ | $false$ | $false$ | ✔ |
| $s_3$ | $false$ | $true$ | $true$ | $false$ | ✘ | | |
| $s_4$ | $false$ | $true$ | $false$ | $true$ | $false$ | $false$ | ✔ |
| $s_5$ | $false$ | $true$ | $true$ | $true$ | $true$ | $true$ | ✘ |
| $s_6$ | $false$ | $false$ | $false$ | $true$ | $false$ | $false$ | ✔ |
| $s_7$ | $true$ | $false$ | $false$ | $false$ | ✘ | | |
| $s_8$ | $true$ | $true$ | $true$ | $true$ | $true$ | $true$ | ✘ |
| … | | | | | | | |
| $s_{1000}$ | $true$ | $false$ | $true$ | $false$ | ✘ | | |

Figure 8.29 shows how rejection sampling is used to estimate $P(\mathit{tampering}\mid \mathit{smoke}\wedge\neg \mathit{report})$. Any sample with $\mathit{Smoke}=\mathit{false}$ is rejected, without considering any more variables. Any sample with $\mathit{Report}=\mathit{true}$ is rejected. The sample average from the remaining samples (those marked with ✔) is used to estimate the posterior probability of $\mathit{tampering}$.
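As in Figure 8.29, a sample can be abandoned as soon as an observed variable is assigned a value different from its observed value, so the remaining variables never need to be generated. A sketch of this early-rejection variant, using the same assumed representation as before (`cpts` maps each variable to a function giving the probability it is $true$ given its already-sampled parents):

```python
import random

def rejection_sample_early(cpts, order, evidence, query, n=10000):
    """Rejection sampling with early rejection: abandon a sample as
    soon as a sampled variable contradicts the evidence, without
    sampling the remaining variables."""
    accepted = kept_true = 0
    for _ in range(n):
        sample, rejected = {}, False
        for var in order:                   # order must respect parents
            sample[var] = random.random() < cpts[var](sample)
            if var in evidence and sample[var] != evidence[var]:
                rejected = True             # mismatch: stop this sample
                break
        if not rejected:
            accepted += 1
            kept_true += sample[query]
    return kept_true / accepted if accepted else None
```

Early rejection does not change which samples are accepted, so the estimate is the same in distribution; it only saves the work of sampling variables after the first mismatch.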

Because $P(\mathit{smoke}\wedge\neg\mathit{report})=0.0128$, we would expect about 13 of the 1000 samples to have $\mathit{smoke}\wedge\neg\mathit{report}$ true; the other 987 samples would have $\mathit{smoke}\wedge\neg\mathit{report}$ false, and so would be rejected. Thus, 13 is used as $n$ in Hoeffding's inequality, which, for example, guarantees an error for any probability computed from these samples of less than $0.25$ in about 86% of the cases, which is not very accurate.

The error in the estimate of the probability of $h$ depends on the number of non-rejected samples, which is proportional to $P(e)$. Hoeffding's inequality can be used to bound the error of rejection sampling, where $n$ is the number of non-rejected samples; the error therefore depends on $P(e)$.
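One way to see how the guarantee degrades with the number of non-rejected samples is to evaluate a Hoeffding bound directly. The sketch below uses the common two-sided form $P(|\hat{p}-p|>\epsilon)\le 2e^{-2n\epsilon^2}$; the exact percentage obtained depends on which form of the inequality is applied, so the numbers it produces need not match the 86% figure quoted above.

```python
import math

def hoeffding_confidence(n, epsilon):
    """Lower bound on the probability that the average of n i.i.d.
    0/1 samples is within epsilon of the true probability, from the
    two-sided Hoeffding bound 2*exp(-2*n*epsilon**2)."""
    return max(0.0, 1.0 - 2.0 * math.exp(-2.0 * n * epsilon ** 2))

def samples_needed(epsilon, delta):
    """Smallest n for which this bound guarantees error < epsilon with
    probability at least 1 - delta: n >= ln(2/delta) / (2*epsilon**2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
```

Since the expected number of non-rejected samples is $N\cdot P(e)$ out of $N$ total, achieving a given `samples_needed(epsilon, delta)` for rare evidence requires roughly that many non-rejected samples, i.e., about `samples_needed(epsilon, delta) / P(e)` total samples.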

Rejection sampling does not work well when the evidence is unlikely. This may not seem like much of a problem, because, by definition, unlikely evidence is unlikely to occur. While this may be true for simple models, for complicated models with complex observations, every possible observation may be unlikely. Moreover, in many applications, such as diagnosis, the user wants to determine the probabilities precisely because unusual observations are involved.