Intelligence 2E

foundations of computational agents

The third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including full text).

The simplest case is to generate the probability distribution of a single variable. This is the base case the other methods build on.

To generate samples from a single discrete or real-valued variable, $X$, first totally order the values in the domain of $X$. For discrete variables, if there is no natural order, just create an arbitrary ordering. Given this ordering, the cumulative probability distribution is a function of $x$, defined by $f(x)=P(X\le x)$.

To generate a random sample for $X$, select a random number $y$ in the domain $[0,1]$. We select $y$ from a uniform distribution to ensure that each number between 0 and 1 has the same chance of being chosen. Let $v$ be the value of $X$ that maps to $y$ in the cumulative probability distribution. That is, $v$ is the element of $domain(X)$ such that $f(v)=y$ or, equivalently, $v={f}^{-1}(y)$. Then, $X=v$ is a random sample of $X$, chosen according to the distribution of $X$.

Consider a random variable ${X}$ with domain ${\mathrm{\{}}{{v}}_{{\mathrm{1}}}{\mathrm{,}}{{v}}_{{\mathrm{2}}}{\mathrm{,}}{{v}}_{{\mathrm{3}}}{\mathrm{,}}{{v}}_{{\mathrm{4}}}{\mathrm{\}}}$. Suppose ${P}{\mathrm{(}}{X}{\mathrm{=}}{{v}}_{{\mathrm{1}}}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.3}}$, ${P}{\mathrm{(}}{X}{\mathrm{=}}{{v}}_{{\mathrm{2}}}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.4}}$, ${P}{\mathrm{(}}{X}{\mathrm{=}}{{v}}_{{\mathrm{3}}}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.1}}$, and ${P}{\mathrm{(}}{X}{\mathrm{=}}{{v}}_{{\mathrm{4}}}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.2}}$. First, totally order the values, say $$. Figure 8.27 shows ${P}{\mathit{}}{\mathrm{(}}{X}{\mathrm{)}}$, the distribution for ${X}$, and ${f}{\mathit{}}{\mathrm{(}}{X}{\mathrm{)}}$, the cumulative distribution for ${X}$. Consider value ${{v}}_{{\mathrm{1}}}$; 0.3 of the domain of ${f}$ maps back to ${{v}}_{{\mathrm{1}}}$. Thus, if a sample is uniformly selected from the ${Y}$-axis, ${{v}}_{{\mathrm{1}}}$ has a 0.3 chance of being selected, ${{v}}_{{\mathrm{2}}}$ has a ${\mathrm{0.4}}$ chance of being selected, and so forth.

Probabilities can be estimated from a set of samples using the sample average. The sample average of a proposition $\alpha $ is the number of samples where $\alpha $ is true divided by the total number of samples. The sample average approaches the true probability as the number of samples approaches infinity by the law of large numbers.

Hoeffding’s inequality provides an estimate of the error of the sample average as the probability given $n$ samples, with the guarantee of the following proposition.

Suppose $p$ is the true probability, and $s$ is the sample average from $n$ independent samples; then

$$P(|s-p|>\u03f5)\le 2{e}^{-2n{\u03f5}^{2}}.$$ |

This theorem can be used to determine how many samples are required to guarantee a probably approximately correct estimate of the probability. To guarantee that the error is always less than some $$, infinitely many samples are required. However, if you are willing to have an error greater than $\u03f5$ in $\delta $ of the cases, solve $$ for $n$, which gives

$$n>\frac{-\mathrm{ln}\frac{\delta}{2}}{2{\u03f5}^{2}}.$$ |

For example, suppose you want an error less than $0.1$, nineteen times out of twenty; that is, you are only willing to tolerate an error bigger than 0.1 in 5% of the cases. You can use Hoeffding’s bound by setting $\u03f5$ to $0.1$ and $\delta =0.05$, which gives $n>184$. Thus, you guarantee such bounds on the error with 185 samples. If you want an error of less than $0.01$ in at least $95\%$ of the cases, 18,445 samples could be used. If you want an error of less than $0.1$ in $99\%$ of the cases, $265$ samples could be used.