Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

6.4.2.3 From Samples to Probabilities

Probabilities can be estimated from a set of examples using the sample average. The sample average of a proposition α is the number of samples where α is true divided by the total number of samples. The sample average approaches the true probability as the number of samples approaches infinity by the law of large numbers.

Hoeffding's inequality provides an estimate of the error of an unconditional probability given n samples:

Proposition. [Hoeffding] Suppose p is the true probability, and s is the sample average from n independent samples; then
P(|s-p|>ε) ≤ 2e-2nε2.

This theorem can be used to determine how many samples are required to guarantee a probably approximately correct estimate of the probability. To guarantee that the error is less than some ε<0.5, infinitely many samples are required. However, if you are willing to have an error greater than ε in δ of the cases, you can solve 2e-2nε2 for n, which gives

n>-ln(δ/2)/(2ε2).

For example, suppose you want an error less than 0.1, nineteen times out of twenty; that is, you are only willing to tolerate an error bigger than 0.1, in 5% of the cases. You can use Hoeffding's bound by setting ε to 0.1 and δ=0.05, which gives n> 184. Thus, you can guarantee such bounds on the error with 185 samples. If you want an error of less than 0.01 in at least 95% of the cases, 18,445 samples can be used. If you want an error of less than 0.1 in 99% of the cases, 265 samples can be used.