foundations of computational agents
Likelihood weighting is an instance of importance sampling. Importance sampling algorithms have the following characteristics:
Samples are weighted.
The samples do not need to come from the actual distribution, but can be from (almost) any distribution, with the weights adjusted to reflect the difference between the distributions.
Some variables can be summed out and some sampled.
This freedom to sample from a different distribution allows the algorithm to choose better sampling distributions to give better estimates.
Stochastic simulation can be used to compute the expected value of a real-valued variable f under probability distribution P using

$$E_P(f) \;=\; \sum_{w} f(w)\,P(w) \;\approx\; \frac{1}{n}\sum_{s} f(s)$$

where s is a sample that is sampled with probability P, and n is the number of samples. The estimate gets more accurate as the number of samples grows.
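This estimator can be sketched in a few lines of Python. The function names, the Bernoulli example, and the sample size below are illustrative choices, not part of the text:

```python
import random

def mc_expectation(f, sample_p, n=100_000):
    """Estimate E_P[f] by averaging f over n samples drawn from P."""
    return sum(f(sample_p()) for _ in range(n)) / n

# Illustration: X ~ Bernoulli(0.3), so E[X] = 0.3.
random.seed(0)
est = mc_expectation(lambda x: x,
                     lambda: 1 if random.random() < 0.3 else 0)
```

With 100,000 samples the estimate is typically within a few thousandths of 0.3, illustrating the convergence claim above.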
Suppose it is difficult to sample with the distribution P, but it is easy to sample from a distribution Q. We adapt the previous equation to estimate the expected value under P, by sampling from Q, using

$$E_P(f) \;=\; \sum_{w} f(w)\,P(w) \;=\; \sum_{w} f(w)\,\frac{P(w)}{Q(w)}\,Q(w) \;\approx\; \frac{1}{n}\sum_{s} f(s)\,\frac{P(s)}{Q(s)}$$

where the last sum is over n samples selected according to the distribution Q. The distribution Q is called a proposal distribution. The only restriction on Q is that it should not be zero for any case where P is not zero (i.e., if P(w) ≠ 0 then Q(w) ≠ 0).
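The importance-sampling estimator above can be sketched directly. The function names and the rare-event example (P(true) = 0.02 oversampled with Q(true) = 0.5) are illustrative assumptions:

```python
import random

def importance_expectation(f, p, q, sample_q, n=100_000):
    """Estimate E_P[f] by sampling from proposal Q and reweighting by P/Q."""
    total = 0.0
    for _ in range(n):
        s = sample_q()
        total += f(s) * p(s) / q(s)  # each sample carries weight P(s)/Q(s)
    return total / n

# Illustration: a rare Boolean event with P(true) = 0.02.
# Sampling from Q(true) = 0.5 sees the event often; the weights correct for it.
random.seed(0)
p = lambda x: 0.02 if x else 0.98
q = lambda x: 0.5
est = importance_expectation(lambda x: 1.0 if x else 0.0,
                             p, q, lambda: random.random() < 0.5)
```

The estimate converges to 0.02, and it does so with lower variance than sampling from P directly, because the event of interest appears in about half the samples rather than one in fifty.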
Recall that for Boolean variables, with true represented as 1 and false as 0, the expected value is the probability. So the methods here can be used to compute probabilities.
In the running alarm example, P(smoke) = 0.0189. As explained in Example 8.42, if the algorithm samples according to the prior probability, Smoke would only be true in about 19 samples out of 1000. Likelihood weighting ended up with a few samples with high weights and many samples with low weights, even though the samples represented similar numbers of cases.
Suppose, instead of sampling Fire according to its prior probability, the proposal distribution with Q(fire) = 0.5 is used. Then Fire = true is sampled 50% of the time. According to the model, P(fire) = 0.01; thus each sample with Fire = true is weighted by 0.01/0.5 = 0.02 and each sample with Fire = false is weighted by 0.99/0.5 = 1.98.
With importance sampling with Q as the proposal distribution, half of the samples will have Fire = true, and the model specifies P(smoke | fire) = 0.9. Given the evidence Smoke = true, these will be weighted by 0.9 * 0.02 = 0.018. The other half of the samples will have Fire = false, and the model specifies P(smoke | ¬fire) = 0.01. These samples will have a weighting of 0.01 * 1.98 = 0.0198. Notice how all of the samples have weights of the same order of magnitude. This means that the estimates from these samples are much more accurate.
Importance sampling can also be combined with exact inference. Not all variables need to be sampled. Those not sampled can be summed out by variable elimination.
The best proposal distribution is one where the samples have approximately equal weight. This occurs when sampling from the posterior distribution. In adaptive importance sampling the proposal distribution is modified to approximate the posterior probability of the variable being sampled.
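The idea can be illustrated with a minimal sketch (this is an illustration of the adaptation idea, not the book's algorithm): after each batch of weighted samples, the proposal for Fire is reset to the current weighted estimate of its posterior, using the alarm network's assumed numbers P(fire) = 0.01, P(smoke | fire) = 0.9, P(smoke | ¬fire) = 0.01:

```python
import random

# Illustrative sketch of adaptive importance sampling: move the proposal
# Q(fire) toward the weighted estimate of P(fire | smoke) after each batch.
random.seed(0)
P_fire = 0.01
P_smoke_given_fire = {True: 0.9, False: 0.01}  # P(smoke | Fire)
q = 0.5  # initial proposal Q(fire)

for batch in range(5):
    w_true = w_false = 0.0
    for _ in range(10_000):
        fire = random.random() < q
        prior = P_fire if fire else 1 - P_fire
        prop = q if fire else 1 - q
        w = P_smoke_given_fire[fire] * prior / prop  # likelihood * P/Q
        if fire:
            w_true += w
        else:
            w_false += w
    q = w_true / (w_true + w_false)  # reset proposal to posterior estimate
```

The proposal settles near P(fire | smoke) = 0.009/0.0189 ≈ 0.476, and at that point the weights of the two kinds of samples become essentially equal, matching the claim that equal-weight samples correspond to sampling from the posterior.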