foundations of computational agents
The most common and useful probabilistic inference task is to compute the posterior distribution of a query variable or variables given some evidence. Unfortunately, even the problem of estimating the posterior probability in a belief network within an absolute error (of less than $0.5$), or within a constant multiplicative factor, is NP-hard, so general efficient implementations will not be available. Computing the posterior probability of a variable is in a complexity class called $\mathrm{\#}NP$ (pronounced “sharp-NP”). $NP$ is the complexity of determining whether there is a solution to a decision problem where solutions can be verified in polynomial time. $\mathrm{\#}NP$ is the complexity of counting the number of solutions. These are for the worst case, however, there is often structure that can be exploited, such as conditional independence.
The main approaches for probabilistic inference in belief networks are:
where the probabilities are computed exactly. A naive way is to enumerate the worlds that are consistent with the evidence, but this algorithm takes time exponential in the number of variables. It is possible to do much better than this by exploiting the structure of the network. The recursive conditioning and variable elimination algorithms (below) are exact algorithms that exploit conditional independence so that they can be much more efficient for networks that are not highly interconnected.
where probabilities are approximated. Such methods are characterized by the guarantees they provide:
They could produce guaranteed bounds on the probabilities. That is, they return a range $[l,u]$ where the exact probability $p$ is guaranteed to have $l\le p\le u$. An anytime algorithm may guarantee that $l$ and $u$ get closer to each other as computation time (and perhaps space) increases.
They could produce probabilistic bounds on the error. Such algorithms might guarantee that the error, for example, is within 0.1 of the correct answer 95% of the time. They might also have guarantees that, as time increases, probability estimates will converge to the exact probability. Some even have guarantees of the rates of convergence. Stochastic simulation is a class of algorithms, many of which have such guarantees.
They could make a best effort to produce an approximation that may be good enough, even though there may be cases where they do not work very well. One such class of techniques is called variational inference, where the idea is to find an approximation to the problem that is easy to compute. First choose a class of representations that are easy to compute. This class could be as simple as the set of disconnected belief networks (with no arcs). Next try to find a member of the class that is closest to the original problem. That is, find an easy-to-compute distribution that is as close as possible to the posterior distribution to be computed. Thus, the problem reduces to an optimization problem of minimizing the error, followed by a simple inference problem.
This book presents some exact methods and some stochastic simulation methods.