Intelligence 2E

foundations of computational agents

The simplest case occurs when a learning agent is given the structure of the model and all the variables have been observed. The agent must learn the conditional probabilities, $P({X}_{i}\mid parents({X}_{i}))$ for each variable ${X}_{i}$. Learning the conditional probabilities is an instance of supervised learning, where ${X}_{i}$ is the target feature, and the parents of ${X}_{i}$ are the input features.

For cases with few parents, each conditional probability can be learned separately using the training examples and prior knowledge, such as pseudocounts.

Model | Data | ➪ | Probabilities |
---|---|---|---|

$\begin{array}{ccccc}A\hfill & B\hfill & C\hfill & D\hfill & E\hfill \\ t\hfill & f\hfill & t\hfill & t\hfill & f\hfill \\ f\hfill & t\hfill & t\hfill & t\hfill & t\hfill \\ t\hfill & t\hfill & f\hfill & t\hfill & f\hfill \\ \multicolumn{5}{c}{\mathrm{\cdots}}\end{array}$ | $\begin{array}{c}P(A)\hfill \\ P(B)\hfill \\ P(E\mid A,B)\hfill \\ P(C\mid E)\hfill \\ P(D\mid E)\hfill \end{array}$ |

Figure 10.7 shows a typical example. We are given the model and the data, and we must infer the probabilities.

For example, one of the elements of ${P}{\mathit{}}{\mathrm{(}}{E}{\mathrm{\mid}}{A}{\mathit{}}{B}{\mathrm{)}}$ is

${P}{(}{E}{=}{t}{\mid}{A}{=}{t}{\wedge}{B}{=}{f}{)}{=}{\displaystyle \frac{{{n}}_{{1}}{+}{{c}}_{{1}}}{{{n}}_{{0}}{+}{{n}}_{{1}}{+}{{c}}_{{0}}{+}{{c}}_{{1}}}}$ |

where ${{n}}_{{\mathrm{1}}}$ is the number of cases where ${E}{\mathrm{=}}{t}{\mathrm{\wedge}}{A}{\mathrm{=}}{t}{\mathrm{\wedge}}{B}{\mathrm{=}}{f}$, and ${{c}}_{{\mathrm{1}}}{\mathrm{\ge}}{\mathrm{0}}$ is the corresponding pseudocount that is provided before any data is observed. Similarly, ${{n}}_{{\mathrm{0}}}$ is the number of cases where ${E}{\mathrm{=}}{f}{\mathrm{\wedge}}{A}{\mathrm{=}}{t}{\mathrm{\wedge}}{B}{\mathrm{=}}{f}$, and ${{c}}_{{\mathrm{0}}}{\mathrm{\ge}}{\mathrm{0}}$ is the corresponding pseudocount.

If a variable has many parents, using counts and pseudocounts can suffer from overfitting. Overfitting is most severe when there are few examples for some of the combinations of the parent variables. In that case, the supervised learning techniques of Chapter 7 could be used. Decision trees can be used for arbitrary discrete variables. Logistic regression and neural networks can represent a conditional probability of a binary variable given its parents. For non-binary discrete variables, indicator variables may be used.