Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

11.1.1 Expectation Maximization

The expectation maximization (EM) algorithms can be used for clustering. Given the data, EM learns a theory that specifies how each example should be classified and how to predict feature values for each class. The general idea is to start with a random theory or randomly classified data and to repeat two steps until they converge on a coherent theory:

E:
Classify the data using the current theory.
M:
Generate the best theory using the current classification of the data.

Step E generates the expected classification for each example. Step M generates the most likely theory given the classified data. The M step is the problem of supervised learning. As an iterative algorithm, it can get stuck in local optima; different initializations can affect the theory found.

The next two sections consider two instances of the EM algorithm.