### 11.1.1 Expectation Maximization

The **expectation maximization (EM)** algorithms can
be used for clustering. Given the data, EM learns a theory that
specifies how
each example should be classified and how to predict feature values for
each class. The general idea is to start with a random theory or randomly
classified data and to repeat two steps until they converge on a
coherent theory:

- E:
- Classify the data using the current theory.
- M:
- Generate the best theory using the current classification of the data.

Step *E* generates the expected classification for each example. Step *M* generates the most likely theory given the classified data. The *M*
step is the problem of supervised learning. As an iterative algorithm,
it can get stuck in local optima; different initializations
can affect the theory found.

The next two sections consider two instances of the EM algorithm.