Candidate Elimination Algorithm

The candidate elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples. The examples are added one by one; each example possibly shrinks the version space by removing the hypotheses that are inconsistent with the example. The candidate elimination algorithm does this by updating the general and specific boundary for each new example. This is described in Figure 7.14.

1: Procedure CandidateEliminationLearner(X,Y,E,H)
2:           Inputs
3:                     X: set of input features, X={X1,...,Xn}
4:                     Y: target feature
5:                     E: set of examples from which to learn
6:                     H: hypothesis space
7:           Output
8:                     general boundary G⊆H
9:                     specific boundary S⊆H consistent with E
10:           Local
11:                     G: set of hypotheses in H
12:                     S: set of hypotheses in H
13:           Let G={true}, S={false};
14:           for each e∈E do
15:                     if ( e is a positive example) then
16:                               Elements of G that classify e as negative are removed from G;
17:                               Each element s of S that classifies e as negative is removed and replaced by the minimal generalizations of s that classify e as positive and are less general than some member of G;
18:                               Non-maximal hypotheses are removed from S;
19:                     else
20:                               Elements of S that classify e as positive are removed from S;
21:                               Each element g of G that classifies e as positive is removed and replaced by the minimal specializations of g that classifies e as negative and are more general than some member of S.
22:                               Non-minimal hypotheses are removed from G.
Figure 7.14: Candidate elimination algorithm

Example 7.27: Consider how the candidate elimination algorithm handles Example 7.23, where H is the set of conjunctions of literals.

Before it has seen any examples, G0={true} - the user reads everything - and S0={false} - the user reads nothing. Note that true is the empty conjunction and false is the conjunction of an atom and its negation.

After considering the first example, a1, G1={true} and

S1={crime ∧¬academic ∧¬local ∧music}.

Thus, the most general hypothesis is that the user reads everything, and the most specific hypothesis is that the user only reads articles exactly like this one.

After considering the first two examples, G2={true} and

S2={crime ∧¬academic ∧¬local}.

Since a1 and a2 disagree on music, it has concluded that music cannot be relevant.

After considering the first three examples, the general boundary becomes

G3={crime , ¬academic }

and S3=S2. Now there are two most general hypotheses; the first is that the user reads anything about crime, and the second is that the user reads anything non-academic.

After considering the first four examples,

G4={crime , ¬academic ∧¬local }

and S4=S3.

After considering all five examples, we have

G5={crime },
S5={crime ∧¬local}.

Thus, after five examples, only two hypotheses exist in the version space. They differ only on their prediction on an example that has crime ∧local true. If the target concept can be represented as a conjunction, only an example with crime ∧local true will change G or S. This version space can make predictions about all other examples.