Artificial Intelligence - foundations of computational agents -- 7.7.1.2 The Bias Involved in Version-Space Learning

Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

7.7.1.2 The Bias Involved in Version-Space Learning

Recall that a bias is necessary for any learning to generalize beyond the training data. There must have been a bias in Example 7.27 because, after observing only 5 of the 16 possible assignments to the input variables, an agent was able to make predictions about examples it had not seen.

The bias involved in version-space learning is a called a language bias or a restriction bias because the bias is obtained from restricting the allowable hypotheses. For example, a new example with crime false and music true will be classified as false (the user will not read the article), even though no such example has been seen. The restriction that the hypothesis must be a conjunction of literals is enough to predict its value.

This bias should be contrasted with the bias involved in decision-tree learning. The decision tree can represent any Boolean function. Decision-tree learning involves a preference bias, in that some Boolean functions are preferred over others; those with smaller decision trees are preferred over those with larger decision trees. A decision-tree learning algorithm that builds a single decision tree top-down also involves a search bias in that the decision tree returned depends on the search strategy used.

The candidate elimination algorithm is sometimes said to be an unbiased learning algorithm because the learning algorithm does not impose any bias beyond the language bias involved in choosing H. It is easy for the version space to collapse to the empty set - for example, if the user reads an article with crime false and music true. This means that the target concept is not in H. Version-space learning is not tolerant to noise; just one misclassified example can throw off the whole system.

The bias-free hypothesis space is where H is the set of all Boolean functions. In this case, G always contains one concept: the concept which says that all negative examples have been seen and every other example is positive. Similarly, S contains the single concept which says that all unseen examples are negative. The version space is incapable of concluding anything about examples it has not seen; thus, it cannot generalize. Without a language bias or a preference bias, no generalization and therefore no learning will occur.