7.6 Composite Models

7.6.1 Random Forests

One simple yet effective composite model is an average over decision trees, known as a random forest. The idea is to train a number of decision trees, each of which makes a prediction for each example, and to aggregate the trees' predictions into the forest's prediction for each example.
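
As a concrete illustration, the following is a minimal sketch (not code from the book) of a forest as an average over trees, using scikit-learn's decision tree learner. Here the only source of diversity is the randomized choice of splits (splitter="random"); the ways of adding more diversity are discussed below. The variable names are illustrative only.

    # A minimal sketch of a forest as an average over decision trees.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Train 50 trees on the same data; splitter="random" makes each tree
    # pick its splits somewhat differently, giving diverse predictions.
    trees = [
        DecisionTreeClassifier(splitter="random", random_state=i).fit(X, y)
        for i in range(50)
    ]

    # The forest's probabilistic prediction is the average of the trees' predictions.
    forest_proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
    print(forest_proba[:3])  # class distributions for the first three examples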

For this to work effectively, the trees that make up the forest need to make diverse predictions. There are a number of ways to ensure diversity:

  • A subset of the features could be used for each tree. Rather than using all of the features, a random subset of, say, one-third of the features could be used for each tree.

  • Instead of splitting on the best feature overall, the tree could choose the best feature from a smaller, randomly chosen set of candidate features at each split. The set of features to choose from could change for each tree or even for each node.

  • Each tree could use a different subset of the examples to train on. Suppose there are m training examples. If m is large, each tree could use just a few of them. In bagging, a random sample of m examples, selected with replacement, is used to train each tree. In each such sample, some examples are omitted and some are duplicated. The probability that a particular example is never selected is (1 - 1/m)^m, which approaches e^-1 ≈ 0.37 as m grows, so on average each sample contains about 63% of the original examples. A sketch combining these mechanisms follows the list.
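
Here is a sketch (with assumed helper names, not from the book) that combines these mechanisms: each tree is trained on a bootstrap sample of the m examples, and scikit-learn's max_features option restricts the candidate features considered at each split. The sketch assumes every class appears in each bootstrap sample.

    # A sketch combining bagging with a random candidate-feature set per split.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def train_bagged_forest(X, y, n_trees=100, seed=0):
        rng = np.random.default_rng(seed)
        m = len(X)
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, m, size=m)  # m draws with replacement (bagging)
            # On average about 63% of the examples are distinct in each sample:
            # len(np.unique(idx)) / m is roughly 0.632.
            # max_features="sqrt": only a random sqrt(#features)-sized set of
            # candidate features is considered at each split.
            tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
            trees.append(tree.fit(X[idx], y[idx]))
        return trees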

Once the trees have been trained, a probabilistic prediction can be made by averaging the predictions of the trees. Alternatively, each tree can vote with its most likely classification, and the classification with the most votes is used.
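
The two aggregation rules could be sketched as follows (again assuming scikit-learn trees that were all trained on data containing every class, so their class orderings agree; the function names are illustrative):

    # A sketch of the two aggregation rules: averaged probabilities vs. voting.
    import numpy as np

    def predict_by_averaging(trees, X):
        # Average the trees' class distributions, then take the most probable class.
        proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
        return trees[0].classes_[np.argmax(proba, axis=1)]

    def predict_by_voting(trees, X):
        # Each tree votes with its most likely classification;
        # the classification with the most votes wins.
        votes = np.stack([t.predict(X) for t in trees])  # (n_trees, n_examples)
        def mode(column):
            values, counts = np.unique(column, return_counts=True)
            return values[np.argmax(counts)]
        return np.array([mode(votes[:, j]) for j in range(votes.shape[1])])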