Decision Trees – Part 4

Decision tree is a recursive algorithm, implying it splits until

• All Subsets are pure (entropy = 0 or homogenous)
• Entropy is not equal to 0 but there are no more attributes to split
• Entropy is not equal to 0, but predefined tree size is set and tree size has reached to set limit.

Decision Trees keep splitting data until every subset is homogenous and pure, if there are enough attributes available. That implies, decision trees will split data to such an extent that to make subsets pure, it may end up with a singleton leaves. And each leaf of decision tree contains only 1 value.

Say for example there is a column (like identity) that uniquely identifies rows. Using such column for splitting, decision trees algorithm will classify each value in a single leaf (thus making it homogenous but at same not useful).

If a decision tree, classifies data to that detail level (like using identity or unique identifier or transaction dates), such a classifier is not useful, as any new value will be different from an existing values and decision tree classifier would fail to classify such data. Such iterative splitting leading to perfect homogenous leaves would lead to over-fitting of algorithm to sample data.

If Size of decision tree is equal to number of examples in data set, then clearly decision tress is over fitted to training data.

During training, with every split a new layer of branches and / or leaves are formed and thus increasing size of decision trees. Increase in size of decision tree increases its accuracy on training data but at same decreases its effectiveness on test data (due to over fitting).

If a decision tree is over fitted with data, it is made very specific to training data and it is not generalized and thus not useful for test data as well as data other than training. To avoid over fitting (or over growing), decision trees are pruned. The main purposes of pruning are

• Reduce decision tree classifier complexity and thus increasing prediction accuracy (removal of over-fitting of decision tree algorithm to training data)
• Removal of portions of tree (branches / roots) that is based on erroneous data or noisy data thus again not useful for prediction (as classifier is not generalized).

There are many approaches to decision tree pruning and math behind pruning. Each of these pruning techniques follow either top down (root to leaf) or bottoms up (leaf to root) approaches. Additionally pruning can be done while building or post building decision trees.

Considering volume and variety of decision trees pruning techniques, there will be separate post for pruning.

Summary: Decision trees have tendency to grow till all leaves of tree are homogenous and thus may end up with singleton leaf nodes. If decision trees predict with 100% accuracy training data, there may be case where classifier has over-fitting problem and is not generalized for subsequent use either with test or real business data. To avoid over-fitting decision trees are pruned (removing branches and / or leaf nodes) without reducing prediction accuracy.

Guru