In earlier posts we discussed how an attribute is selected based on Information Gain (Entropy) and the GINI Index. Similar to those computations, "Misclassification Error" is another method for selecting the optimal attribute to split on when building decision trees.
Misclassification error ranges between 0 (minimum, a pure node) and 0.5 (maximum for a two-class problem).
Below is a table where a node is split into "Hired" and "Not Hired", showing the values computed for Entropy, the GINI impurity index, and Misclassification Error.
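As a rough sketch of how these three impurity measures are computed from a node's class counts (the counts below are made-up numbers for illustration, not taken from the table):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a node's class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """GINI impurity index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def misclassification_error(counts):
    """Misclassification error: 1 minus the proportion of the majority class."""
    total = sum(counts)
    return 1.0 - max(counts) / total

# A node with 5 "Hired" and 5 "Not Hired" records is maximally impure:
print(entropy([5, 5]))                   # 1.0
print(gini([5, 5]))                      # 0.5
print(misclassification_error([5, 5]))   # 0.5

# A pure node (all "Hired") has zero impurity under all three measures:
print(misclassification_error([10, 0]))  # 0.0
```

Note that entropy peaks at 1.0 while GINI and misclassification error both peak at 0.5 for a two-class node, which is why the three curves in such tables have different heights.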
- To select an appropriate attribute for splitting, decision trees use the "impurity reduction" method.
- Impurity of a node can be computed using:
- Information Gain
- GINI Impurity Index
- Misclassification Error
- Impurity reduction is computed as the difference between the "impurity of the node before the split" and the "size-weighted aggregate of the impurities of all child nodes".
- The Information Gain method is biased towards categorical attributes with many distinct values (which produce singleton splits with 100% purity). To avoid this, an enhanced measure called "Gain Ratio" is used.
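The impurity-reduction and Gain Ratio ideas above can be sketched in a few lines of Python. The helper names and the example split are my own illustration, assuming entropy as the impurity measure:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a node's class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def impurity_reduction(parent_counts, child_counts_list, impurity=entropy):
    """Impurity before the split minus the size-weighted impurity of the children."""
    n = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / n) * impurity(child) for child in child_counts_list
    )
    return impurity(parent_counts) - weighted_children

def gain_ratio(parent_counts, child_counts_list):
    """Information Gain divided by Split Info, penalising many-valued splits."""
    n = sum(parent_counts)
    gain = impurity_reduction(parent_counts, child_counts_list, impurity=entropy)
    split_info = -sum(
        (sum(child) / n) * math.log2(sum(child) / n) for child in child_counts_list
    )
    return gain / split_info

# Parent node: 6 "Hired", 4 "Not Hired"; a candidate attribute splits it
# into one pure child [4, 0] and one mixed child [2, 4].
parent, children = [6, 4], [[4, 0], [2, 4]]
print(impurity_reduction(parent, children))  # the Information Gain of this split
print(gain_ratio(parent, children))
```

Because Split Info grows with the number of child nodes, an attribute that shatters the data into many tiny pure partitions gets a large denominator, which is exactly how Gain Ratio counters the bias described above.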
The next post covers data types and their impact on Decision Trees.