As with Information Gain, alternative measures can be used to quantify the impurity of a node, and thus to select the attribute on which a node is split into sub nodes (branches) or leaves.
GINI: Like the entropy used in Information Gain, the GINI index measures the impurity of a node, and it can be used in place of Information Gain. For example, CART (Classification and Regression Trees) uses the GINI index to split decision tree nodes.
If a node (N) with “s” total data elements contains “s(i)” data elements of class “i”, then
$$GINI(N) = 1 - \sum_{i} \left( \frac{s(i)}{s} \right)^{2}$$
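GINI impurity is commonly defined as one minus the sum of the squared class proportions in a node. The following is a minimal sketch of that definition; the function name `gini_impurity` and the convention of returning 0 for an empty node are assumptions made for illustration, not part of the original text.

```python
from collections import Counter


def gini_impurity(labels):
    """GINI impurity of a node: 1 - sum over classes of (s_i / s) ** 2."""
    s = len(labels)
    if s == 0:
        return 0.0  # assumed convention: an empty node counts as pure
    counts = Counter(labels)  # s_i for each class i
    return 1.0 - sum((count / s) ** 2 for count in counts.values())


# An evenly mixed two-class node has the highest impurity; a pure node has none.
print(gini_impurity(["Hired"] * 5 + ["Not Hired"] * 5))
print(gini_impurity(["Hired"] * 10))
```

With two classes the value never exceeds 0.5, which is reached when the node is split evenly between them.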
Like entropy, the GINI impurity index is 0 for a pure node and highest when the classes are evenly mixed; with two classes it ranges from 0 to 0.5. A graph of GINI index values is plotted below.
While entropy ranges from 0 to 1 for a two-class problem, the GINI index ranges from 0 to 0.5. Additionally, because GINI takes the number of classes (and their counts) into account, it may not need a correction such as “Gain Ratio”, as Information Gain does.
The table below compares Entropy and GINI values for different splits of Hired and Not Hired among 10 total candidates.
| Hired | Not Hired | Total (Hired / Not Hired) | Entropy | GINI |
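The values for such a comparison can be computed directly from the two-class definitions of entropy and GINI. The sketch below does this for every Hired / Not Hired split of 10 candidates; the helper names (`entropy`, `gini`, `comparison_rows`) are hypothetical, and the rows are calculated by the code rather than taken from the original table.

```python
import math


def entropy(p):
    """Binary entropy in bits, where p is the proportion of one class."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy (avoids log2(0))
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def gini(p):
    """Two-class GINI impurity, where p is the proportion of one class."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


def comparison_rows(total=10):
    """Return (hired, not_hired, entropy, gini) for every possible split."""
    rows = []
    for hired in range(total + 1):
        p = hired / total
        rows.append((hired, total - hired, round(entropy(p), 3), round(gini(p), 3)))
    return rows


print("Hired  Not Hired  Entropy   GINI")
for hired, not_hired, e, g in comparison_rows():
    print(f"{hired:>5}  {not_hired:>9}  {e:>7.3f}  {g:>5.3f}")
```

Both measures are zero for the pure splits and peak at the 5 / 5 split, where entropy reaches 1 and GINI reaches 0.5.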
As with computing “Information Gain” for a split on attribute “A” at a node, GINI impurity is computed at the parent node and compared with the aggregated impurity of the sub nodes. The attribute used to split a node is selected based on the reduction in impurity. If a node (N) with “t” total elements is split into “k” sub nodes, with sub node N(i) containing “t(i)” elements, the aggregated GINI impurity is the size-weighted sum of the sub node impurities:
$$GINI_{split}(N) = \sum_{i=1}^{k} \frac{t(i)}{t} \, GINI(N(i))$$
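The aggregated impurity of a split is usually taken as the size-weighted average of the sub node impurities, and the reduction is the parent impurity minus that value. Here is a minimal sketch under that assumption; the function names and the example class counts are hypothetical and chosen only for illustration.

```python
def gini_from_counts(counts):
    """GINI impurity of a node given its per-class element counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumed convention: an empty node counts as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(children):
    """Aggregated GINI: each sub node weighted by its share t(i) / t."""
    t = sum(sum(child) for child in children)
    return sum((sum(child) / t) * gini_from_counts(child) for child in children)


def gini_reduction(parent, children):
    """Reduction in impurity achieved by splitting parent into children."""
    return gini_from_counts(parent) - gini_split(children)


# Hypothetical example: 5 Hired / 5 Not Hired split into two sub nodes.
parent = [5, 5]
children = [[4, 1], [1, 4]]
print(gini_split(children))
print(gini_reduction(parent, children))
```

To choose a split, this reduction would be computed for each candidate attribute, and the attribute with the largest reduction would be selected.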
As in Decision Trees built with “Information Gain”, when the GINI impurity index is used, attributes that split the parent node into larger sub nodes (containing more values) with higher purity are preferred.
Next is measuring impurity using “Misclassification Error”.