How to Select the Best Split in Decision Trees Using Chi-Square
What is Chi-Square?
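Chi-square measures how strongly the class distribution in the child nodes differs from that of the parent node. The higher the chi-square value, the purer the nodes are after a split.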
Properties of Chi-Square
- Chi-square, like Gini impurity, works only with categorical target variables, so we cannot use it for continuous targets.
- The higher the chi-square value, the more the sub-nodes differ from the parent node, and hence the more homogeneous they are.
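To make the second property concrete, here is a small worked calculation. It is only a sketch: the formula is not reproduced in this text, so it assumes the standard Pearson form of chi-square, where O_c is the actual count of class c in a child node and E_c is the count expected if the child had the same class mix as the parent. The numbers are made up for illustration.

```latex
\chi^2_{\text{node}} = \sum_{c \in \text{classes}} \frac{(O_c - E_c)^2}{E_c}

% Hypothetical parent: 10 positive, 10 negative.
% A child holding 10 samples therefore has E_c = 5 for each class.
\text{Child with counts } (8, 2): \quad \frac{(8-5)^2}{5} + \frac{(2-5)^2}{5} = 1.8 + 1.8 = 3.6

\text{Child with counts } (5, 5): \quad \frac{(5-5)^2}{5} + \frac{(5-5)^2}{5} = 0
```

The child that mirrors the parent scores 0, while the more homogeneous child scores higher, which is the behaviour the property describes.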
Steps to Calculate Chi-Square for a Split
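- First, we calculate the expected value for each class in each child node, that is, the count we would see if the child had the same class distribution as the parent.
- Then we calculate the chi-square for each individual node by applying the chi-square formula to the actual and expected values of each class.
- Finally, we calculate the chi-square for the split as the sum of the chi-square values of its child nodes.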
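The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `node_chi_square` and `split_chi_square` and the example counts are hypothetical, and the per-class term assumes the standard Pearson form `(actual - expected)**2 / expected`. If you use a different variant of the formula, only the line marked as step 2 changes.

```python
def node_chi_square(actual, parent):
    """Chi-square of one child node measured against its parent.

    `actual` and `parent` map a class label to a sample count.
    """
    node_total = sum(actual.values())
    parent_total = sum(parent.values())
    chi = 0.0
    for label, parent_count in parent.items():
        # Step 1: expected count if the node had the parent's class mix.
        expected = node_total * parent_count / parent_total
        if expected == 0:
            continue  # class absent from the parent, nothing to compare
        # Step 2: contribution of this class (assumed Pearson form).
        observed = actual.get(label, 0)
        chi += (observed - expected) ** 2 / expected
    return chi


def split_chi_square(children, parent):
    # Step 3: the score of a split is the sum over its child nodes.
    return sum(node_chi_square(child, parent) for child in children)


# Made-up counts: a parent node and two candidate splits of it.
parent = {"yes": 20, "no": 20}
split_a = [{"yes": 18, "no": 2}, {"yes": 2, "no": 18}]
split_b = [{"yes": 12, "no": 8}, {"yes": 8, "no": 12}]

score_a = split_chi_square(split_a, parent)
score_b = split_chi_square(split_b, parent)
print(score_a, score_b)
```

With these counts the first candidate scores about 25.6 and the second about 1.6, so under this sketch the first split, whose children are far purer, would be chosen.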
If you have any queries, let me know in the comment section!