# How to select Best Split in Decision Trees using Chi-Square

## Introduction

In this article, we will learn how to measure the quality of a decision tree split using the **chi-square** test.


## What is Chi-Square?

Chi-square measures the difference between the actual and expected frequencies of the target classes in a node:

**Chi-Square = √ [(Actual − Expected)² / Expected]**

If a split produces no change, the chi-square value will be **zero**, because the actual and expected counts are the same and the difference will be zero.

The higher the chi-square value, the greater the purity of the nodes after a split.
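As a minimal sketch (the function name is my own), the per-class chi-square value from the formula above can be computed as:

```python
from math import sqrt

def chi_square_value(actual, expected):
    """Chi-square contribution of one class in one node:
    sqrt((actual - expected)^2 / expected)."""
    return sqrt((actual - expected) ** 2 / expected)

# When actual equals expected, the value is zero (the split changed nothing).
print(chi_square_value(7, 7))               # 0.0
print(round(chi_square_value(8, 7), 2))     # 0.38
```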

## Properties of chi-square

- Chi-square, just like Gini impurity, works only with categorical target variables, so we cannot use it for continuous targets.
- The higher the value of chi-square, the more the sub-nodes differ from the parent node, and hence the greater the homogeneity.

## Steps to Calculate Chi-Square for a Split

- First, we calculate the expected value for each class in each child node.
- Then we calculate the chi-square for each class in each child node using the formula we saw above: **Chi-Square = √ [(Actual − Expected)² / Expected]**
- Finally, we calculate the chi-square for the split as the sum of the chi-square values of all the classes in each child node of that split.
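The steps above can be sketched in Python. This is an illustrative implementation under my own naming, with the expected count for a class taken as the child node's size times that class's proportion in the parent node; the example counts are inferred from the article's worked numbers assuming a balanced Play/Not Play parent.

```python
from math import sqrt

def chi_square_split(child_nodes, parent_proportions):
    """Chi-square of a candidate split.

    child_nodes: list of dicts mapping class label -> actual count in that child.
    parent_proportions: dict mapping class label -> class proportion in the parent.
    """
    total = 0.0
    for node in child_nodes:
        node_size = sum(node.values())
        for label, actual in node.items():
            expected = node_size * parent_proportions[label]    # step 1: expected count
            total += sqrt((actual - expected) ** 2 / expected)  # step 2: per-class chi-square
    return total                                                # step 3: sum over the split

# Example counts (assumed): above-average node and below-average node
children = [{"Play": 8, "Not Play": 6}, {"Play": 2, "Not Play": 4}]
print(round(chi_square_split(children, {"Play": 0.5, "Not Play": 0.5}), 2))  # 1.91
```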

Suppose we split the node on “performance in class”. Working from the article's numbers, the above-average node has an expected count of 7 for Play while the actual count is 8, and the below-average node has an expected count of 3 while the actual count is 2:

**Above average Chi-Square(Play) = √ [(1)² / 7] = √ 0.1428 ≈ 0.38**

**Below average Chi-Square(Play) = √ [(-1)² / 3] = √ 0.3333 ≈ 0.58**

Similarly, the chi-square values for Not Play are **0.38** for the above-average node and **0.58** for the below-average node.

Adding the values for both classes across both nodes gives 0.38 + 0.38 + 0.58 + 0.58 ≈ **1.9**, and this is the chi-square value for the split on “performance in class”.

Performing the same calculation for the split on the other candidate variable gives a chi-square value of **5.36.** So what do you think we should do next? We will compare the two chi-square values and see which one is higher: the split with the higher chi-square, 5.36, produces the more homogeneous sub-nodes, so that is the split we should select.
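Continuing the sketch, choosing between candidate splits is simply picking the one with the largest chi-square value (the second variable's name below is a placeholder, since the article's example only names “performance in class”):

```python
# Chi-square value per candidate split (second key is a placeholder name)
splits = {"performance in class": 1.9, "other variable": 5.36}

best_split = max(splits, key=splits.get)
print(best_split)  # other variable
```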

## End Notes

*If you are looking to kick start your Data Science Journey and want every topic under one roof, your search stops here. Check out Analytics Vidhya’s Certified AI & ML BlackBelt Plus Program*

If you have any queries let me know in the comment section!