4 Simple Ways to Split a Decision Tree in Machine Learning (Updated 2024)

Abhishek Sharma Last Updated : 08 Nov, 2024
8 min read

A decision tree is a powerful machine learning algorithm extensively used in the field of data science. It is simple to implement and equally easy to interpret, and it serves as the building block for other widely used and more complex machine learning algorithms like Random Forest, XGBoost, and LightGBM. I often lean on the decision tree algorithm as my go-to machine learning algorithm, whether I’m starting a new project or competing in a hackathon. In this article, I will explain four simple methods for splitting a decision tree.



Learning Objectives

  • Learn how to split a decision tree based on different splitting criteria.
  • Get acquainted with the Reduction in Variance, Gini Impurity, Information Gain, and Chi-square in decision trees.
  • Know the difference between these different methods of splitting.

I assume familiarity with the basic concepts in regression and decision trees; if you need to brush up on these key concepts, free introductory courses on both topics are a quick way to do so.

Basic Decision Tree Terminologies

Let’s quickly go through some of the key terminologies related to splitting a decision tree that we’ll be using throughout this article.

  • Parent and Child Node: When a node divides into sub-nodes, it becomes the Parent Node, and the sub-nodes become Child Nodes. A node can divide into multiple sub-nodes, acting as the parent of several child nodes.
  • Root Node: The topmost node of a decision tree. It does not have any parent node. It represents the entire population or sample.
  • Leaf / Terminal Nodes: Nodes of the tree that do not have any child node are known as Terminal/Leaf Nodes.

There are multiple tree models to choose from when building a decision tree, based on their learning technique, e.g., ID3, C4.5, and CART (Classification and Regression Trees). Which one to use depends on the problem statement. For example, for classification problems, we mainly use a classification tree with the Gini index to identify class labels, especially for datasets with a relatively large number of classes.

In this article, we will mainly discuss the CART tree.

What Is Node Splitting in a Decision Tree and Why Is It Done?

Modern programming libraries have made using any machine learning algorithm easy, but this convenience comes at the cost of hidden implementation details, which you need to know to fully understand an algorithm. Another source of confusion is that there are multiple ways to split decision tree nodes.

Before learning any topic, I believe it is essential to understand why you’re learning it, as that keeps the goal in focus. So let’s understand why we should learn about node splitting in decision trees.

Decision trees are used so extensively that there is no denying that learning about them is a must. A decision tree is a supervised learning algorithm that makes decisions by splitting nodes into sub-nodes. This splitting is performed recursively during training until only homogeneous nodes are left, which is why a decision tree performs so well. At the same time, recursively splitting nodes into ever-smaller subsets can cause overfitting. Therefore, node splitting is a key concept that everyone should know.

Node splitting, or simply splitting, divides a node into multiple sub-nodes to create relatively pure nodes. This is done by finding the best split for a node and can be done in multiple ways. The ways of splitting a node can be broadly divided into two categories based on the type of target variable:

  1. Continuous Target Variable: Reduction in Variance
  2. Categorical Target Variable: Gini Impurity, Information Gain, and Chi-Square

We’ll look at each splitting method in detail in the upcoming sections. Let’s start with the first method of splitting – reduction in variance.

Also, read this article about splitting a decision tree with Gini Impurity.

Reduction in Variance in Decision Tree

Reduction in Variance is a method for splitting a node, used when the target variable is continuous, i.e., in regression problems. It is called so because it uses variance as the measure for deciding the feature on which a node is split into child nodes.

Variance = Σ (X − X̄)² / n

where X̄ is the mean of the values in the node, X is each actual value, and n is the number of values.

Variance is used for calculating the homogeneity of a node. If a node is entirely homogeneous, then the variance is zero.

Here are the steps to split a decision tree using the reduction in variance method:

  1. For each split, individually calculate the variance of each child node
  2. Calculate the variance of each split as the weighted average variance of child nodes
  3. Select the split with the lowest variance
  4. Perform steps 1-3 until completely homogeneous nodes are achieved

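To make the steps above concrete, here is a minimal NumPy sketch (the function name `weighted_variance` and the toy arrays are my own illustration, not from the original article) that scores one candidate split by the weighted average variance of its child nodes:

```python
import numpy as np

def weighted_variance(left_targets, right_targets):
    """Weighted average variance of the two child nodes produced by a split."""
    n_left, n_right = len(left_targets), len(right_targets)
    n_total = n_left + n_right
    return (n_left / n_total) * np.var(left_targets) + \
           (n_right / n_total) * np.var(right_targets)

# Toy regression targets: one candidate split sends these rows left/right
left_targets = np.array([10.0, 11.0, 9.5])
right_targets = np.array([24.0, 26.0, 25.5])

print(weighted_variance(left_targets, right_targets))
```

Among all candidate splits, the one with the lowest weighted variance would be chosen, and the procedure is then repeated on each child node.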

Information Gain in Decision Tree

Now, what if we have a categorical target variable? For categorical targets, reduction in variance won’t quite cut it. The answer is Information Gain. The Information Gain method is used to split nodes when the target variable is categorical. It operates on the concept of entropy and is given by:

Information Gain = Entropy(parent) − [weighted average entropy of the child nodes]

Entropy is used for calculating the purity of a node. The lower the entropy, the higher the purity of the node; the entropy of a completely homogeneous node is zero. Because Information Gain subtracts the weighted entropy of the child nodes from the entropy of the parent, it is highest for the purest splits. Now, let’s take a look at the formula for calculating entropy:

Entropy = −Σ pᵢ log₂(pᵢ)

where pᵢ is the probability (relative frequency) of class i in the node.

Steps to split a decision tree using Information Gain:

  1. For each split, individually calculate the entropy of each child node
  2. Calculate the entropy of each split as the weighted average entropy of child nodes
  3. Select the split with the lowest entropy or highest information gain
  4. Until you achieve homogeneous nodes, repeat steps 1-3

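As a quick illustration of these steps, here is a small NumPy sketch (the helper names `entropy` and `information_gain` and the toy labels are my own, not from the article) that computes the information gain of one candidate split:

```python
import numpy as np

def entropy(labels):
    """Entropy of a node: -sum(p_i * log2(p_i)) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Parent entropy minus the weighted average entropy of the child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_children

# Toy categorical targets for one candidate split
parent = np.array(["yes", "yes", "yes", "no", "no", "no"])
left = np.array(["yes", "yes", "yes"])
right = np.array(["no", "no", "no"])

print(information_gain(parent, left, right))  # 1.0 for this perfectly pure split
```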

Gini Impurity in Decision Tree

Gini Impurity is a method for splitting the nodes when the target variable is categorical. It is the most popular and easiest way to split a decision tree. The Gini Impurity value is:

Gini Impurity = 1 − Gini

Wait – what is Gini?

Gini is the probability of correctly labeling a randomly chosen element if it is randomly labeled according to the distribution of labels in the node. The formula for Gini is:

Gini = Σ (pᵢ)², where pᵢ is the probability of class i in the node.

And Gini Impurity is:

Gini Impurity = 1 − Σ (pᵢ)²

The lower the Gini Impurity, the higher the homogeneity of the node. The Gini Impurity of a pure node is zero. Now, you might be thinking we already know about Information Gain then, why do we need Gini Impurity?

Gini Impurity is often preferred to Information Gain because it does not involve logarithms, which are computationally more expensive.

Here are the steps to split a decision tree using Gini Impurity:

  1. Similar to what we did with Information Gain: for each split, individually calculate the Gini Impurity of each child node
  2. Calculate the Gini Impurity of each split as the weighted average Gini Impurity of child nodes
  3. Select the split with the lowest value of Gini Impurity
  4. Until you achieve homogeneous nodes, repeat steps 1-3

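Here is an analogous NumPy sketch (again, the function names and toy labels are my own illustration, not from the article) that scores one candidate split by the weighted average Gini Impurity of its child nodes:

```python
import numpy as np

def gini_impurity(labels):
    """Gini Impurity of a node: 1 - sum(p_i^2) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(left, right):
    """Weighted average Gini Impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Toy categorical targets for one candidate split
left = np.array(["spam", "spam", "ham"])
right = np.array(["ham", "ham", "ham"])

print(split_gini(left, right))  # the candidate split with the lowest value wins
```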

Chi-Square in Decision Tree

Chi-square is another method of splitting nodes in a decision tree for datasets having categorical target values. It is used to make two or more splits in a node. It works on the statistical significance of differences between the parent node and child nodes.

The Chi-Square value is:

Chi-Square = √( (Actual − Expected)² / Expected )

Here, the Expected is the expected value for a class in a child node based on the distribution of classes in the parent node, and the Actual is the actual value for a class in a child node.

The above formula gives the Chi-Square value for a single class. To calculate the Chi-Square for a node, take the sum of the Chi-Square values over all the classes in that node. The higher the value, the greater the difference between the parent and child node distributions, i.e., the higher the homogeneity of the child nodes.

Here are the steps to split a decision tree using Chi-Square:

  1. For each split, individually calculate the Chi-Square value of each child node by taking the sum of Chi-Square values for each class in a node
  2. Calculate the Chi-Square value of each split as the sum of Chi-Square values for all the child nodes
  3. Select the split with the highest Chi-Square value
  4. Until you achieve homogeneous nodes, repeat steps 1-3

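For completeness, here is a small NumPy sketch of the per-node Chi-Square computation described above (the function name, class ordering, and toy counts are my own illustration, not from the article):

```python
import numpy as np

def chi_square_of_child(actual_counts, parent_proportions):
    """Chi-Square of one child node, summed over its classes.

    actual_counts: observed count of each class in the child node.
    parent_proportions: class proportions in the parent node, used to derive
    the expected counts for a child node of this size.
    """
    actual = np.asarray(actual_counts, dtype=float)
    expected = actual.sum() * np.asarray(parent_proportions, dtype=float)
    return np.sum(np.sqrt((actual - expected) ** 2 / expected))

# Parent node is split 50/50 between the classes ("yes", "no")
parent_proportions = [0.5, 0.5]

# One candidate split produces two child nodes with these (yes, no) counts
left_chi = chi_square_of_child([8, 2], parent_proportions)
right_chi = chi_square_of_child([1, 9], parent_proportions)

print(left_chi + right_chi)  # the split with the highest total Chi-Square wins
```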

Conclusion

Decision trees are an important tool in machine learning for solving classification and regression problems. However, creating an effective decision tree requires choosing the right features and splitting the data in a way that maximizes information gain. After reading the above article, you know about different methods of splitting a decision tree.

Key Takeaways

  • In this article, we learned about how splitting in a decision tree works.
  • We got to learn about splitting in regression and classification problems.
  • There is no best method for splitting a decision tree, but there are some methods that are used more than others. It depends on your training data and the problem statement.

As a next step, you can watch our complete playlist on decision trees on YouTube, or take our free course on decision trees.


Frequently Asked Questions

Q1. What is the best method for splitting a decision tree?

A. The most widely used methods for splitting a decision tree are the Gini index and entropy. The Gini index is the default criterion in scikit-learn's decision tree classifier. The scikit-learn library provides all of these splitting methods for classification and regression trees, so you can choose among them based on your problem statement and dataset.
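As a quick, hedged illustration of this answer, the snippet below shows where the splitting criterion is chosen in scikit-learn (the Iris data is used purely as a stand-in dataset): `criterion="gini"` is the default for `DecisionTreeClassifier`, `"entropy"` switches it to information gain, and `DecisionTreeRegressor` uses variance-based criteria such as `"squared_error"`:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# Classification tree: "gini" is the default; "entropy" uses information gain
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)

# Regression tree: "squared_error" corresponds to reduction in variance
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=0)
reg.fit(X, X[:, 0])  # toy regression target: predict the first feature
```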

Q2. What are the advantages of a decision tree?

A. The main advantage of a decision tree is that it does not require normalization or scaling of the data, so less preprocessing is needed during data preparation. Moreover, it is easier to understand and interpret than many other classification models.

Q3. How many splits are there in a decision tree?

A. For a categorical feature with n distinct values, a decision tree can make at most 2^(n−1) − 1 different binary splits on that feature. For example, a feature with 4 categories allows 2³ − 1 = 7 possible splits.


Responses From Readers


MANEESH KUMAR SINGH

Hi, Abhishek Sharma. I have one doubt with Decision Tree Splitting Method #3: Gini Impurity. You mention there that Gini Impurity is a method for splitting the nodes when the target variable is continuous. Is that correct? I have read the blog below, so I am confused: https://www.analyticsvidhya.com/blog/2016/04/tree-based-algorithms-complete-tutorial-scratch-in-python/

Vasileios Anagnostopoulos

Hi Abhisek, you do not mention what is a "split". For finite discrete variables, the split is the various values the variable takes or a threshold. For a continuous or infinite discrete we need a threshold. When we go with the threshold, we need to find the optimal threshold. How is it done? Line scan? Any other clever way?

Vani Kapoor

Hi Abhishek, what criterion does C4.5 use for decision tree building? Is it information gain, like ID3?

Collins

Dear Abhishek, well done on this write-up. Besides the Reduction in Variance method, which other methods can be used for splitting a node when the target variable is continuous?

Flash Card

What is a Decision Tree?

A decision tree is a super popular and powerful machine learning algorithm used a lot in data science. It's easy to use, simple to understand, and great for seeing how decisions are made. Think of it like a flowchart where a complex problem gets broken down into a series of simple yes/no questions, creating a tree-like structure that shows the steps and outcomes.

Each branch of the tree is a decision path, and each leaf is the final answer or prediction. This makes it really easy to follow and see exactly how the model makes its choices.

What’s cool is that decision trees are the building blocks for more advanced algorithms like Random Forest, XGBoost, and LightGBM. These models use a bunch of decision trees together to make even more accurate and reliable predictions.


Flash Card

What are the basic terminologies associated with decision trees?

Parent and Child Node: When a node divides into sub-nodes, it becomes the Parent Node, and the sub-nodes are called Child Nodes. Root Node: The topmost node of a decision tree, representing the entire population or sample. Leaf/Terminal Nodes: Nodes that do not have any child nodes. Different tree models like ID3, CART, and C4.5 are used based on the problem statement.


Flash Card

Why is node splitting crucial in decision trees?

Node splitting divides a node into multiple sub-nodes to create relatively pure nodes. It is essential for understanding how decision trees make decisions and why they perform well. The process is recursive and continues until only homogeneous nodes are left, which can lead to overfitting if not managed properly.


Flash Card

How does the Reduction in Variance method work for node splitting?

Used when the target variable is continuous, suitable for regression problems. Variance is used to measure the homogeneity of a node; a homogeneous node has zero variance. Steps include calculating variance for each child node, selecting the split with the lowest variance, and repeating until nodes are homogeneous.


Flash Card

What is Information Gain and how is it used in decision trees?

Information Gain is used for splitting nodes when the target variable is categorical. It operates based on entropy, which measures the purity of a node. Steps involve calculating entropy for each child node, selecting the split with the lowest entropy or highest information gain, and repeating until nodes are homogeneous.


Flash Card

Explain the Gini Impurity method for node splitting.

Gini Impurity is used for categorical target variables and is the most popular method. It measures the probability of correctly labeling a randomly chosen element. Steps include calculating Gini Impurity for each child node, selecting the split with the lowest Gini Impurity, and repeating until nodes are homogeneous.


Flash Card

How does the Chi-Square method work for node splitting in decision trees?

Chi-Square is used for datasets with categorical target values to make two or more splits in a node. It assesses the statistical significance of differences between the parent node and child nodes. Steps involve calculating Chi-Square values for each child node, selecting the split with the highest Chi-Square value, and repeating until nodes are homogeneous.


Flash Card

What are the key takeaways from understanding decision tree splitting methods?

Decision trees are crucial for solving classification and regression problems. Choosing the right features and splitting methods is essential for maximizing information gain. There is no single best method for splitting; it depends on the training data and problem statement.

