In this article, we’ll look at the Random Forest algorithm and its hyper-parameters. Random forest is one of the most commonly used algorithms that work on the concept of bagging.


So let’s understand how the algorithm works. From a given dataset, multiple bootstrap samples are created, and the number of bootstrap samples depends on the number of models we want to train. Suppose I want to build 10 models; then I’ll create 10 bootstrap samples.

Now, on each of these bootstrap samples, we build a decision tree model. So here we have 10 decision tree models, each built on a different subset of the data.

Each of these individual decision trees generates a prediction, and these predictions are then combined to get the final output.

So effectively, we’re combining multiple trees to get the final output, and hence it’s called a forest. But why is it called **Random Forest**? You might say it’s because we use random bootstrap samples. Well, that’s partially correct: along with the random sampling of data points (rows), random forest also performs random sampling of features.

Note that the random sampling of rows is done at the tree level, so every tree gets a different subset of data points. Feature sampling, on the other hand, is done at the node level, i.e., at each split, and not at the tree level.
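The row-sampling step above can be sketched in a few lines of Python. This is a toy illustration where row indices stand in for actual data points, not a library implementation:

```python
import random

random.seed(0)  # for reproducibility of the toy example

rows = list(range(1, 101))  # 100 row indices in a toy dataset

def bootstrap_sample(rows):
    """Sample with replacement, same size as the original data."""
    return [random.choice(rows) for _ in rows]

# One bootstrap sample per tree we want to train.
n_trees = 10
samples = [bootstrap_sample(rows) for _ in range(n_trees)]

# Because we sample with replacement, duplicates appear and some
# rows are left out, so every tree sees a different subset of data.
unique_rows = len(set(samples[0]))
print(len(samples), len(samples[0]), unique_rows)
```

With sampling done with replacement, each bootstrap sample typically contains only around 63% of the distinct rows, which is exactly why each tree ends up seeing different data.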

So let’s dive deeper into this concept of feature sampling in random forest. Let’s take an example: suppose we have nine features in the dataset, V1 to V9.

Now, for a simple decision tree model, we use all of these features to build the tree. So all nine features are considered at the first node to decide the best split.

And then again, all nine features are considered at the two child nodes to decide the best split.

But for a tree built in a random forest, a subset of features is selected at every node. So let’s say the first subset is V1, V3, V4, and V9.

Then these features will be used at the root node to decide the best split. And for the next node, we’ll create another subset of features, which is used for that split.

And similarly, for the other node, we’ll create a new subset of features to make the split. This process goes on for every node in every tree.
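The per-node feature sampling described above can be sketched like this (a toy illustration; the subset size of 4 is just for the example):

```python
import random

random.seed(1)  # for reproducibility of the toy example

features = ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"]

def features_for_node(all_features, k=4):
    """Draw a fresh random subset of features for one split."""
    return random.sample(all_features, k)

# A new subset is drawn at every node, not once per tree.
root_subset = features_for_node(features)   # used at the root node
child_subset = features_for_node(features)  # a fresh draw for the next node
print(root_subset, child_subset)
```

The key point the sketch captures is that `features_for_node` is called again at every split, so even within a single tree different nodes consider different features.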

So if we are building m trees, then for each node in each tree, we select a subset of features and choose the best split among them. Let’s have a quick recap of what we covered about random forest:

- So first we create multiple bootstrap samples and the number of samples would depend on the number of models we want to build.
- Then, on each of these bootstrap samples, we train a decision tree model. While building these decision trees, we perform feature sampling at every split.
- And finally, we aggregate all the decision trees to get the final output.

First, let’s understand what the term hyper-parameter means. We have seen that there are multiple factors that define the random forest model, for instance, the maximum number of features used to split a node, or the number of trees in the forest. We can manually set and tune these values: I can set the number of trees to be 20, or 200, or 500. These manually set values are called the hyper-parameters of random forest.

Let’s see which hyper-parameters we can tune in the random forest model. As we have already discussed, a random forest has multiple trees, and we can set the number of trees we need in the forest. This is done using the hyper-parameter “**n_estimators**”.

There are multiple other hyper-parameters that we can set at the tree level. So consider one tree in the forest.

Can you think of hyper-parameters for this tree? First, we can set the number of features to take into account at each node for splitting. For example:

Suppose the total number of features in my dataset is 100, and I decide to randomly select the square root of that number, which comes out to be 10. So I’ll randomly select 10 features for this node, and the best of them will be used to make the split. Then again, I’ll randomly select 10 features for another node, and similarly for all the nodes. Instead of taking the square root, I can also consider taking the log of the total number of features in the dataset. So “**max_features**” is the hyper-parameter we tune to control the number of features randomly selected at each node.
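As a quick sanity check of the arithmetic, here is how the two settings work out for 100 features (using base-2 log, which is what scikit-learn’s `"log2"` option corresponds to):

```python
import math

n_features = 100

# Two common settings for max_features:
sqrt_features = int(math.sqrt(n_features))  # "sqrt": 10 features per node
log_features = int(math.log2(n_features))   # "log2": 6 features per node

print(sqrt_features, log_features)
```

So with `max_features="sqrt"` each split would consider 10 of the 100 features, and with `max_features="log2"` only 6.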

Another hyper-parameter is the depth of the tree. For example, in the given tree here, we have level one, level two, and level three, so the depth of the tree is three. We can decide the depth to which a tree can grow using “**max_depth**”, and this can be considered one of the stopping criteria that restrict the growth of the tree.

Other values we can control are the minimum number of samples required to split a node, and the minimum number of samples at a leaf node. For instance, suppose I have a thousand data points, and after the first split I have 700 in one node and 300 in the other. Here are the numbers of data points in each node after further splitting:

Now, using “**min_samples_split**”, we can set the criterion that a node splits only when the number of samples in it is more than 100. So the node with 80 samples will not be split further, and the same goes for the nodes with 100 and 75 samples, since our condition is that the number of samples should be more than 100.

Another condition we can set is the minimum number of samples in a leaf node, using “**min_samples_leaf**”. So if I say that a split can happen only when each resulting leaf node has more than 85 samples, then this split will be invalid, since one of the resulting nodes has only 80 samples.

The splits on the nodes with 300 and 620 samples are valid, but the split on the node with 200 samples will not take place.
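These two stopping rules amount to simple checks at each node. Here is a toy sketch using the thresholds from the example above (it mirrors this article’s “more than 100” and “more than 85” wording; scikit-learn’s exact semantics for these parameters differ slightly):

```python
MIN_SAMPLES_SPLIT = 100  # a node splits only if it has more than this many samples
MIN_SAMPLES_LEAF = 85    # each resulting child must have more than this many samples

def can_split(node_samples, left_samples, right_samples):
    """Return True only if both stopping criteria allow the split."""
    if node_samples <= MIN_SAMPLES_SPLIT:
        return False  # node too small to split at all
    # both children must satisfy the leaf-size condition
    return left_samples > MIN_SAMPLES_LEAF and right_samples > MIN_SAMPLES_LEAF

print(can_split(700, 620, 80))   # invalid: one child has only 80 samples
print(can_split(700, 400, 300))  # valid: node and both children are large enough
print(can_split(80, 40, 40))     # invalid: node has too few samples to split
```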

Another important hyper-parameter is “**criterion**”, which decides how the quality of a split is measured. For decision trees, we have several such measures, for example Gini impurity, entropy (information gain), and chi-square. The hyper-parameter “criterion” sets which of these measures to use.
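Putting it all together, here is how these hyper-parameters are set in scikit-learn’s `RandomForestClassifier` (the synthetic dataset and the chosen values are just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A small synthetic dataset with 9 features, just to demonstrate the API.
X, y = make_classification(n_samples=200, n_features=9, random_state=42)

model = RandomForestClassifier(
    n_estimators=10,       # number of trees in the forest
    max_features="sqrt",   # features randomly sampled at each split
    max_depth=3,           # maximum depth each tree can grow to
    min_samples_split=10,  # samples a node needs before it can be split
    min_samples_leaf=5,    # minimum samples required in each leaf
    criterion="gini",      # split-quality measure ("gini" or "entropy")
    random_state=42,
)
model.fit(X, y)

preds = model.predict(X[:5])
print(preds)
```

Each of these keyword arguments corresponds directly to one of the hyper-parameters discussed above, and tuning them (e.g., via grid search) is how the forest is adapted to a given dataset.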

So these were the most important hyper-parameters of random forest.

In this article, we learned about the random forest algorithm and some of its important parameters which you should know before diving further into the details of this algorithm.

*If you are looking to kick start your Data Science Journey and want every topic under one roof, your search stops here. Check out Analytics Vidhya’s Certified AI & ML BlackBelt Plus Program*

If you have any queries, let me know in the comment section!
