Understanding Out-of-Bag (OOB) Score: Random Forest Algorithm Evaluation

Radhika Last Updated : 26 Feb, 2024


Introduction

When trying to build a better predictive model, we soon come across a well-known ensemble technique in machine learning: Random Forest. The Random Forest algorithm comes with the concept of the Out-of-Bag score (OOB_Score).

Random Forest is a powerful ensemble technique for machine learning and data science, but many people skip the concept of OOB_Score while learning the algorithm and so miss part of what makes Random Forest valuable as an ensemble method.

This blog will walk you through the OOB_Score concept with the help of examples.

Learning Objectives

Gain an understanding of the motivations behind using Random Forest algorithms in machine learning models, including their advantages over other models.
Learn about the concepts of bootstrapping and the Out-of-Bag (OOB) sample, and how they contribute to the formation and evaluation of a Random Forest.
Understand how to calculate the Out-of-Bag score, its interpretation, and its role as an internal validation mechanism in Random Forest models.
Analyze the benefits and limitations of using the Out-of-Bag score for model assessment, including how it compares to other model evaluation techniques.

Quick introduction to Random Forest
Bootstrapping and Out-of-Bag (OOB) Sample
Out-of-Bag Score (OOB_Score)
Advantages of using OOB_Score
Disadvantages of using OOB_Error rate
Key Takeaways
Frequently Asked Questions

Quick introduction to Random Forest

Decision Trees are among the most interpretable models used for supervised learning: the algorithm makes decisions and predicts values using a sequence of if-else conditions, as shown in the example.

Decision trees are easy to understand and interpret, but they have one major issue:

If you grow a tree to its maximum depth (the default setting), it will capture even the finest details of the training dataset, including noise.
Applying such a tree to test data then gives a high error due to high variance (overfitting of the training data).
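The overfitting described above can be seen in a small sketch. It assumes scikit-learn is installed and uses a synthetic dataset from make_classification with some label noise; the dataset, its sizes, and the seeds are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration). flip_y adds label noise,
# so a tree that fits the training set perfectly must memorize some noise.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# max_depth=None (the scikit-learn default) grows the tree until leaves are pure.
tree = DecisionTreeClassifier(max_depth=None, random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy : {test_acc:.3f}")
```

The gap between the two accuracies is the high variance that the list above refers to.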

Hence, to get the best of both worlds, that is, lower variance while still building on easily understood decision trees, the Random Forest algorithm was introduced.

Random Forests, or Random Decision Forests, are an ensemble learning method for classification and regression problems. They operate by constructing a multitude of decision trees at training time, each on its own bootstrapped sample, and outputting the majority vote of all the trees (for classification) or the average of their predictions (for regression) as the final output.

Constructing many decision trees helps the model generalize the underlying data pattern rather than memorize the training data, and therefore reduces variance (reduces overfitting).
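As a rough illustration of this variance reduction, the sketch below fits one fully grown tree and one forest on the same split and compares their test accuracy. The synthetic dataset and all parameter values are assumptions chosen for the example; on data like this the forest usually scores higher, though the exact numbers depend on the data and the seed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One fully grown decision tree versus an ensemble of 200 trees.
single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

tree_acc = single_tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"single tree test accuracy  : {tree_acc:.3f}")
print(f"random forest test accuracy: {forest_acc:.3f}")
```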

But how do we select a training set for every new decision tree in a Random Forest? This is where bootstrapping comes in.

Bootstrapping and Out-of-Bag (OOB) Sample

We create new training sets for multiple decision trees in Random Forest using the concept of Bootstrapping, which is essentially random sampling with replacement.

Let us look at an example to understand how bootstrapping works:

Here, the main training dataset consists of five animals, and we want to draw different samples from this one main training set:

Fix the sample size.
Randomly choose a data point for the sample.
After selection, put it back in the main set (replacement).
Again choose a data point from the main training set for the sample and, after selection, put it back.
Repeat the above steps until the sample reaches the specified size.
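The steps above can be sketched in a few lines of plain Python. The five animal names and the helper bootstrap_sample are hypothetical, chosen only for illustration; the article's own table may list different animals.

```python
import random

def bootstrap_sample(data, sample_size, rng):
    """Draw sample_size items from data, one at a time, with replacement."""
    sample = []
    while len(sample) < sample_size:
        # data is never modified, so every chosen item is effectively "put back"
        sample.append(rng.choice(data))
    return sample

rng = random.Random(42)  # fixed seed so the run is repeatable
training_set = ["Dog", "Cat", "Rat", "Cow", "Horse"]  # hypothetical animals

# Draw three bootstrap samples, each as large as the main training set.
samples = [bootstrap_sample(training_set, len(training_set), rng) for _ in range(3)]
for i, sample in enumerate(samples, start=1):
    print(f"Sample {i}: {sample}")
```

Because each draw is made from the full training set, the same animal can appear several times in one sample while another animal does not appear at all.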

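Note: Random Forest bootstraps the data points (rows) for each tree and, in addition, considers only a random subset of the features at each split. Both sources of randomness make the decision trees less correlated with one another.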

The total number of trees in a random forest, also called estimators, can be set using the n_estimators parameter.

Out-Of-Bag Sample

In the above example, you can observe that we repeated some animals while making the sample, and some animals did not even occur once in the sample.

Here, Sample 1 does not contain Rat and Cow, whereas Sample 3 contains every animal in the main training set.

Because data points are chosen randomly and with replacement, some data points never make it into a particular sample. These left-out points are known as that sample's out-of-bag (OOB) points.
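How many rows end up out-of-bag? When a sample has the same size n as the training set, the chance that a given row is never drawn is (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below checks this empirically with NumPy; n_rows and n_samples are hypothetical values chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1000     # hypothetical training-set size
n_samples = 200   # hypothetical number of bootstrap samples (one per tree)

oob_fractions = []
for _ in range(n_samples):
    in_bag = rng.integers(0, n_rows, size=n_rows)   # row indices drawn with replacement
    oob = np.setdiff1d(np.arange(n_rows), in_bag)   # rows that were never drawn
    oob_fractions.append(len(oob) / n_rows)

mean_oob_fraction = float(np.mean(oob_fractions))
theoretical = (1 - 1 / n_rows) ** n_rows
print(f"average OOB fraction : {mean_oob_fraction:.3f}")
print(f"(1 - 1/n)^n for n={n_rows}: {theoretical:.3f}")
```

So each tree leaves out roughly a third of the rows, which is what makes the out-of-bag points usable as a built-in validation set.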

Out-of-Bag Score (OOB_Score)

Where does OOB_Score come into the picture? OOB_Score is a powerful validation technique used especially with the Random Forest algorithm to obtain a low-variance estimate of how well the model generalizes.

Note: If you simply score the forest on its own training data, every row has already been seen in training by most of the decision trees, so there is a leakage of data and the score is overly optimistic. OOB_Score prevents this leakage by scoring each row only with the trees that never saw it. Properly performed k-fold cross-validation also avoids leakage, but it requires refitting the forest k times, so we often use OOB_Score for validating the model.

Let’s understand OOB_Score through an example:

Here, we have a training set with 5 rows and a classification target variable indicating whether each animal is domestic/pet.

In the random forest, we build multiple decision trees. Below, we show a bootstrapped sample for one particular decision tree, say DT_1.

Here, the Rat and Cat rows have been left out. Since Rat and Cat are OOB for DT_1, we predict their values using DT_1. (Note: DT_1 did not see the Rat and Cat data while it was being trained.)

Just like DT_1, there would be many more decision trees where either rat or cat was left out or maybe both of them were left out.

Let’s say that the 3rd, 7th, and 100th decision trees have ‘Rat’ as an OOB datapoint. This means that none of them saw the ‘Rat’ data before predicting the value for ‘Rat’.

So, we record all the predicted values for “Rat” from the trees DT_1, DT_3, DT_7, and DT_100.

Suppose the aggregated (majority) prediction is the same as the actual value for “Rat”.
(Note: None of these trees had seen the “Rat” data before, and together they still predicted its value correctly.)

Similarly, every data point is passed for prediction to the trees for which it is OOB, and an aggregated prediction is recorded for each row.
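The aggregation described above can be sketched by hand: train each tree on a bootstrap sample, let it vote only on the rows it never saw, and take the majority vote per row. This is a minimal illustration built from individual scikit-learn decision trees on synthetic data (all names and values are assumptions); scikit-learn's own random forest may differ in details of how it aggregates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
n_rows, n_classes, n_trees = len(X), 2, 50
rng = np.random.default_rng(0)

# votes[i, c] counts the trees that did NOT train on row i and predicted class c.
votes = np.zeros((n_rows, n_classes), dtype=int)

for t in range(n_trees):
    in_bag = rng.integers(0, n_rows, size=n_rows)  # bootstrap sample of row indices
    oob_mask = np.ones(n_rows, dtype=bool)
    oob_mask[in_bag] = False                       # rows this tree never saw
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    tree.fit(X[in_bag], y[in_bag])
    oob_rows = np.flatnonzero(oob_mask)
    votes[oob_rows, tree.predict(X[oob_rows])] += 1

# Majority vote per row; -1 marks a row that was never out-of-bag for any tree.
has_vote = votes.sum(axis=1) > 0
oob_pred = np.where(has_vote, votes.argmax(axis=1), -1)
print("aggregated OOB predictions (first 10):", oob_pred[:10])
print("actual values              (first 10):", y[:10])
```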

The OOB_Score is computed as the proportion of rows whose aggregated out-of-bag prediction is correct.

Correspondingly, the OOB error is the proportion of rows whose aggregated out-of-bag prediction is wrong, that is, 1 - OOB_Score.
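In scikit-learn, a random forest classifier can compute this score for you: passing oob_score=True (with bootstrap=True, the default) makes the fitted model expose an oob_score_ attribute, which for a classifier is an accuracy. The sketch below uses a synthetic dataset as an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,   # number of trees
    bootstrap=True,     # OOB scoring needs bootstrapped samples (this is the default)
    oob_score=True,     # ask for the out-of-bag score to be computed during fit
    random_state=0,
)
forest.fit(X, y)

oob_score = forest.oob_score_   # proportion of rows predicted correctly out-of-bag
oob_error = 1.0 - oob_score     # proportion predicted wrongly
print(f"OOB score: {oob_score:.3f}")
print(f"OOB error: {oob_error:.3f}")
```

Note that no separate validation split was created: the whole dataset was used for training, and the score still comes only from trees that had not seen each row.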

Advantages of using OOB_Score

No leakage of data: Each row is scored only by trees that did not see it during training, so the validation involves no data leakage and gives an honest estimate of predictive performance.
Less variance: [High variance ~ overfitting: a high training score but a low testing score.] Because the OOB_Score is free of leakage, it exposes overfitting rather than hiding it, which helps you arrive at a model with lower variance.
Better predictive model: Tuning the model against an honest OOB_Score helps you choose settings that generalize well, which can yield a better predictive model than tuning against a leaky score.
Less computation: The model is validated as it is being trained, so no separate validation runs are needed.
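To illustrate the computation point, the sketch below obtains an OOB estimate from a single fit and compares it with 5-fold cross-validation, which has to fit five forests. The synthetic dataset and parameter values are assumptions for the example; the two estimates are usually close, but they are not expected to match exactly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)

# One fit: the OOB estimate is a by-product of training.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
oob_accuracy = forest.oob_score_

# Five fits: 5-fold cross-validation trains a fresh forest on each training split.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
)
cv_accuracy = cv_scores.mean()

print(f"OOB accuracy (1 fit)       : {oob_accuracy:.3f}")
print(f"5-fold CV accuracy (5 fits): {cv_accuracy:.3f}")
```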

Disadvantages of using OOB_Error rate

Time-consuming: Although you can test the data as you train, computing the OOB predictions adds extra work to the training run, so training takes a bit longer than it would without OOB scoring.
Not good for large datasets: Because of this overhead, training can take considerably longer when the dataset is huge.
Best for small and medium-sized datasets: Even with the extra time, OOB_Score is a good choice when the dataset is small or medium-sized, where setting data aside for validation is costly.

Key Takeaways

The Out-of-Bag score serves as a reliable validation score, offering an insight into the model’s prediction error without the need for a separate validation dataset.
By bypassing the necessity for a distinct validation dataset or test set, the Out-of-Bag error provides an efficient means to estimate the prediction error, streamlining the model evaluation process.
Because each row is scored only by trees that never saw it, the Out-of-Bag score reflects the prediction error on unseen data, making it a valuable tool for assessing model performance in the absence of an external validation dataset.

Conclusion

Random Forest can be a very powerful technique, and the OOB_Score lets you validate it reliably without setting data aside. Even though you spend a bit more time training the random forest model with the oob_score parameter set to True, the honest performance estimate justifies the time consumed.

Frequently Asked Questions

Q1. What is the out-of-bag error in the random forest?

A. The out-of-bag error is a performance metric that estimates the performance of the Random Forest model using samples not included in the bootstrap sample for training.

Q2. How does bagging avoid overfitting in Random Forest classification?

A. In Random Forest classification, bagging, or bootstrap aggregation, combines predictions from multiple decision trees to reduce variance and avoid overfitting. Each tree is trained on a different bootstrap sample of the training data (this is what sklearn’s RandomForestClassifier does by default), so the errors of individual trees tend to average out. The final prediction is based on a majority vote, which enhances the model's overall performance.

Q3. Which data is used to calculate the OOB error?

A. In a Random Forest model, each tree within the ensemble calculates the Out-of-Bag (OOB) error using the data samples it did not select for training during the bootstrap sampling process. These samples, referred to as “out-of-bag” samples, are the ones left out for each tree.
