This article was published as a part of the Data Science Blogathon.

Stacking is one of the most widely used and best-performing ensemble techniques in machine learning. It is similar to a voting ensemble, but it additionally learns weights for the individual algorithms through a second layer of models: the ground (base) models and a meta-model. Because of this, stacking often outperforms the other ensemble techniques used in machine learning.

In this article, we will talk about the stacking ensemble technique. We will start with the core intuition behind it, followed by its working mechanism and the different approaches used in it. We will then develop the code to apply the algorithm to data, concluding with the key takeaways from this strategy.

We mentioned above that stacking is very similar to voting. In a voting ensemble, multiple machine learning algorithms perform the same task: we train them on the same data, and once training is complete, we collect the results from each model. The final output is the mean of the ground models' results for a regression problem, or the most frequent predicted category for a classification problem, with all the ground models given the same weightage.
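The voting mechanism described above can be sketched with scikit-learn's `VotingClassifier`. This is a minimal illustration on a synthetic dataset generated with `make_classification` (an assumption for demonstration, not data from the article):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data for illustration only.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: the final class is the most frequent prediction among the
# models, each model getting the same weightage.
vote = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('knn', KNeighborsClassifier()),
], voting='hard')
vote.fit(X_train, y_train)

acc = vote.score(X_test, y_test)
print(acc)
```

For regression, `VotingRegressor` would average the models' outputs instead of taking a majority vote.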

In stacking, the same thing takes place, except that a new layer of models enters the picture. Multiple machine learning algorithms are again used as the ground models, but there is also a further layer containing the meta-model. Unlike in voting ensembles, this meta-model assigns different weights to the ground models, and only then is the prediction task performed.

Suppose we have a dataset D and four machine learning ground models: a Decision Tree, a Random Forest, KNN, and XGBoost, with Logistic Regression as the meta-model in the second layer. We feed the dataset D to every ground model and train them all on the same data; once trained, each can predict on our test dataset. We then take the predictions of every ground model and use them as the training data for the Logistic Regression meta-model, so the meta-model's training data is different from the ground models'. Once the meta-model is trained, it assigns weights to the ground models, and its output is considered the final output of the stacking algorithm.
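The two-layer flow just described can be sketched by hand: each ground model's predictions become one input column for the meta-model. This is a deliberately naive sketch on assumed synthetic data (and it reuses in-sample predictions, the very leakage issue discussed below):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Layer 1: train each ground model on the same dataset.
base_models = [DecisionTreeClassifier(random_state=0),
               RandomForestClassifier(n_estimators=10, random_state=0),
               KNeighborsClassifier()]
for m in base_models:
    m.fit(X_train, y_train)

# Stack each model's predictions column-wise: these columns are the
# meta-model's training features.
meta_train = np.column_stack([m.predict(X_train) for m in base_models])
meta_test = np.column_stack([m.predict(X_test) for m in base_models])

# Layer 2: the meta-model (Logistic Regression) learns how to weight
# the ground models' outputs.
meta = LogisticRegression().fit(meta_train, y_train)
acc = meta.score(meta_test, y_test)
print(acc)
```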

As we know, in stacking the ground models are trained on the dataset, and we use their outputs as the training data for the meta-model. Here the same data is used multiple times: the ground models' outputs are derived from data those models have already seen, and those outputs are then used again to train the meta-model. So there is an obvious risk of overfitting, where the model performs very well on the training data but poorly on unseen or testing data.

To tackle this problem, we can either hold out some samples of the data from the ground models (a validation set) or use a special sampling technique. In stacking, we use the K-fold approach to handle this overfitting problem.

As we know, there is a potential for overfitting in these ensembles, and the K-fold approach can tackle this problem. This approach is also called classical stacking. Let's try to understand K-fold sampling with an example.

Suppose we have a regression dataset where the output column is a numerical value. In K-fold sampling, the first step is to split the dataset into training and testing sets; using the train_test_split module, this is easily done. Suppose we use an 80-20 split of our dataset.

In the second step, we decide the value of K, the number of equal splits of the training data. Generally, we take K = 5, which means we split the training data into five parts (folds).

In the third step, we train the ground models one by one. Since we have five equal folds, we use four folds as the training set and the remaining fold as the held-out set; once trained, we record each model's predictions on that held-out fold. We repeat this for every fold and record the outputs of all the ground models.

In the fourth step, we have the held-out (out-of-fold) predictions of all the ground models, and we use that data as the training set for the meta-model. Once the meta-model is trained, the ground models are trained again, this time not on the fold splits but on the full training data from step 1.

So here, the meta-model is trained on out-of-fold predictions before the ground models are retrained on the full training set. This solves the data-leakage problem, since the meta-model and the ground models are trained on different data, and the overfitting problem does not occur.
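The four steps above can be sketched with scikit-learn's `cross_val_predict`, which returns exactly the out-of-fold predictions the meta-model needs. Again the dataset is an assumed synthetic one, and the model choices are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Step 1: an 80-20 train-test split.
X, y = make_classification(n_samples=600, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

base_models = [RandomForestClassifier(n_estimators=10, random_state=1),
               KNeighborsClassifier()]

# Steps 2-3: with K = 5 folds, collect each ground model's predictions
# on the fold it was NOT trained on (out-of-fold predictions).
oof = np.column_stack([cross_val_predict(m, X_train, y_train, cv=5)
                       for m in base_models])

# Step 4: train the meta-model on the out-of-fold predictions.
meta = LogisticRegression().fit(oof, y_train)

# Finally, retrain each ground model on the full training set from step 1.
for m in base_models:
    m.fit(X_train, y_train)

test_meta = np.column_stack([m.predict(X_test) for m in base_models])
acc = meta.score(test_meta, y_test)
print(acc)
```

Because the meta-model never sees predictions a ground model made on its own training folds, the leakage described above is avoided.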

Applying stacking to available data is very simple: the StackingClassifier and StackingRegressor classes are available in Scikit-Learn's ensemble module.

**Importing the required libraries:**

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import StackingClassifier
```

**Creating the estimators’ list:**

```python
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=10)),
    ('gbdt', GradientBoostingClassifier())
]
```

**Applying Stacking:**

```python
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=11
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
```

In this article, we discussed the famous ensemble approach of stacking: its core intuition with a worked example, its working mechanism, and a code example. Knowing this strategy will help one better understand stacking algorithms and the data sampling behind them, and it will also help one answer interview questions about stacking efficiently.

Some **Key Takeaways** from this article are:

1. Stacking is a well-known ensemble approach that uses two layers of machine learning algorithms to make predictions.

2. In stacking, the meta-model is trained first, on the out-of-fold outputs of the ground models.

3. Ground models in stacking are trained twice on different data: the first training produces out-of-fold predictions to feed the meta-model, and the second training, on the full training set, is for the ground models themselves.

**The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.**
