Encountering ensemble learning algorithms in the winning solutions of data science competitions has become the norm. The ability to train multiple learners on a set of hypotheses not only adds robustness to the model, but also enables it to deliver highly accurate predictions. In case you missed it, I would recommend reading Basics of Ensemble Learning Explained in Simple English before you go forward.

While building ensemble models, one of the most common challenges people face is finding the optimal weights. A few fight hard to solve this challenge, while the not-so-brave ones convince themselves to apply simple bagging, which assumes equal weight for all models and takes the average of all predicted values.

It often works well because it reduces the variance error of the individual models. However, assigning equal weights is rarely the best way to combine models. What could be an alternative? **In this article, I’ve solved the problem of finding the optimal weights in ensemble learners using a neural network** in R.

Imagine you have built 3 models on a given (hypothetical) data set. Each of these models predicts the probability of an event.

In the following figure, Model 1, Model 2 and Model 3 are the three predictive models. Each of them has its own strengths, and as a team they can perform even better than the individually best model.

We initially assume a 33.33% weight for each model and build an ensemble model. The challenge is to optimize the weights w1, w2 and w3 so as to build a more powerful ensemble model.
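As a minimal sketch of this equal-weight starting point, the snippet below combines three columns of predicted probabilities with a weight of one third each. The matrix `preds` and its values are hypothetical and only serve as an illustration.

```r
# Hypothetical predicted probabilities from three models (rows = observations)
preds <- cbind(
  p1 = c(0.90, 0.20, 0.70, 0.40),
  p2 = c(0.80, 0.30, 0.60, 0.10),
  p3 = c(0.60, 0.50, 0.40, 0.30)
)

# Equal weights: each model contributes one third
w <- rep(1 / ncol(preds), ncol(preds))

# Weighted combination of the three model outputs, one value per observation
p_ensemble <- as.numeric(preds %*% w)
```

With equal weights, `p_ensemble` is simply the row-wise mean of the three predictions, which is what simple averaging does.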

Assume p1, p2 and p3 are the outputs of the three models respectively. We need to choose w1, w2 and w3 so as to optimize an objective function. Let’s write down the constraints and the objective function mathematically:

**Constraint :**

w1 + w2 + w3 = 1

p = w1 * p1 + w2 * p2 + w3 * p3

**Objective function (Maximize Likelihood function) :**

Maximize ( Product over all observations [ p ^ y * (1 - p) ^ (1 - y) ] )

where:

- y is the response flag for the observation
- p is the predicted probability from the ensemble model
- p1, p2 and p3 are the predicted probabilities from the individual models
- w1, w2 and w3 are the weights given to each model
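The objective above can be sketched in R as follows. In practice it is easier to work with the log of the product, which has the same maximizer. The function name `ensemble_loglik`, the clipping constant `eps` and the toy data are assumptions made for this illustration.

```r
# Log-likelihood of a weighted ensemble (higher is better).
# Hypothetical helper: w are the weights, preds the model outputs, y the flags.
ensemble_loglik <- function(w, preds, y, eps = 1e-15) {
  stopifnot(abs(sum(w) - 1) < 1e-8)       # constraint: weights sum to 1
  p <- as.numeric(preds %*% w)            # p = w1 * p1 + w2 * p2 + w3 * p3
  p <- pmin(pmax(p, eps), 1 - eps)        # keep log() finite
  sum(y * log(p) + (1 - y) * log(1 - p))  # log of the product in the objective
}

# Hypothetical toy data
preds <- cbind(
  p1 = c(0.90, 0.20, 0.70, 0.40),
  p2 = c(0.80, 0.30, 0.60, 0.10),
  p3 = c(0.60, 0.50, 0.40, 0.30)
)
y <- c(1, 0, 1, 0)

ll_equal  <- ensemble_loglik(c(1/3, 1/3, 1/3), preds, y)
ll_skewed <- ensemble_loglik(c(0.6, 0.3, 0.1), preds, y)
```

On this toy data, shifting weight toward the first model gives a higher log-likelihood than equal weights, which is the kind of improvement the weight search is looking for.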

This is a classic case of simplex optimization. However, with a huge number of models to bag, diving into mathematical formulas every time can be stressful and time-consuming. Hence, we need a smarter method here.
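For reference, here is one way the direct optimization could be sketched with base R's `optim()`. It is not necessarily the formulation the article has in mind: the softmax reparameterization is a swapped-in trick that keeps the weights summing to 1, and it also forces them to be positive, which is an extra assumption beyond the constraint stated above. The simulated data and the helper names `softmax`, `neg_loglik` and `noisy` are hypothetical.

```r
# Softmax maps unconstrained parameters to positive weights that sum to 1
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Negative log-likelihood of the weighted ensemble, as a function of theta
neg_loglik <- function(theta, preds, y, eps = 1e-15) {
  p <- as.numeric(preds %*% softmax(theta))
  p <- pmin(pmax(p, eps), 1 - eps)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}

# Simulated data, for illustration only: three models of decreasing quality
set.seed(42)
n <- 500
y <- rbinom(n, 1, 0.5)
noisy <- function(sd) plogis(2 * (y - 0.5) + rnorm(n, sd = sd))
preds <- cbind(p1 = noisy(0.5), p2 = noisy(1.5), p3 = noisy(3))

# Start from equal weights (theta = 0) and minimize the negative log-likelihood
fit <- optim(par = rep(0, ncol(preds)), fn = neg_loglik,
             preds = preds, y = y, method = "BFGS")
w_opt <- softmax(fit$par)
```

Here `w_opt` holds the fitted weights and `fit$value` the negative log-likelihood they achieve.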

We’ll now learn to find these weights without getting into such mathematical formulation, using a neural network implementation.

I understand that neural network implementations can be quite overwhelming at times. Hence, to solve the case at hand, we’ll not dive into the complex concepts of deep neural networks.

Basically, a neural network works by finding weights for the links between input variables and hidden nodes, and between hidden nodes and the output node. Our objective here is to find the right combination of weights from the input nodes directly to the output node.

Here is an easy and neat way to accomplish this task. We restrict the number of hidden nodes to one. With only one hidden node, the link from the hidden node to the output node carries just a single weight, so the hidden node and the output node are effectively no longer different: the weights that matter are the ones from the inputs to that single node.
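To make the structure concrete, here is a rough sketch of the forward pass of such a network. It assumes a sigmoid at both the hidden and the output node, and it is not a description of any library's internals. The function `one_node_forward`, the parameter names `w`, `b`, `v`, `c0` and all numeric values are hypothetical and not fitted to any data.

```r
sigm <- function(z) 1 / (1 + exp(-z))

# Forward pass of a network with one hidden node and one output node.
# w, b  : input-to-hidden weights and bias
# v, c0 : hidden-to-output weight and bias
one_node_forward <- function(preds, w, b, v, c0) {
  hidden <- sigm(as.numeric(preds %*% w) + b)  # weighted sum of model outputs
  sigm(v * hidden + c0)                        # monotone rescaling when v > 0
}

# Hypothetical model outputs and hand-picked parameter values
preds <- cbind(
  p1 = c(0.90, 0.20, 0.70, 0.40),
  p2 = c(0.80, 0.30, 0.60, 0.10),
  p3 = c(0.60, 0.50, 0.40, 0.30)
)
w <- c(2, 1, 0.5)
out <- one_node_forward(preds, w, b = -1.5, v = 4, c0 = -2)

# Input weights rescaled to sum to 1, to read them as relative model weights
rel_w <- w / sum(w)
```

As long as `v` is positive, the output is a monotone function of the weighted sum of the model predictions, so the input weights alone decide how the observations are ranked.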

Here is a simple schematic representing what we just discussed:

So we found a way! This will automatically accomplish what we were trying to do using simplex equations. Let’s put this formulation into R code:

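    library(deepnet)  # provides dbn.dnn.train() and nn.predict()

    # x is a matrix of predictions coming from the different learners
    # y is a vector of all output flags
    # x_test is a matrix of the same learners' predictions on an unseen sample
    x <- as.matrix(prediction_table)
    y <- as.numeric(output_flags)

    # A single hidden node, so the input weights act as the ensemble weights
    nn <- dbn.dnn.train(x, y, hidden = c(1), activationfun = "sigm",
                        learningrate = 0.2, momentum = 0.8)

    # Ensemble predictions on the training sample and on the unseen sample
    nn_predict <- nn.predict(nn, x)
    nn_predict_test <- nn.predict(nn, x_test)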

I have implemented this logic on multiple Kaggle problems and found the results quite encouraging. Here is one such example from a recent competition:

- Simple logistic model at a score of 100
- Best machine learning model at a score of 107
- Simple bagging technique at a score of 110, and finally
- Optimized weights using a neural network at a score of 113.

In this article, I’ve explained how to find the optimal weights in an ensemble model using a traditional approach and a neural network implementation (recommended).

Did you find this article useful? Have you tried anything else to find optimal weights? I’ll be happy to hear from you in the comments section below.
