Tuning the Hyperparameters and Layers of Neural Network Deep Learning

Rendyk 13 Jun, 2024
15 min read

Introduction

In my previous article, I discussed hyperparameter tuning using Bayesian Optimization with tools like bayes_opt or hyperopt. The method is versatile and suits many classification and regression Machine Learning algorithms designed for tabular data. For Neural Network Deep Learning, however, tuning the hyperparameters, including the number of layers, follows a slightly different approach.

This article was published as a part of the Data Science Blogathon

Learning Objectives:

  • In this article, you will get to know Neural Networks in Deep Learning.
  • You will also learn the process of tuning neural network hyperparameters in Deep Learning.

Neural Network Hyperparameters (Deep Learning)

Neural network hyperparameters are like settings you choose before teaching a neural network to do a task. They control things like how many layers the network has, how quickly it learns, and how it adjusts its internal values. Picking the right hyperparameters is important to help the network learn effectively and solve the task accurately. It’s a bit like adjusting the knobs on a machine to make it work just right for a particular job.

A Neural Network is a Deep Learning technique that builds a model from training data to predict unseen data, using many layers consisting of neurons. This is similar to other Machine Learning algorithms, except for the use of multiple layers. The use of multiple layers is what makes it Deep Learning.

Instead of building a model in one line, as with most Machine Learning algorithms, a Neural Network requires users to define the architecture before compiling it into a model. Users have to decide how many layers and how many nodes or neurons to build. This step is not found in other conventional Machine Learning algorithms.

I am sure it is easy to find tutorials on neural networks on the internet. There are also many blogs explaining the concept behind a neural network. Code for hyperparameter tuning of a neural network can also be found in many articles and shared notebooks. However, I find it quite rare to come across a guide to neural network hyperparameter tuning using Bayesian Optimization. The articles I found mostly depend on GridSearchCV or RandomizedSearchCV. Meanwhile, a neural network has many hyperparameters to tune, and Bayesian Optimization is more efficient in time and memory for tuning many hyperparameters. I have described the reason in my past article.

Also Read: A Comprehensive Guide on Hyperparameter Tuning and its Techniques

Different Datasets Need Different Hyperparameters

Different datasets require different sets of hyperparameters to predict accurately. But the large number of hyperparameters makes it difficult for users to decide which ones to choose. There is no single answer to how many layers are most suitable, how many neurons are best, or which optimizer suits all datasets. Hyperparameter tuning is important for finding the best possible set of hyperparameters to build the model for a specific dataset.

In this article, I will demonstrate how to tune two things in a Neural Network: (1) the hyperparameters and (2) the layers. I find tutorials on the latter harder to find than on the former. The first is the same as in other conventional Machine Learning algorithms. The hyperparameters to tune are the number of neurons, activation function, optimizer, learning rate, batch size, and epochs. The second is to tune the number of layers, which other conventional algorithms do not have. The number of layers can affect the accuracy: too few layers may underfit, while too many layers may overfit.

For the hyperparameter-tuning demonstration, I use a dataset provided by Kaggle. I build a simple Multilayer Perceptron (MLP) neural network to do a binary classification task with prediction probability. The Python package used is Keras, built on top of TensorFlow. The dataset has an input dimension of 10. There are two hidden layers, followed by one output layer. The evaluation metric is the accuracy score. The EarlyStopping callback is used to stop the learning process if there is no accuracy improvement in 20 epochs. Below is the illustration.


Fig. 1 MLP Neural Network to build. Source: created by myself

Tuning Hyperparameters in Neural Networks

When tuning neural network hyperparameters, the first focus is the number of neurons in each hidden layer. In this demonstration, all hidden layers share the same number of neurons, but they can be set separately. The number of neurons should be adapted to the complexity of the problem: more complex tasks need more neurons. The number of neurons is tuned in the range from 10 to 100.

An activation function is a parameter of each layer. Input data are fed to the input layer, followed by the hidden layers, and finally the output layer, which contains the output value. As values move from one layer to the next, they are transformed by the activation function. The activation function decides how a layer's input values are computed into output values. The output values of a layer are then passed to the next layer as its input values, and the process repeats. There are 9 activation functions to tune in this demonstration. Each activation function has its own formula (and graph) for computing the input values, which will not be discussed in detail in this article.
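
To make the idea of a layer transforming its inputs more concrete, here is a minimal sketch in plain Python of a few of the activation functions named in the code further down. It is only an illustration: the helper names (relu, sigmoid, softplus, leaky_relu, apply_activation) are made up for this example and are not the Keras implementations, and the leaky slope of 0.1 simply mirrors the alpha used later in this article.

```python
import math

# Hypothetical helpers for illustration; each maps one input value to one output value.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

def leaky_relu(x, alpha=0.1):
    # Negative inputs are scaled by alpha instead of being set to zero
    return x if x > 0 else alpha * x

ACTIVATIONS = {
    "relu": relu,
    "sigmoid": sigmoid,
    "tanh": math.tanh,
    "softplus": softplus,
    "leaky_relu": leaky_relu,
}

def apply_activation(name, values):
    """Apply the named activation to every value leaving a layer."""
    fn = ACTIVATIONS[name]
    return [fn(v) for v in values]

inputs = [-2.0, 0.0, 2.0]
for name in ACTIVATIONS:
    print(name, [round(v, 3) for v in apply_activation(name, inputs)])
```

The same three inputs come out differently under each function, which is why the choice of activation is treated as a hyperparameter to tune.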

Once the layers are arranged, the model is compiled and an optimizer is assigned. The optimizer is responsible for adjusting the learning rate and the weights of the neurons in the neural network to reach the minimum of the loss function. The optimizer is very important for achieving the highest possible accuracy or the minimum loss. The code below lists eight optimizers to choose from (SGD, Adam, RMSprop, Adadelta, Adagrad, Adamax, Nadam, and Ftrl). Each has a different concept behind it.

Hyperparameter Resources

One of the optimizer's hyperparameters is the learning rate, which we will also tune. The learning rate controls the step size the model takes toward the minimum of the loss function. A higher learning rate makes the model learn faster, but it may overshoot the minimum and only reach its surroundings. A lower learning rate gives a better chance of finding a minimum of the loss function. As a tradeoff, a lower learning rate needs more epochs, and therefore more time and memory resources.


Fig 2. Learning rate illustration. Source: created by myself
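
The effect of the step size can be sketched on a toy problem. The example below is an assumption-laden illustration, not what Keras optimizers do internally: it runs plain gradient descent on the one-dimensional loss f(w) = w**2, and the function name gradient_descent, the starting point, and the number of steps are all chosen just for this example.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimise the toy loss f(w) = w**2 (gradient 2*w) and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # step against the gradient
    return w

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"learning_rate={lr}: final w = {gradient_descent(lr):.4f}")
```

On this toy loss, the smallest rate is still far from the minimum at w = 0 after 20 steps, a rate of 0.1 gets close, a rate of 0.9 also closes in but jumps across the minimum on every step, and a rate of 1.1 moves farther away each step. Real loss surfaces are far less regular, which is one reason the learning rate is tuned rather than fixed.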

When dealing with large training datasets, building a model can be time-consuming. To speed up the learning process, we can tune the batch size. By assigning a batch size, not all training data are fed to the model at once. For instance, with a dataset of 77,500 observations and a batch size of 1000, each epoch consists of 77 iterations with 1000 training observations each and a final iteration with the remaining 500. A smaller batch size accelerates learning but may increase the variance of the validation accuracy. Conversely, a larger batch size slows learning while stabilizing the validation accuracy.
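
The batch arithmetic in the example above can be checked with a few lines of Python. The function name batches_per_epoch is hypothetical and used only for this sketch.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Return (total iterations, full batches, size of the last partial batch)."""
    full_batches, remainder = divmod(n_samples, batch_size)
    total = math.ceil(n_samples / batch_size)
    return total, full_batches, remainder

print(batches_per_epoch(77_500, 1000))  # (78, 77, 500)
```

So one epoch with these numbers takes 78 weight updates: 77 on full batches and one on the leftover 500 observations.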

Number of Epochs

The number of times the complete dataset passes through the neural network model is referred to as the number of epochs. Essentially, one epoch involves the training dataset moving forward and backward through the neural network once. If the number of epochs is too small, it may result in underfitting, meaning the neural network has not learned sufficiently. Multiple passes, or epochs, are necessary for effective learning. Conversely, too many epochs can lead to overfitting, where the model predicts the existing data well but struggles with new, unseen data. In this demonstration, we search for the number of epochs within the range of 20 to 100.
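
Because this demonstration pairs the epoch budget with an EarlyStopping callback (patience of 20 epochs, monitoring 'accuracy'), training may end before the chosen number of epochs. The sketch below shows the general patience rule in plain Python. It is a simplified stand-in rather than the Keras implementation, and the function name stopping_epoch and the sample history are made up for illustration.

```python
def stopping_epoch(metric_per_epoch, patience):
    """Return the 1-based epoch at which training would stop when the
    monitored metric has not improved for `patience` consecutive epochs."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_per_epoch, start=1):
        if value > best:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The metric kept improving often enough, so the full budget is used
    return len(metric_per_epoch)

history = [0.5, 0.6, 0.6, 0.6, 0.6, 0.7]  # hypothetical accuracy per epoch
print(stopping_epoch(history, patience=2))  # 4
```

With a patience of 2, the run above stops at epoch 4 and never sees the later improvement, which is the tradeoff a patience value controls.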

Below is the code to tune the hyperparameters of a neural network as described above using Bayesian Optimization. The tuning searches for the optimum hyperparameters based on 5-fold cross-validation. The following code imports the packages used for Neural Network modeling.

# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout
from keras.optimizers import Adam, SGD, RMSprop, Adadelta, Adagrad, Adamax, Nadam, Ftrl
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.wrappers.scikit_learn import KerasClassifier
from math import floor
from sklearn.metrics import make_scorer, accuracy_score
from bayes_opt import BayesianOptimization
from sklearn.model_selection import StratifiedKFold
from keras.layers import LeakyReLU
LeakyReLU = LeakyReLU(alpha=0.1)
import warnings
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", None)

This code makes accuracy the scorer metric.

# Make scorer accuracy
score_acc = make_scorer(accuracy_score)

This code loads the training dataset, drops the text columns and the rows with missing values, and one-hot encodes the categorical columns. It then splits the data into a training dataset and a validation dataset. The validation dataset is 20% of the total. The split is stratified by the target variable.

# Load dataset
trainSet = pd.read_csv('../input/tabular-playground-series-apr-2021/train.csv')
# Feature generation: training data
train = trainSet.drop(columns=['Name', 'Ticket', 'Cabin'])
train = train.dropna(axis=0)
train = pd.get_dummies(train)
# train validation split
X_train, X_val, y_train, y_val = train_test_split(train.drop(columns=['PassengerId','Survived'], axis=0),
                                                  train['Survived'],
                                                  test_size=0.2, random_state=111,
                                                  stratify=train['Survived'])

The following code creates the objective function containing the Neural Network model. The function returns the mean score of the cross-validation.

# Create function
def nn_cl_bo(neurons, activation, optimizer, learning_rate,  batch_size, epochs ):
    optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']
    optimizerD= {'Adam':Adam(lr=learning_rate), 'SGD':SGD(lr=learning_rate),
                 'RMSprop':RMSprop(lr=learning_rate), 'Adadelta':Adadelta(lr=learning_rate),
                 'Adagrad':Adagrad(lr=learning_rate), 'Adamax':Adamax(lr=learning_rate),
                 'Nadam':Nadam(lr=learning_rate), 'Ftrl':Ftrl(lr=learning_rate)}
    activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
                   'elu', 'exponential', LeakyReLU,'relu']
    neurons = round(neurons)
    activation = activationL[round(activation)]
    batch_size = round(batch_size)
    epochs = round(epochs)
    def nn_cl_fun():
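        # Note: Adam is hard-coded here, so the sampled `optimizer` value is not applied in this first function
        opt = Adam(lr = learning_rate)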
        nn = Sequential()
        nn.add(Dense(neurons, input_dim=10, activation=activation))
        nn.add(Dense(neurons, activation=activation))
        nn.add(Dense(1, activation='sigmoid'))
        nn.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
        return nn
    es = EarlyStopping(monitor='accuracy', mode='max', verbose=0, patience=20)
    nn = KerasClassifier(build_fn=nn_cl_fun, epochs=epochs, batch_size=batch_size,
                         verbose=0)
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
    score = cross_val_score(nn, X_train, y_train, scoring=score_acc, cv=kfold, fit_params={'callbacks':[es]}).mean()
    return score
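
Since Bayesian Optimization samples continuous values, the function above turns a sampled number into a categorical choice with round() and a list lookup. The following sketch, using only the standard library, estimates how often each choice would be picked if values were drawn uniformly from the bounds. It is an illustration under assumptions: the helper names decode_choice and choice_frequencies are hypothetical, the string 'leaky_relu' stands in for the LeakyReLU layer object, and uniform sampling only approximates the random initial points, not the guided iterations.

```python
import random
from collections import Counter

activation_names = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
                    'elu', 'exponential', 'leaky_relu', 'relu']
optimizer_names = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax',
                   'Nadam', 'Ftrl', 'SGD']

def decode_choice(value, choices):
    """Map a continuous sample to a categorical choice, as the objective function does."""
    return choices[round(value)]

def choice_frequencies(choices, low, high, n_samples=100_000, seed=0):
    """Estimate how often each choice is selected under uniform sampling in [low, high]."""
    rng = random.Random(seed)
    counts = Counter(decode_choice(rng.uniform(low, high), choices)
                     for _ in range(n_samples))
    return {name: count / n_samples for name, count in counts.items()}

print(choice_frequencies(activation_names, 0, 9))
print(choice_frequencies(optimizer_names, 0, 7))
```

Rounding gives the first and last index only half the width of the others. With the bounds (0, 9) and 'relu' repeated at index 9, presumably for this reason, every activation ends up with roughly the same share. With the optimizer bounds (0, 7), the repeated entry at index 8 is never reached, so in this sketch 'SGD' and 'Ftrl' are each picked about half as often as the other optimizers.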

The code below sets the ranges of the hyperparameters and runs the Bayesian Optimization.

# Set parameters
params_nn ={
    'neurons': (10, 100),
    'activation':(0, 9),
    'optimizer':(0,7),
    'learning_rate':(0.01, 1),
    'batch_size':(200, 1000),
    'epochs':(20, 100)
}
# Run Bayesian Optimization
nn_bo = BayesianOptimization(nn_cl_bo, params_nn, random_state=111)
nn_bo.maximize(init_points=25, n_iter=4)

Output:

|   iter    |  target   | activa... | batch_... |  epochs   | learni... |  neurons  | optimizer |
-------------------------------------------------------------------------------------------------
|  1        |  0.5431   |  5.51     |  335.3    |  54.88    |  0.7716   |  36.58    |  1.044    |
|  2        |  0.5719   |  0.2023   |  536.2    |  39.09    |  0.3443   |  99.16    |  1.664    |
|  3        |  0.5432   |  0.7307   |  735.7    |  69.7     |  0.2815   |  51.96    |  0.8286   |
|  4        |  0.5431   |  0.6656   |  920.6    |  83.52    |  0.8422   |  83.37    |  6.937    |
|  5        |  0.7682   |  5.195    |  851.0    |  53.71    |  0.03717  |  50.87    |  0.7373   |
|  6        |  0.5719   |  7.355    |  758.2    |  65.22    |  0.2815   |  99.86    |  0.9663   |
|  7        |  0.6107   |  5.539    |  588.0    |  52.4     |  0.7306   |  39.05    |  2.804    |
|  8        |  0.5706   |  2.871    |  957.8    |  93.5     |  0.8157   |  13.07    |  6.604    |
|  9        |  0.5719   |  8.554    |  845.3    |  58.5     |  0.9671   |  47.53    |  2.232    |
|  10       |  0.6818   |  0.148    |  230.5    |  24.25    |  0.1367   |  13.0     |  1.585    |
|  11       |  0.6759   |  4.895    |  342.9    |  34.35    |  0.1581   |  71.47    |  3.283    |
|  12       |  0.5719   |  6.914    |  735.1    |  55.3     |  0.5993   |  51.55    |  6.743    |
|  13       |  0.5719   |  1.33     |  925.5    |  59.83    |  0.5966   |  71.62    |  1.242    |
|  14       |  0.6751   |  7.782    |  585.7    |  25.55    |  0.3711   |  42.54    |  3.304    |
|  15       |  0.5719   |  1.615    |  340.2    |  95.93    |  0.6591   |  22.15    |  6.495    |
|  16       |  0.6822   |  7.576    |  242.2    |  36.29    |  0.8738   |  70.65    |  2.081    |
|  17       |  0.5719   |  6.61     |  694.7    |  36.84    |  0.804    |  15.32    |  2.158    |
|  18       |  0.5719   |  1.866    |  977.8    |  92.75    |  0.6797   |  20.37    |  6.706    |
|  19       |  0.5144   |  0.8254   |  703.8    |  92.23    |  0.3464   |  68.75    |  6.476    |
|  20       |  0.5258   |  3.366    |  817.1    |  91.69    |  0.624    |  23.6     |  2.624    |
|  21       |  0.5815   |  5.723    |  567.3    |  62.58    |  0.3588   |  69.39    |  3.336    |
|  22       |  0.4856   |  4.091    |  299.8    |  53.0     |  0.2804   |  41.21    |  6.821    |
|  23       |  0.5719   |  1.94     |  746.3    |  22.54    |  0.837    |  73.15    |  6.762    |
|  24       |  0.7661   |  5.326    |  373.9    |  77.54    |  0.04056  |  47.68    |  1.969    |
|  25       |  0.6843   |  0.9562   |  541.1    |  87.25    |  0.1193   |  98.8     |  1.633    |
|  26       |  0.6839   |  7.588    |  234.1    |  72.25    |  0.4108   |  27.87    |  4.56     |
|  27       |  0.5719   |  3.724    |  412.4    |  62.06    |  0.9792   |  77.42    |  4.245    |
|  28       |  0.5719   |  1.738    |  997.2    |  48.97    |  0.7989   |  46.98    |  6.493    |
|  29       |  0.5719   |  8.671    |  828.0    |  51.03    |  0.8384   |  55.77    |  6.071    |

Here are the best hyperparameters.

params_nn_ = nn_bo.max['params']
activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
               'elu', 'exponential', LeakyReLU,'relu']
params_nn_['activation'] = activationL[round(params_nn_['activation'])]
params_nn_

Output:

{'activation': 'selu',
 'batch_size': 851.0135336291902,
 'epochs': 53.7054301919375,
 'learning_rate': 0.037173480215022196,
 'neurons': 50.872297884262295,
 'optimizer': 0.7372825972056519}

In the code above, the neural network is built inside the nested function nn_cl_fun, within the function under the # Create function comment. The neural network layer architecture is defined before performing the cross-validation. This is different from conventional Machine Learning: other Machine Learning algorithms do not need an architecture to be built, as in def nn_cl_fun, before performing the cross-validation.

Also Read : Neural Network in Machine Learning

Tune the Layers

The number of layers also affects the result of the prediction model. A smaller number of layers is enough for a simpler problem, but a larger number of layers is needed for a more complicated problem. The number of layers can be tuned using a “for loop”. This demonstration tunes the number of layers in two places (layers1 and layers2 in the code below). In each place, the number of layers is tuned between 1 and 3.

Inserting regularization layers in a neural network can help prevent overfitting. This demonstration tries to tune whether to add regularization layers or not. There are two regularization layers to use here.

Two Regularization Layers

Batch normalization is placed after the first hidden layer. The batch normalization layer normalizes the values passed to it for every batch. This is similar to the standard scaler in conventional Machine Learning.
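
As a rough sketch of what normalizing "for every batch" means, the plain-Python function below rescales one feature across a batch to approximately zero mean and unit variance. It deliberately leaves out parts of the real Keras layer, such as the learnable scale and shift and the moving averages used at prediction time. The function name batch_normalize and the epsilon value are assumptions made for this example.

```python
import math

def batch_normalize(batch, epsilon=1e-3):
    """Rescale one feature across a batch to roughly zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    # epsilon keeps the division stable when the variance is close to zero
    return [(x - mean) / math.sqrt(variance + epsilon) for x in batch]

print(batch_normalize([1.0, 2.0, 3.0, 4.0]))
```

Unlike a standard scaler fitted once on the whole training set, the statistics here are recomputed from whichever observations happen to be in the current batch.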

Another regularization layer is the Dropout layer. As its name suggests, it randomly drops a fraction of the neurons in a layer during training. The dropped neurons are ignored for that training step only: a new random subset is dropped at the next step, and all neurons are used at prediction time. The dropout rate sets the fraction of neurons to drop.

Fig. 3 Dropout layer illustration. Source: created by myself
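
The dropout idea can also be sketched without any deep learning library. The function below is a simplified illustration, not the Keras layer itself: during training it zeroes each value with probability equal to the rate and scales the survivors by 1 / (1 - rate), and outside training it passes values through unchanged. The name dropout and its arguments are chosen for this example only.

```python
import random

def dropout(values, rate, training=True, seed=None):
    """Randomly zero a fraction `rate` of the values during training."""
    if not training or rate == 0.0:
        return list(values)  # nothing is dropped at prediction time
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)  # keeps the expected sum unchanged
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

layer_output = [1.0] * 10
print(dropout(layer_output, rate=0.15, seed=123))
print(dropout(layer_output, rate=0.15, training=False))
```

Calling the function again with a different seed drops a different subset, which is what keeps the network from relying too heavily on any single neuron.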

The following code creates a function for tuning the Neural Network hyperparameters and layers.

# Create function
def nn_cl_bo2(neurons, activation, optimizer, learning_rate, batch_size, epochs,
              layers1, layers2, normalization, dropout, dropout_rate):
    optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']
    optimizerD= {'Adam':Adam(lr=learning_rate), 'SGD':SGD(lr=learning_rate),
                 'RMSprop':RMSprop(lr=learning_rate), 'Adadelta':Adadelta(lr=learning_rate),
                 'Adagrad':Adagrad(lr=learning_rate), 'Adamax':Adamax(lr=learning_rate),
                 'Nadam':Nadam(lr=learning_rate), 'Ftrl':Ftrl(lr=learning_rate)}
    activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
                   'elu', 'exponential', LeakyReLU,'relu']
    neurons = round(neurons)
    activation = activationL[round(activation)]
    optimizer = optimizerD[optimizerL[round(optimizer)]]
    batch_size = round(batch_size)
    epochs = round(epochs)
    layers1 = round(layers1)
    layers2 = round(layers2)
    def nn_cl_fun():
        nn = Sequential()
        nn.add(Dense(neurons, input_dim=10, activation=activation))
        if normalization > 0.5:
            nn.add(BatchNormalization())
        for i in range(layers1):
            nn.add(Dense(neurons, activation=activation))
        if dropout > 0.5:
            nn.add(Dropout(dropout_rate, seed=123))
        for i in range(layers2):
            nn.add(Dense(neurons, activation=activation))
        nn.add(Dense(1, activation='sigmoid'))
        nn.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return nn
    es = EarlyStopping(monitor='accuracy', mode='max', verbose=0, patience=20)
    nn = KerasClassifier(build_fn=nn_cl_fun, epochs=epochs, batch_size=batch_size, verbose=0)
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
    score = cross_val_score(nn, X_train, y_train, scoring=score_acc, cv=kfold, fit_params={'callbacks':[es]}).mean()
    return score

The following code searches for the optimum hyperparameters and layers for the Neural Network model.

params_nn2 ={
    'neurons': (10, 100),
    'activation':(0, 9),
    'optimizer':(0,7),
    'learning_rate':(0.01, 1),
    'batch_size':(200, 1000),
    'epochs':(20, 100),
    'layers1':(1,3),
    'layers2':(1,3),
    'normalization':(0,1),
    'dropout':(0,1),
    'dropout_rate':(0,0.3)
}
# Run Bayesian Optimization
nn_bo = BayesianOptimization(nn_cl_bo2, params_nn2, random_state=111)
nn_bo.maximize(init_points=25, n_iter=4)

Output:

|   iter    |  target   | activa... | batch_... |  dropout  | dropou... |  epochs   |  layers1  |  layers2  | learni... |  neurons  | normal... | optimizer |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|  1        |  0.6293   |  5.51     |  335.3    |  0.4361   |  0.2308   |  43.63    |  1.298    |  1.045    |  0.426    |  31.48    |  0.3377   |  6.935    |
|  2        |  0.6502   |  2.14     |  265.0    |  0.6696   |  0.1864   |  41.94    |  1.932    |  1.237    |  0.08322  |  91.07    |  0.794    |  5.884    |
|  3        |  0.5719   |  7.337    |  992.8    |  0.5773   |  0.2441   |  53.71    |  1.055    |  1.908    |  0.1143   |  83.55    |  0.6977   |  3.957    |
|  4        |  0.5886   |  2.468    |  998.8    |  0.138    |  0.1846   |  58.8     |  1.81     |  2.456    |  0.3296   |  46.05    |  0.319    |  6.631    |
|  5        |  0.5719   |  8.268    |  851.1    |  0.03408  |  0.283    |  96.04    |  2.613    |  1.963    |  0.9671   |  47.53    |  0.3188   |  0.1151   |
|  6        |  0.768    |  0.3436   |  242.5    |  0.128    |  0.01001  |  38.11    |  2.088    |  1.357    |  0.1876   |  23.47    |  0.683    |  3.283    |
|  7        |  0.5719   |  6.914    |  735.1    |  0.4413   |  0.1786   |  56.93    |  2.927    |  1.296    |  0.9077   |  54.81    |  0.5925   |  4.793    |
|  8        |  0.767    |  1.597    |  891.7    |  0.4821   |  0.0208   |  49.18    |  1.723    |  1.944    |  0.1877   |  25.78    |  0.9491   |  4.59     |
|  9        |  0.5432   |  1.215    |  942.2    |  0.8418   |  0.01583  |  36.29    |  2.745    |  2.348    |  0.3043   |  76.1     |  0.6183   |  1.473    |
|  10       |  0.5719   |  7.219    |  247.3    |  0.3082   |  0.06221  |  97.78    |  2.819    |  2.353    |  0.124    |  96.22    |  0.09171  |  4.409    |
|  11       |  0.5892   |  8.126    |  471.8    |  0.6528   |  0.2775   |  49.92    |  2.543    |  2.792    |  0.624    |  23.6     |  0.3749   |  4.451    |
|  12       |  0.5719   |  4.132    |  625.8    |  0.3523   |  0.198    |  58.12    |  1.909    |  1.25     |  0.4183   |  34.58    |  0.3467   |  6.821    |
|  13       |  0.7683   |  1.94     |  746.3    |  0.03181  |  0.2506   |  76.13    |  2.932    |  2.184    |  0.2252   |  74.73    |  0.03087  |  2.931    |
|  14       |  0.5764   |  2.531    |  285.0    |  0.4263   |  0.2522   |  28.83    |  2.973    |  1.467    |  0.7242   |  69.48    |  0.07776  |  4.881    |
|  15       |  0.768    |  2.388    |  921.5    |  0.8183   |  0.1198   |  85.62    |  1.396    |  2.045    |  0.4184   |  93.33    |  0.8254   |  3.507    |
|  16       |  0.7684   |  1.051    |  209.3    |  0.9132   |  0.1537   |  87.45    |  1.19     |  2.607    |  0.07161  |  67.19    |  0.9688   |  2.782    |
|  17       |  0.5144   |  5.936    |  371.9    |  0.8899   |  0.296    |  79.09    |  2.283    |  1.504    |  0.4811   |  34.13    |  0.8683   |  1.868    |
|  18       |  0.5719   |  8.757    |  370.8    |  0.2978   |  0.221    |  21.03    |  1.06     |  2.468    |  0.5033   |  29.63    |  0.00893  |  5.955    |
|  19       |  0.7635   |  4.828    |  778.8    |  0.6616   |  0.2516   |  51.06    |  1.852    |  2.656    |  0.4743   |  83.8     |  0.01418  |  2.777    |
|  20       |  0.5144   |  1.155    |  294.5    |  0.206    |  0.2243   |  94.41    |  1.761    |  1.921    |  0.8746   |  83.31    |  0.02497  |  6.111    |
|  21       |  0.5442   |  5.441    |  613.2    |  0.5893   |  0.2399   |  33.86    |  1.374    |  1.516    |  0.06056  |  59.74    |  0.3518   |  6.419    |
|  22       |  0.767    |  4.289    |  283.6    |  0.1525   |  0.08206  |  82.52    |  1.786    |  2.598    |  0.4387   |  17.34    |  0.01064  |  3.016    |
|  23       |  0.7437   |  5.966    |  612.2    |  0.5801   |  0.1479   |  79.24    |  2.579    |  2.562    |  0.1363   |  94.61    |  0.8777   |  4.897    |
|  24       |  0.6826   |  8.432    |  739.0    |  0.5944   |  0.1035   |  26.69    |  2.159    |  1.035    |  0.5569   |  66.93    |  0.6784   |  1.194    |
|  25       |  0.576    |  5.194    |  364.8    |  0.2515   |  0.2908   |  91.73    |  1.246    |  2.762    |  0.9485   |  51.39    |  0.413    |  4.04     |
|  26       |  0.6123   |  0.8666   |  764.0    |  0.09547  |  0.2738   |  71.59    |  2.418    |  2.742    |  0.01     |  89.31    |  0.0      |  1.49     |
|  27       |  0.7422   |  6.366    |  780.2    |  0.6271   |  0.1646   |  53.26    |  1.954    |  2.228    |  0.6962   |  81.66    |  0.1557   |  2.563    |
|  28       |  0.5144   |  4.821    |  779.7    |  0.8649   |  0.1344   |  37.63    |  2.574    |  1.528    |  0.3698   |  79.91    |  0.7947   |  5.56     |
|  29       |  0.5719   |  0.509    |  920.4    |  0.6302   |  0.2337   |  83.36    |  2.121    |  2.895    |  0.9025   |  99.29    |  0.8399   |  6.796    |

Here are the tuned hyperparameters and layers.

params_nn_ = nn_bo.max['params']
learning_rate = params_nn_['learning_rate']
activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
               'elu', 'exponential', LeakyReLU,'relu']
params_nn_['activation'] = activationL[round(params_nn_['activation'])]
params_nn_['batch_size'] = round(params_nn_['batch_size'])
params_nn_['epochs'] = round(params_nn_['epochs'])
params_nn_['layers1'] = round(params_nn_['layers1'])
params_nn_['layers2'] = round(params_nn_['layers2'])
params_nn_['neurons'] = round(params_nn_['neurons'])
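# Same order as optimizerL inside nn_cl_bo2, so the rounded index decodes to the optimizer that was actually trained
optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']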
optimizerD= {'Adam':Adam(lr=learning_rate), 'SGD':SGD(lr=learning_rate),
             'RMSprop':RMSprop(lr=learning_rate), 'Adadelta':Adadelta(lr=learning_rate),
             'Adagrad':Adagrad(lr=learning_rate), 'Adamax':Adamax(lr=learning_rate),
             'Nadam':Nadam(lr=learning_rate), 'Ftrl':Ftrl(lr=learning_rate)}
params_nn_['optimizer'] = optimizerD[optimizerL[round(params_nn_['optimizer'])]]
params_nn_

Output:

{'activation': 'sigmoid',
 'batch_size': 209,
 'dropout': 0.9131504384208619,
 'dropout_rate': 0.15371924329624512,
 'epochs': 87,
 'layers1': 1,
 'layers2': 3,
 'learning_rate': 0.07160587078837888,
 'neurons': 67,
 'normalization': 0.9687811501818422,
 'optimizer': <tensorflow.python.keras.optimizer_v2.adadelta.Adadelta at 0x7fa6556fad10>}

Each hidden layer has 67 neurons. A batch normalization layer follows the first hidden layer, followed by one more hidden layer. Next, the Dropout layer drops about 15% of the neurons before the values are passed to three more hidden layers. Finally, the output layer has one neuron that outputs the probability value. See Figure 4 for the illustration. Now that we have the tuned hyperparameters and layers with an estimated cross-validation accuracy of 0.7684, let’s fit the model to the training dataset. Eventually, we get an accuracy of 0.7681 on the validation dataset. The notebook for this article is made available here.

# Fitting Neural Network
def nn_cl_fun():
    nn = Sequential()
    nn.add(Dense(params_nn_['neurons'], input_dim=10, activation=params_nn_['activation']))
    if params_nn_['normalization'] > 0.5:
        nn.add(BatchNormalization())
    for i in range(params_nn_['layers1']):
        nn.add(Dense(params_nn_['neurons'], activation=params_nn_['activation']))
    if params_nn_['dropout'] > 0.5:
        nn.add(Dropout(params_nn_['dropout_rate'], seed=123))
    for i in range(params_nn_['layers2']):
        nn.add(Dense(params_nn_['neurons'], activation=params_nn_['activation']))
    nn.add(Dense(1, activation='sigmoid'))
    nn.compile(loss='binary_crossentropy', optimizer=params_nn_['optimizer'], metrics=['accuracy'])
    return nn
es = EarlyStopping(monitor='accuracy', mode='max', verbose=0, patience=20)
nn = KerasClassifier(build_fn=nn_cl_fun, epochs=params_nn_['epochs'], batch_size=params_nn_['batch_size'],
                     verbose=0)
nn.fit(X_train, y_train, validation_data=(X_val, y_val), verbose=1)

Output:

Epoch 1/87
369/369 [==============================] - 2s 4ms/step - loss: 0.6859 - accuracy: 0.5540 - val_loss: 0.6825 - val_accuracy: 0.5719
Epoch 2/87
369/369 [==============================] - 1s 3ms/step - loss: 0.6822 - accuracy: 0.5723 - val_loss: 0.6818 - val_accuracy: 0.5719
Epoch 3/87
369/369 [==============================] - 1s 3ms/step - loss: 0.6819 - accuracy: 0.5711 - val_loss: 0.6810 - val_accuracy: 0.5719
. . .
Epoch 87/87
369/369 [==============================] - 1s 4ms/step - loss: 0.4993 - accuracy: 0.7683 - val_loss: 0.4940 - val_accuracy: 0.7681
<tensorflow.python.keras.callbacks.History at 0x7fa67610c750>
Fig. 4 Illustration of the final model. Source: created by myself
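
For readers who want to see how the tuned values translate into a layer stack without running Keras, here is a small plain-Python sketch. It mirrors the if and for logic of nn_cl_fun but only produces text labels. The function name layer_plan and the label format are hypothetical, and the dictionary simply copies the tuned values printed earlier.

```python
def layer_plan(params):
    """List the layers that nn_cl_fun would add for a given set of tuned values."""
    hidden = f"Dense({params['neurons']}, {params['activation']})"
    plan = [hidden]  # first hidden layer
    if params['normalization'] > 0.5:
        plan.append("BatchNormalization()")
    plan.extend([hidden] * params['layers1'])
    if params['dropout'] > 0.5:
        plan.append(f"Dropout({params['dropout_rate']:.2f})")
    plan.extend([hidden] * params['layers2'])
    plan.append("Dense(1, sigmoid)")  # output layer
    return plan

tuned = {
    'activation': 'sigmoid',
    'neurons': 67,
    'layers1': 1,
    'layers2': 3,
    'normalization': 0.9687811501818422,
    'dropout': 0.9131504384208619,
    'dropout_rate': 0.15371924329624512,
}
for layer in layer_plan(tuned):
    print(layer)
```

With these values the plan has eight entries: the first hidden layer, batch normalization, one hidden layer, dropout at 0.15, three more hidden layers, and the single-neuron output layer.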

Conclusion

In summary, delving into neural network hyperparameters is essential for deep learning success. By skillfully tuning parameters, especially optimizing layers, one can elevate model performance significantly. Explore the nuances of hyperparameter tuning to unlock the full potential of neural networks in the realm of deep learning.

Key Takeaways:

  • Tuning hyperparameters and layers is crucial for neural network performance
  • Regularization techniques like batch normalization and dropout can be incorporated and tuned as well
  • Bayesian optimization enables efficient hyperparameter and layer tuning

Frequently Asked Questions

Q1. What is hyperparameter tuning in deep learning?

A. Hyperparameter tuning in deep learning involves optimizing settings such as the learning rate and batch size to improve performance and accuracy.

Q2. What are the hyperparameters in CNN?

A. In CNNs, hyperparameters include learning rate, batch size, number of epochs, filter size, stride, padding, and number of layers.

Q3. What is hyperparameter tuning and cross validation?

A. Hyperparameter tuning involves selecting optimal parameters, while cross-validation evaluates model performance by splitting data into training and validation sets.

Q4. Which datasets can hyperparameter tuning be applied to?

A. Hyperparameter tuning can be applied to any dataset, such as CIFAR-10, ImageNet, or custom datasets specific to the problem domain.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


A Data Science professional with seasoned specializations in Machine Learning development and Geo-spatial analysis. Hold the TensorFlow Developer Certificate. Have strong work experience in: - delivering meaningful data-driven insights to support business goals, - automating data processing, - data analysis (tabular, time series, text/NLP, and image), - descriptive and inferential statistical analysis, - GIS or spatial data analysis, - data visualization and dashboard development, - Machine Learning modeling (regression, classification, clustering, dimensionality reduction, time series forecasting, recommender engine) - Deep Learning or Artificial Intelligence (regression and classification with MLP, image classification with CNN, time series forecasting with LSTM, text classification with LSTM) - Hugging face: transformers, fine-tuning - Large Language Models (LLM) - Stable Diffusion - web application development, - developing APIs, etc.

Responses From Readers


Flo 18 Mar, 2022

Thx man, great code. Working like a charm

Amobichukwu Amanambu 20 May, 2022

I have tried using you model for optimization doing the neural net regression. i swapped out the classifier with the regressor and few other changes, but the target gives me a zero value all through. Do you think you can help me with suggestion on what i can do? I put it up as a question in stackoverflow. I have input a link. I cite you there as the creator of the code. Any answer can help me. Thank you

Hank 17 Jun, 2022

Thanks for the nice tutorial. Can I ask what the early stoping is monitoring during k-fold cross-validation. Does it monitor the training accuracy or test accuracy? For example, in the last 5 lines in “nn_cl_bo2” function, what is the criteria to stop training earlier? Thanks

Gilbi 21 Dec, 2022

Hi, Thanks for the article. In the first part of the article (before to tune the number of layer), it seems not clear to me why in nn_cl_bo function, optimizer is set to "opt" (which means Adam) whereas I understood that the choice was to consider the optimizer as an hyperparameter with 7 possible optimizers. Is it a typo? Thanks.

skan 11 Sep, 2023

Hello. Should we tune all hyperparameters simultaneously, like on a multivariate optimization problem? Or one after the other, in a greedy way? I guess the first option will produce better results but less robust. Should we tune the number of epochs after all other hyperparameters? Should we use different validation data for each hyperparameter?

Mandula Thrimanne 15 Mar, 2024

Great article! Keep up the good work!