How to Apply K-Fold Averaging on Deep Learning Classifier

Sajal | Last Updated: 24 Feb, 2025


In this article, we will learn how to apply k-fold cross-validation to a deep learning image classification model. Like my other articles, this one is hands-on, with code. It starts with the theory and then moves on to the code and its explanation. All of the code is written in Python. The method discussed here can also be applied to some machine learning problems, but we will only use a deep learning problem as the example. If you would like an article on applying it to ML, please let me know in the comment section.

This article is aimed at readers of all levels (beginner, intermediate, and expert). If you already know the k-fold algorithm, you can skip the initial topics and jump to the coding section, although I recommend reading the complete article.

This article was published as a part of the Data Science Blogathon

Why do we need K-Fold?
What is K-Fold?
How to Code K-Fold with Deep Learning?
Conclusion

Let’s first try to understand what we are doing here.

When building a deep learning model, we want it to be robust and accurate. To achieve this, we need an evaluation technique, i.e. some way to check whether our model is working well. K-fold is one such technique. You have probably seen K-fold used many times, but in this article we will not only use it to evaluate the model but also to compute the model's final predictions. Before using this technique, we should first understand why it is so important and why it is better than other validation techniques.

Why do we need K-Fold?

Generally, we use the holdout method: we split the data into two parts, training and testing, and use an error metric to check whether the model trained on the training data also performs well on the test data. But can we rely on this method?

No. The evaluation obtained this way depends heavily on which data points end up in the training set and which end up in the test set, so the result is highly sensitive to how the data is split. Let’s see how K-Fold improves on the conventional holdout method.
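To see why a single split can mislead, here is a small hypothetical sketch. It uses a toy list of ten labels and a deliberately trivial stand-in "model" that always predicts the most frequent class of its training set (this is not the article's model). The same model evaluated on two different holdout splits of the same data gives very different scores.

```python
from collections import Counter

# Toy labels: 6 samples of class 0 followed by 4 samples of class 1
labels = [0] * 6 + [1] * 4

def majority_class_model(train_labels):
    """Stand-in 'model': always predicts the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample: majority

def holdout_accuracy(test_idx):
    """Train on every index not in test_idx, then score on test_idx."""
    test_set = set(test_idx)
    train_labels = [y for i, y in enumerate(labels) if i not in test_set]
    model = majority_class_model(train_labels)
    correct = sum(model(i) == labels[i] for i in test_idx)
    return correct / len(test_idx)

# Two holdout splits of the same data, each with 3 test samples
score_a = holdout_accuracy([0, 1, 6])  # test labels: 0, 0, 1
score_b = holdout_accuracy([6, 7, 8])  # test labels: 1, 1, 1
print(score_a, score_b)
```

The data and the model are identical in both cases; only the choice of test points changed, yet the accuracy moves from about 0.67 to 0.0.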

What is K-Fold?

K-Fold is a validation technique in which we split the data into k subsets (folds) and repeat the holdout method k times: each time, one of the k subsets is used as the test set and the other k-1 subsets are used for training. The average error over all k trials is then computed, which is a more reliable estimate than the one given by the standard holdout method.

Because every data point is used for testing exactly once and for training k-1 times, the result depends much less on how the data happens to be divided.
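The procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not sklearn's implementation: the fold assignment (sample i goes to fold i % k) and the trivial majority-class stand-in "model" are hypothetical choices made to keep the example short.

```python
from collections import Counter

def k_fold_error(labels, k):
    """Repeat the holdout method k times; return (per-fold errors, mean error)."""
    n = len(labels)
    # Hypothetical fold assignment: sample i belongs to fold i % k
    fold_of = [i % k for i in range(n)]
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if fold_of[i] == fold]
        train_idx = [i for i in range(n) if fold_of[i] != fold]
        # Stand-in model: predict the most frequent label of the k-1 training folds
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        wrong = sum(labels[i] != majority for i in test_idx)
        errors.append(wrong / len(test_idx))
    # Every sample was tested exactly once; average the k error estimates
    return errors, sum(errors) / k

errors, mean_error = k_fold_error([0] * 7 + [1] * 3, k=5)
print(errors, mean_error)
```

The individual folds report errors of 0.0 and 0.5, but the averaged estimate (0.3) uses every sample as test data exactly once.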

Let’s look at the pictorial representation below, which gives a better sense of how and what we are doing.

The image above shows how the data is split in the holdout method.

In the image above, the dataset is divided into 5 subsets, i.e. k = 5. In each iteration, we pick one fold (subset) as the test subset and use the remaining folds for training. Each iteration is therefore just a holdout split with different training and test data.

In this article, we use the same technique, not to average the error, but to compute averaged predictions for our deep learning model by combining the outputs of the k models trained on the different splits through a majority vote.

How to Code K-Fold with Deep Learning?

To use K-Fold with a deep learning model, we need the following prerequisites.

Dataset: https://www.kaggle.com/oossiiris/hackerearth-deep-learning-challenge-holidayseason

For this problem, we use HackerEarth’s dataset, which contains 6 classes that our DL model needs to classify. The main folder contains two subfolders, train and test, along with a CSV file. The subfolders hold the images, and the CSV file contains all the information about the training and testing data. Additional information is available on the download page.

For evaluation, we use only the training data, i.e. we only measure training accuracy.

Libraries: pandas, NumPy, Keras, sklearn.

We will not code K-Fold from scratch, because sklearn already provides an implementation; we will simply use it.

sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)

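Let’s look at the parameters in more detail. n_splits is the number of subsets (folds) the dataset is divided into; it must be at least 2. shuffle controls whether the data is shuffled before it is split into folds. random_state sets the seed for that shuffling and only takes effect when shuffle=True.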
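To make n_splits, shuffle and random_state concrete, here is a small pure-Python sketch of how sklearn's KFold lays out its folds, based on its documented behavior: the indices are optionally shuffled, then cut into contiguous folds, and the first n_samples % n_splits folds get one extra sample. The function name kfold_split is hypothetical, and Python's random.Random will not reproduce sklearn's exact shuffled order for the same seed.

```python
import random

def kfold_split(n_samples, n_splits=5, shuffle=False, random_state=None):
    """Yield (train_idx, test_idx) pairs, mimicking how KFold lays out folds."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    indices = list(range(n_samples))
    if shuffle:
        # Seeded shuffle; the seed only matters when shuffle=True
        random.Random(random_state).shuffle(indices)
    # The first n_samples % n_splits folds get one extra sample
    fold_sizes = [n_samples // n_splits] * n_splits
    for f in range(n_samples % n_splits):
        fold_sizes[f] += 1
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        test_set = set(test_idx)
        # Training indices are all the others, in ascending order
        train_idx = [i for i in range(n_samples) if i not in test_set]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in kfold_split(10, n_splits=3):
    print(test_idx)
# Prints [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]
```

With shuffle=True the fold sizes stay the same (4, 3, 3), but which samples land in each fold depends on the seed, and the same seed gives the same folds every time.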

Next, let’s look at the variant of K-fold in sklearn that we will actually use.

sklearn.model_selection.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

Stratified K-fold: the main difference between stratified and normal k-fold is how the data is split. Stratified K-Fold preserves the percentage of samples of each class in every fold, so each fold has approximately the same class distribution as the complete dataset. This prevents a fold from being dominated by, or missing, a particular class.

It is better to use this technique when you have an imbalanced dataset.
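The idea behind stratification can be illustrated with a simplified pure-Python sketch: group the sample indices by class, then deal each class's indices round-robin across the k folds, so every fold receives a near-equal share of every class. This shows the principle only; sklearn's StratifiedKFold uses its own allocation algorithm and will not produce identical folds. The function name stratified_folds is hypothetical.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Return k lists of test indices with near-equal class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class):
        # Deal this class's samples round-robin across the folds,
        # continuing where the previous class stopped to keep fold sizes balanced
        for i in by_class[y]:
            folds[position % k].append(i)
            position += 1
    return [sorted(f) for f in folds]

# Imbalanced toy labels: 8 samples of class 'a', 4 of class 'b'
labels = ['a'] * 8 + ['b'] * 4
for fold in stratified_folds(labels, k=4):
    print(fold, Counter(labels[i] for i in fold))
```

Every fold ends up with two 'a' samples and one 'b' sample, the same 2:1 ratio as the full dataset. With plain contiguous folds of size 3 on the same data, the first two test folds would contain only class 'a' and the last one only class 'b'.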

Let’s Create a Custom Model for comparing

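# Keras imports needed by get_model (TensorFlow 2.x)
from tensorflow.keras import applications, optimizers
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model, Sequential

def get_model(IMG_SIZE):
    # ImageNet-pretrained ResNet-50 without its classification head
    base_model = applications.ResNet50(weights='imagenet', include_top=False,
                                       input_shape=(IMG_SIZE, IMG_SIZE, 3))
    # Small classification head for the 6 classes
    add_model = Sequential()
    add_model.add(Flatten(input_shape=base_model.output_shape[1:]))
    add_model.add(Dropout(0.3))
    add_model.add(Dense(64, activation='relu'))
    add_model.add(Dropout(0.4))
    add_model.add(Dense(6, activation='softmax'))
    model = Model(inputs=base_model.input, outputs=add_model(base_model.output))
    # 'learning_rate' replaces the 'lr' argument, which is deprecated in newer Keras versions
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  metrics=['accuracy'])
    return model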

The model above is a ResNet-50 pretrained on ImageNet, taken from keras.applications, with a small custom classification head added on top. It was created only for explanation purposes and has not been optimized.

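import pandas as pd

# Model configuration
IMG_SIZE = 128
BATCH_SIZE = 16
EPOCHS = 1
N_SPLIT = 7

# Stores the predicted class indices of every fold (one column per fold)
data_kfold = pd.DataFrame()

# Creating X, Y for training
# df is assumed to be the training DataFrame read from the dataset's CSV file,
# with an 'Image' column (file name) and a 'Class' column (label as a string)
train_y = df.Class
train_x = df.drop(['Class'], axis=1)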

In the code above, n_splits is set to 7 so that, when the per-fold predictions are combined, at least one class is guaranteed to be predicted more than once. We have 6 classes, so with fewer than 7 splits it is possible for every split to predict a different class.

In that case, there would be no most-frequent class to pick, and the final label would be essentially arbitrary.
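The reasoning behind N_SPLIT = 7 is the pigeonhole principle: 7 votes spread over 6 classes must put at least two votes on some class. The small sketch below checks this and also shows the limit of the argument: a repeated class is guaranteed, but a unique winner is not, because two classes can still tie. The helper name vote is hypothetical.

```python
from collections import Counter

N_CLASSES = 6
N_SPLIT = 7

# Generalized pigeonhole principle: some class gets at least ceil(N_SPLIT / N_CLASSES) votes
min_top = -(-N_SPLIT // N_CLASSES)
print(min_top)  # 2

def vote(predictions):
    """Return (most common class, its count, whether another class ties it)."""
    counts = Counter(predictions).most_common()
    top_class, top_count = counts[0]
    tie = len(counts) > 1 and counts[1][1] == top_count
    return top_class, top_count, tie

# 6 folds: every class can appear exactly once, so nothing repeats
print(vote([0, 1, 2, 3, 4, 5]))     # (0, 1, True)
# 7 folds: at least one class must appear twice
print(vote([0, 1, 2, 3, 4, 5, 4]))  # (4, 2, False)
# ...but two classes can still share the top count
print(vote([2, 2, 1, 1, 4, 0, 3]))  # (2, 2, True)
```

When a tie occurs, Counter lists equally common elements in the order they were first encountered, so a vote built on it picks the tied class that appears first in the row, i.e. the one predicted by the earliest fold.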

Main Code

# Imports used below
import gc
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Initializing data generators (augmentation for training only)
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)
test_generator = ImageDataGenerator(rescale=1./255)

# Stratified k-fold with shuffling and a fixed seed
kfold = StratifiedKFold(n_splits=N_SPLIT, shuffle=True, random_state=42)

# Variable for keeping count of the split we are executing
j = 0

# K-fold: train and predict for each split
for train_idx, val_idx in kfold.split(train_x, train_y):
    x_train_df = df.iloc[train_idx]
    x_valid_df = df.iloc[val_idx]
    j += 1

    training_set = train_datagen.flow_from_dataframe(
        dataframe=x_train_df, directory=TRAIN_PATH,
        x_col="Image", y_col="Class", class_mode="categorical",
        target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)
    validation_set = validation_datagen.flow_from_dataframe(
        dataframe=x_valid_df, directory=TRAIN_PATH,
        x_col="Image", y_col="Class", class_mode="categorical",
        target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)

    # A fresh, untrained model for every fold
    model_test = get_model(IMG_SIZE)
    # fit() accepts generators directly; fit_generator() is deprecated
    history = model_test.fit(training_set,
                             validation_data=validation_set,
                             epochs=EPOCHS,
                             steps_per_epoch=x_train_df.shape[0] // BATCH_SIZE)

    # Predict on all training images. shuffle=False keeps the predictions in
    # the same row order as df, so every fold's column lines up with the images.
    test_set = test_generator.flow_from_dataframe(
        dataframe=df, directory=TRAIN_PATH,
        x_col="Image", y_col=None, class_mode=None,
        target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE,
        shuffle=False)
    pred = model_test.predict(test_set)
    predicted_class_indices = np.argmax(pred, axis=1)

    # Store this fold's predictions as column j
    data_kfold[j] = predicted_class_indices

    # Free memory before the next fold
    del model_test
    gc.collect()

Now let's understand what this code does.

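In the code above, we simply repeat the holdout method 7 times and store each fold's predicted class indices as a new column of data_kfold. So data_kfold looks like the image below: one row per training image and one column per fold, where each cell holds the class index that fold's model predicted for that image.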

Notice something interesting: when we train on different subsets of the data, we get different predictions. In the example above, for the image at index 6468 the same model architecture predicted class 2 in one fold and class 1 in another, but most of the time it predicted class 4. So what should the final answer be? Since 4 appears more often than any other class, we go with 4. That is exactly what the code below does.

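# Taking the label with maximum occurrences (majority vote across folds)
import collections

# class_indices maps class name -> integer index; invert it to index -> name
labels = training_set.class_indices
labels2 = dict((v, k) for k, v in labels.items())

# ans holds the final predictions; assumed here to be a copy of df (default 0..n-1 index)
ans = df.copy()
for i in range(len(data_kfold)):
    # Count how often each class index was predicted for image i
    co = collections.Counter(data_kfold.loc[i])
    # most_common(1) returns [(class_index, count)] for the most frequent class
    ans.loc[i, 'Class'] = labels2[co.most_common(1)[0][0]]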

In the code above, the most frequently predicted label for each image is stored in the DataFrame ans. labels2 maps integer class indices back to class names; we need it because the original data stores the target classes as strings.
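Without Keras and pandas, the voting step reduces to the sketch below. It assumes a hypothetical table fold_predictions with one row per image and one column per fold (the same layout as data_kfold), a hypothetical index-to-name mapping index_to_label playing the role of labels2, and made-up true labels to show how the accuracy of the final vote is computed. None of these values come from the article's experiment.

```python
from collections import Counter

# Hypothetical per-fold predictions: one row per image, one column per fold
fold_predictions = [
    [4, 4, 2, 4, 1, 4, 4],
    [0, 0, 0, 3, 0, 0, 0],
    [5, 2, 5, 2, 2, 5, 2],
]
# Hypothetical mapping from class index to class name (the role of labels2)
index_to_label = {i: f"class_{i}" for i in range(6)}
# Hypothetical ground-truth labels for the three images
true_labels = ["class_4", "class_0", "class_5"]

def majority_vote(row):
    """Most frequent class index in a row (ties go to the first one encountered)."""
    return Counter(row).most_common(1)[0][0]

final = [index_to_label[majority_vote(row)] for row in fold_predictions]
accuracy = sum(p == t for p, t in zip(final, true_labels)) / len(true_labels)
print(final, accuracy)
```

The first row mimics the pattern described in the text (mostly 4, with single votes for 2 and 1) and resolves to class_4. The third row shows that a majority vote can still be wrong when most folds agree on the wrong class.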

That completes the tutorial. Now let's look at the results by comparing the training accuracy of the k-fold averaged predictions with that of the standard holdout method.

Accuracy of Holdout Method: 0.32168805070335443

Accuracy of K-Fold Method: 0.4274230947596228

These are the accuracies obtained with a single holdout split and with K-Fold averaging. In this run, K-Fold averaging reached about 0.43 accuracy versus about 0.32 for the holdout model, an improvement of roughly 10.6 percentage points.

CODE

You can get the code from Kaggle as well as GitHub:

Kaggle: https://www.kaggle.com/oossiiris/k-fold-accuracy-comparison-blog

Github: https://github.com/r-sajal/DeepLearning-/blob/master/ComputerVision/k-fold-accuracy-comparison-blog.ipynb

Conclusion

The results above show that K-Fold averaging achieved noticeably higher accuracy than the standard holdout method, even with a single training epoch. You can train for more epochs and observe how the results change. Besides the higher accuracy, the final predictions are also likely to be more robust, because they combine the votes of several models trained on different subsets of the data instead of relying on a single split.

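The drawback, as you may have guessed, is time: instead of training one model, we train N_SPLIT models (seven here), each on most of the data, and the cost grows further with more data. Because N_SPLIT must exceed the number of classes for the vote to work, more classes also means more models to train. So this method is practical mainly when the dataset is limited in size and the number of classes is small.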
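A back-of-the-envelope calculation shows where the extra time goes. The sketch below counts how many training samples are processed per epoch: k-fold trains k models, each on all but one fold, so it processes (k-1)·n samples in total, while a single holdout model trained on a fraction f of the data processes f·n. The dataset size n = 7000 and the holdout fraction f = 0.8 are hypothetical values chosen for illustration.

```python
def kfold_training_samples(n, k, epochs=1):
    """Samples seen during training when k models are each trained on k-1 folds."""
    # Fold sizes as in sklearn's KFold: the first n % k folds hold one extra sample
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    return sum(n - size for size in fold_sizes) * epochs

def holdout_training_samples(n, train_fraction, epochs=1):
    """Samples seen when a single model is trained on a fraction of the data."""
    return round(n * train_fraction) * epochs

n = 7000  # hypothetical dataset size
kfold = kfold_training_samples(n, k=7)
holdout = holdout_training_samples(n, train_fraction=0.8)
print(kfold, holdout, kfold / holdout)  # 42000 5600 7.5
```

Under these assumptions, 7-fold averaging does about 7.5 times the training work of a single 80/20 holdout run, on top of each of the 7 models predicting on every image.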

Happy Learning.

About Me

I am a final-year student at IIIT pursuing my bachelor’s degree in Computer Science and Engineering. I love exploring and experimenting with deep learning and machine learning. If you would like to collaborate, you can reach me on LinkedIn, on GitHub, or by email.

Thank You for reading!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

