Sajal Rastogi — September 16, 2021

This article was published as a part of the Data Science Blogathon

In this article, we will learn how to apply k-fold cross-validation to a deep learning image classification model. Like my other articles, this one is hands-on: it starts with the theory and then moves on to the code and its explanation. All of the code is in Python. The method discussed here can also be applied to some machine learning problems, but we will only use a deep learning example. If you would like an article on applying it to ML, please let me know in the comments section.

This article is written for readers at every level (beginner, intermediate, expert). If you already know the k-fold algorithm, you can skip the initial topics and jump to the coding section, but I recommend reading the complete article.

Table of Contents

1. Introduction
2. Why do we need K-Fold?
3. What is K-Fold?
4. How to Code K-Fold?
5. Conclusion


Introduction

Let's first try to understand what we are doing here.

When creating a deep learning model, we want it to be robust and accurate. To achieve that, we need an evaluation technique, i.e., some way to check whether the model is working well. K-Fold is one technique that helps us evaluate a model. You may have seen K-Fold used many times, but in this article we will use it not only to evaluate the model but also to produce its final predictions. Before using it, we should first understand why it is so important and why it is better than other validation techniques.

Why do we need K-Fold?

Generally, we use the holdout method: we split the data into two parts, training and testing, and use some error metric to check whether the model trained on the training data performs well on the test data. But can we rely on this method?

The answer is no. An evaluation done this way depends heavily on which data points end up in the training set and which end up in the test set, so the result is highly dependent on how the data happens to be split. Let's see how K-Fold improves on the conventional holdout method.
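To see how strongly a single holdout score can depend on the split, here is a small sketch with scikit-learn. It is an illustration under assumptions, not part of this project: it swaps in a synthetic dataset from `make_classification` and a `LogisticRegression` classifier for the image model, and varies only the `random_state` passed to `train_test_split`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the article's image dataset)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

holdout_scores = []
for seed in range(5):
    # Same data and same model every time; only the train/test split changes
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    holdout_scores.append(clf.score(X_te, y_te))  # accuracy on this split's test set

for seed, score in enumerate(holdout_scores):
    print(f"split seed {seed}: holdout accuracy = {score:.3f}")
```

Because the data and the model are identical in every run, any spread in the printed accuracies comes from the split alone.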

What is K-Fold?

K-Fold is a validation technique in which we split the data into k subsets and repeat the holdout method k times: each of the k subsets is used once as the test set, while the other k-1 subsets are used for training. The average error over all k trials is then computed, which is more reliable than the error from a single holdout split.

So with this technique, the result no longer hinges on one particular way of dividing the data.
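The procedure described above can be sketched in plain Python. This is a minimal sketch, not sklearn's implementation: `k_fold_indices` and `cross_validated_error` are hypothetical helper names, the data is not shuffled, and the folds are contiguous blocks of indices (the same fold sizes that sklearn's `KFold` uses when `shuffle=False`).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each of the k folds is the test set exactly once."""
    # Fold sizes differ by at most one: the first n_samples % k folds get an extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Every index outside the current test fold is used for training
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_error(fold_errors):
    """K-Fold estimate: the mean of the k per-fold holdout errors."""
    return sum(fold_errors) / len(fold_errors)


folds = list(k_fold_indices(10, 5))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: test={test_idx}, train size={len(train_idx)}")
```

Every index lands in exactly one test fold, so each sample is tested once and used for training k-1 times; averaging the k per-fold errors with `cross_validated_error` gives the K-Fold estimate.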

The pictures below give a better sense of what we are doing.

[Figure: the holdout method]

The image above shows how the holdout method splits the data.

[Figure: the dataset split into k = 5 folds]
Image source: Google Images

In the above image, we have divided the dataset into 5 subsets, i.e., k = 5. In each iteration we pick one fold as the test set and use the remaining four for training. Each iteration shown above is simply a holdout split with different training and testing data.

In this article, we will use the same technique, but instead of averaging errors, we will combine the predictions of the k models into one final prediction per image (by majority vote).

How to Code K-Fold with Deep Learning?

To use K-Fold with a deep learning model, we need the following prerequisites.


For this problem, we will use HackerEarth's dataset, which contains 6 classes that our DL model needs to classify. The main folder contains two subfolders, train and test, and a CSV file. The subfolders hold the images, and the CSV file contains all the information about the training and testing data. Additional information is available on the dataset's download page.

For evaluation we will use the training data only, i.e., we will only measure training accuracy.

Libraries: pandas, Keras, sklearn.

We will not code K-Fold from scratch, because sklearn already provides an implementation; we will simply use that.

sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)

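Let's look at the parameters in more detail. n_splits is the number of subsets (folds) the dataset is split into. shuffle controls whether the data is shuffled before it is split into folds. random_state seeds that shuffle so the splits are reproducible; it only has an effect when shuffle=True.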
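A short usage sketch of sklearn's `KFold` on ten toy samples (the toy array `X` is an assumption for illustration) shows what `split` yields and how `shuffle` and `random_state` affect it.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 toy samples with one feature each

# shuffle=False (the default): each test fold is a consecutive block of samples
kf = KFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(fold, "test:", test_idx, "train:", train_idx)

# shuffle=True with an integer random_state: shuffled folds that are reproducible
kf_shuffled = KFold(n_splits=5, shuffle=True, random_state=42)
first = [test.tolist() for _, test in kf_shuffled.split(X)]
second = [test.tolist() for _, test in kf_shuffled.split(X)]
print(first == second)  # the same folds on every call
```

Note that `split` yields index arrays, not the data itself, which is why the main code uses them with `df.iloc`.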

Now let's look at the variant of K-Fold in sklearn that we will actually use.

sklearn.model_selection.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

Stratified K-Fold: the main difference between stratified and normal K-Fold is how the data is split. Stratified K-Fold ensures that each fold contains approximately the same percentage of samples of each class as the complete dataset, so no fold over- or under-represents a class.

It is better to use this technique when you have an imbalanced dataset.
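To check the stratification claim, here is a sketch with a deliberately imbalanced toy label vector (an assumption for illustration: 8 samples of class 0 and 4 of class 1). `StratifiedKFold` only looks at the labels when assigning folds, so the features can be dummy zeros.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 8 samples of class 0 and 4 of class 1 (a 2:1 ratio)
y = np.array([0] * 8 + [1] * 4)
X = np.zeros((len(y), 1))  # dummy features; fold assignment depends only on y

skf = StratifiedKFold(n_splits=4)
fold_counts = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    counts = np.bincount(y[test_idx], minlength=2)  # samples per class in this test fold
    fold_counts.append(counts.tolist())
    print(f"fold {fold}: class 0 = {counts[0]}, class 1 = {counts[1]}")
```

Every test fold keeps the same 2:1 ratio as the full label vector. With plain `KFold` and no shuffling, the first two folds would contain only class 0 and the last one only class 1.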

Let's create a custom model for comparison.

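from tensorflow.keras import applications, optimizers
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model, Sequential

def get_model(IMG_SIZE):
    # ResNet-50 pretrained on ImageNet, without its fully connected top layers
    base_model = applications.ResNet50(weights='imagenet', include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3))
    add_model = Sequential()
    add_model.add(Flatten())  # flatten the 4D feature maps before the dense layers
    add_model.add(Dense(64, activation='relu'))
    add_model.add(Dense(6, activation='softmax'))  # one output per class (6 classes)
    model = Model(inputs=base_model.input, outputs=add_model(base_model.output))
    model.compile(loss='categorical_crossentropy', optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  metrics=['accuracy'])
    return model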

In the above model, we use ResNet-50, whose implementation is already available in Keras, with a small classification head on top. The model is not optimized; it is only meant for demonstration.

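import pandas as pd

# Model configuration
IMG_SIZE = 128
N_SPLIT = 7       # number of folds for StratifiedKFold
EPOCHS = 1        # the results reported below are for a single epoch
BATCH_SIZE = 32   # assumed value; the original does not state the batch size
# TRAIN_PATH (folder of training images) and df (the training CSV loaded with
# pandas, with "Image" and "Class" columns) are assumed to be defined already.

# Containers for the results: data_kfold collects one column of predictions per fold
main_pred = []
data_kfold = pd.DataFrame()

# Creating X, Y for training
train_y = df.Class
train_x = df.drop(['Class'], axis=1)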


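We set n_splits (N_SPLIT in the code) to 7 so that, when we combine the fold predictions, at least one class is guaranteed to appear more than once for every image. We have 6 classes, so among 7 predictions at least two must agree (the pigeonhole principle); with fewer than 7 splits, every fold could predict a different class.

In that case we could not tell which class was predicted by the majority, and the final prediction would be essentially arbitrary. A repeat does not rule out ties (for example, two classes predicted twice each); the voting code then keeps the tied label it encountered first.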
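The counting argument above is the pigeonhole principle, and it can be checked by brute force. In this sketch, `has_repeated_class` is a hypothetical helper; the code enumerates every multiset of 7 fold predictions over 6 classes (order does not matter for counting) and also shows that a repeat alone does not guarantee a unique winner.

```python
from collections import Counter
from itertools import combinations_with_replacement

N_CLASSES = 6  # number of classes in the dataset

def has_repeated_class(fold_predictions):
    """True if some class label is predicted by more than one fold."""
    return max(Counter(fold_predictions).values()) > 1

# 6 folds can all disagree: each one predicts a different class
print(has_repeated_class([0, 1, 2, 3, 4, 5]))  # False

# 7 folds: every possible multiset of 7 predictions over 6 classes contains a repeat
every_7_fold_case_repeats = all(
    has_repeated_class(p) for p in combinations_with_replacement(range(N_CLASSES), 7)
)
print(every_7_fold_case_repeats)  # True

# A repeat does not guarantee a unique winner: classes 2 and 4 tie here
tie = Counter([2, 2, 4, 4, 0, 1, 3]).most_common()
print(tie[:2])  # [(2, 2), (4, 2)]
```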

Main Code

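import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Initializing data generators: augmentation for training, rescaling only for validation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)

# Stratified k-fold with shuffling; random_state makes the splits reproducible
kfold = StratifiedKFold(n_splits=N_SPLIT, shuffle=True, random_state=42)

# Variable for keeping count of the split we are executing
j = 0

# K-fold: train and predict for each split
for train_idx, val_idx in kfold.split(train_x, train_y):
    x_train_df = df.iloc[train_idx]
    x_valid_df = df.iloc[val_idx]

    training_set = train_datagen.flow_from_dataframe(dataframe=x_train_df, directory=TRAIN_PATH,
                                                     x_col="Image", y_col="Class",
                                                     target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)
    validation_set = validation_datagen.flow_from_dataframe(dataframe=x_valid_df, directory=TRAIN_PATH,
                                                            x_col="Image", y_col="Class",
                                                            target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)

    # A fresh model for every fold, so no fold reuses weights trained on another fold
    model_test = get_model(IMG_SIZE)
    history = model_test.fit(training_set,
                             validation_data=validation_set,
                             epochs=EPOCHS,
                             steps_per_epoch=x_train_df.shape[0] // BATCH_SIZE)

    # Predict on the full training set; shuffle=False keeps predictions in df row order
    test_generator = ImageDataGenerator(rescale=1./255)
    test_set = test_generator.flow_from_dataframe(dataframe=df, directory=TRAIN_PATH,
                                                  x_col="Image", y_col=None, class_mode=None,
                                                  target_size=(IMG_SIZE, IMG_SIZE),
                                                  batch_size=BATCH_SIZE, shuffle=False)
    pred = model_test.predict(test_set)
    predicted_class_indices = np.argmax(pred, axis=1)

    # Mapping from class name to index, used later to turn indices back into names
    labels = training_set.class_indices
    data_kfold[j] = predicted_class_indices
    j += 1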


Now let's understand what the code does.

In the above code, we simply repeat the holdout method 7 times (once per fold) and store each fold's predictions as a column of data_kfold. So data_kfold has one row per training image and one column per fold, as in the image below.


Notice something interesting: when we train on different data, we get different results. In the example above, for the image at 6468, the models predicted class 2 once and class 1 once, but most of the time they predicted class 4. So what should the answer be? Since 4 appears more often than any other class, we go with 4. That is exactly what the code below does.

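import collections

# Taking the label with maximum occurrences across the folds (majority vote)
# labels maps class name -> index, so labels2 maps index -> class name
labels2 = dict((v, k) for k, v in labels.items())
# ans (assumed layout): one row per image with its final predicted class name
ans = pd.DataFrame({'Image': df['Image'].values, 'Class': ''})
for i in range(len(data_kfold)):
    co = collections.Counter(data_kfold.loc[i])
    co = sorted(co.items(), key=lambda x: x[1], reverse=True)
    ans.loc[i, 'Class'] = labels2[co[0][0]]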

In the above code, for each image we store the most frequently predicted label in the data holder "ans". labels2 is simply a mapping from integer index to class name; we need it because the original data stores the target class as strings.
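To see the voting step in isolation, here is a self-contained sketch on a toy `data_kfold`. The class indices and the `labels2` mapping are hypothetical values, not real model output. It uses `Counter.most_common(1)`, which picks the same label as sorting the counts in descending order as in the code above; on ties, both keep the label that was encountered first.

```python
import collections
import pandas as pd

# Toy stand-in for data_kfold: one row per image, one column per fold (7 folds).
# The values are illustrative class indices, not real model outputs.
data_kfold = pd.DataFrame({
    0: [4, 0, 3], 1: [2, 0, 3], 2: [4, 5, 1], 3: [1, 0, 3],
    4: [4, 5, 1], 5: [4, 0, 3], 6: [4, 5, 1],
})
labels2 = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'}  # hypothetical index -> name map

final = []
for i in range(len(data_kfold)):
    # Most frequent class index among the 7 fold predictions for image i
    winner, _ = collections.Counter(data_kfold.loc[i]).most_common(1)[0]
    final.append(labels2[winner])
print(final)  # ['e', 'a', 'd']
```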

That completes the tutorial. But what about the results? Let's compare the training accuracy of k-fold majority voting with that of the standard holdout method.

Accuracy of Holdout Method: 0.32168805070335443

Accuracy of K-Fold Method: 0.4274230947596228


These are the results we obtained with the k-fold majority vote and with the holdout method. On this run, the k-fold approach performs clearly better.


The complete code is available on Kaggle as well as GitHub.




Conclusion

On this dataset, majority voting over the k fold models gave noticeably higher training accuracy than the standard holdout method, even after a single epoch. You can train for more epochs and observe how the results change. Beyond accuracy, the combined prediction is likely to be more robust, since it does not depend on a single train/validation split.

Of course, this takes much more time as the data and the number of classes grow: we train k separate models instead of one, and, following the reasoning above, k must be larger than the number of classes. So this method is practical mainly when you have limited data and a limited number of classes.

That marks the end of this article.

Happy Learning.


I am a final-year student at IIIT pursuing a bachelor's degree in Computer Science and Engineering. I love exploring deep learning and machine learning. If you want to collaborate, you can reach me on LinkedIn or GitHub, or by email.

Thank You for reading!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
