Malaria Cell Image Classification – An End-to-End Prediction

Abdiel Goni 20 Dec, 2021 • 8 min read

This article was published as a part of the Data Science Blogathon

Overview

This article will discuss building a system that can detect malaria from cell images. The plan will be created in the form of a web application that can make it easier for users and even make it easier for developers who make webs.

Algorithm for Classification Malaria Cell Image Using Deep Learning

The first thing to do after getting the dataset is to perform an image data generator, which rescales the image and then sets the target size (150, 150) and then divides the data into training and test data. For the classification of malaria cell images, preprocessing does not take a long time. After preprocessing, we create a deep learning architecture using a Convolutional Neural Network. In this section, we have to define the number of convolutions that we will operate along with the activation function, kernel size, filters, and the number of dense layers of how many neurons to set. Then to compile the model, we need loss, optimizer, and metrics. Next, to fit the model, we must set the number of epochs, validation step, step per epoch, and verbose.

Load Data

Because the data processing uses Google Colab and the dataset source is from Kaggle, the dataset taken from Kaggle is saved to google drive using the code below:

! pip install -q kaggle
from google.colab import files
files.upload()

! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/

import os
os.chdir('drive/MyDrive/.../')

!kaggle datasets download -d iarunava/cell-images-for-detecting-malaria #download dataset from kaggle

After downloading the dataset, we have to extract the file because the downloaded file is in .zip format.

import os
# Complete path to storage location of the .zip file of data
zip_path = '/content/drive/MyDrive/Kaggle/cell-images-for-detecting-malaria.zip'
# Check current directory (be sure you're in the directory where Colab operates: '/content')
os.getcwd()
# Copy the .zip file into the present directory
!cp '{zip_path}' .
# Unzip quietly 
!unzip -q 'cell-images-for-detecting-malaria.zip'
# View the unzipped contents in the virtual machine
os.listdir()

Preprocessing

As explained in the algorithm subsection above, we first do the preprocessing as follows.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

#image data generataor
datagen = ImageDataGenerator(
                    rescale=1./255,
                    validation_split = 0.2)

train_generator = datagen.flow_from_directory(
    path_dir,
    target_size=(150,150),
    shuffle=True,
    subset='training'
)

validation_generator = datagen.flow_from_directory(
    path_dir,
    target_size=(150,150),
    subset='validation'
)

Build Model

Define a model which takes (None, 150, 150, 3) input. Stack 4 convolutional layers with kernel size (3, 3) with a growing number of filters (32, 64, 128, 128). Add 2×2 pooling layer after every 2 convolutional layers (conv-conv-pool scheme). Add a dense layer with 512 neurons and a second dense layer with two neurons for classes.

import tensorflow as tf

#Activation function: relu and sigmoid

model = tf.keras.models.Sequential([
    #first_convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    #second_convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    #third_convolution
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    #fourth_convolution
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(2, activation='sigmoid') 
])

Compile model, with adam optimizer and binary cross-entropy for loss function:

model.compile(loss='binary_crossentropy',
              optimizer=tf.optimizers.Adam(),
              metrics=['accuracy'])

Fit Model , with 25 steps_per_epoch, 20 epochs, 5 validation_steps, and 2 verbose:

model.fit(
      train_generator,
      steps_per_epoch=25,
      epochs=20,
      validation_data=validation_generator,
      validation_steps=5,
      verbose=2)

Epoch 1/20
25/25 - 49s - loss: 0.6959 - accuracy: 0.5175 - val_loss: 0.7033 - val_accuracy: 0.4313
Epoch 2/20
25/25 - 39s - loss: 0.6328 - accuracy: 0.6562 - val_loss: 0.7640 - val_accuracy: 0.7750
Epoch 3/20
25/25 - 39s - loss: 0.5175 - accuracy: 0.7875 - val_loss: 0.3806 - val_accuracy: 0.8750
Epoch 4/20
25/25 - 39s - loss: 0.3837 - accuracy: 0.8850 - val_loss: 0.3086 - val_accuracy: 0.9312
Epoch 5/20
25/25 - 39s - loss: 0.2298 - accuracy: 0.9225 - val_loss: 0.1345 - val_accuracy: 0.9500
Epoch 6/20
25/25 - 39s - loss: 0.2213 - accuracy: 0.9262 - val_loss: 0.2110 - val_accuracy: 0.9438
Epoch 7/20
25/25 - 39s - loss: 0.1808 - accuracy: 0.9350 - val_loss: 0.1000 - val_accuracy: 0.9563
Epoch 8/20
25/25 - 39s - loss: 0.1673 - accuracy: 0.9337 - val_loss: 0.3180 - val_accuracy: 0.9062
Epoch 9/20
25/25 - 38s - loss: 0.1937 - accuracy: 0.9375 - val_loss: 0.1052 - val_accuracy: 0.9750
Epoch 10/20
25/25 - 39s - loss: 0.1674 - accuracy: 0.9438 - val_loss: 0.1593 - val_accuracy: 0.9250
Epoch 11/20
25/25 - 39s - loss: 0.2266 - accuracy: 0.9450 - val_loss: 0.1387 - val_accuracy: 0.9375
Epoch 12/20
25/25 - 39s - loss: 0.1866 - accuracy: 0.9425 - val_loss: 0.2648 - val_accuracy: 0.9125
Epoch 13/20
25/25 - 38s - loss: 0.1623 - accuracy: 0.9525 - val_loss: 0.2226 - val_accuracy: 0.9187
Epoch 14/20
25/25 - 38s - loss: 0.1764 - accuracy: 0.9500 - val_loss: 0.1876 - val_accuracy: 0.9375
Epoch 15/20
25/25 - 39s - loss: 0.1775 - accuracy: 0.9500 - val_loss: 0.1868 - val_accuracy: 0.9250
Epoch 16/20
25/25 - 41s - loss: 0.1780 - accuracy: 0.9438 - val_loss: 0.1737 - val_accuracy: 0.9312
Epoch 17/20
25/25 - 38s - loss: 0.1750 - accuracy: 0.9525 - val_loss: 0.1473 - val_accuracy: 0.9438
Epoch 18/20
25/25 - 38s - loss: 0.1859 - accuracy: 0.9438 - val_loss: 0.1117 - val_accuracy: 0.9625
Epoch 19/20
25/25 - 38s - loss: 0.1270 - accuracy: 0.9550 - val_loss: 0.1153 - val_accuracy: 0.9438
Epoch 20/20
25/25 - 38s - loss: 0.1630 - accuracy: 0.9525 - val_loss: 0.2125 - val_accuracy: 0.9125

Based on the fit model results, we can see the accuracy of the training and validate its accuracy. The training accuracy is 0.9525 or 95.25%, and the validation accuracy is 91.25%.
Save Model and Predict:

model.save("malaria_cell.h5") #the model is saved with the name malaria_cell.h5

Confusion Matrix to see test results

In the last part, we will look at testing the data using the confusion matrix. This will show how much test data we have and how much data was misclassified by the system.

import tensorflow as tf
from sklearn.metrics import confusion_matrix
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

pred = model.predict(validation_generator)
y_pred = np.argmax(pred, axis=1)
y_true = np.argmax(pred, axis=1)
print('confusion matrix')
print(confusion_matrix(y_true, y_pred))
    #confusion matrix
f, ax = plt.subplots(figsize=(8,5))
sns.heatmap(confusion_matrix(y_true, y_pred), annot=True, fmt=".0f", ax=ax)
plt.xlabel("y_pred")
plt.ylabel("y_true")
plt.show()

Output:

confusion matrix
[[2798    0]
 [   0 2712]]

Or it can be visualized like the following image.

Malaria Cell Image Classification | Graph

Note: 0 label is Parasitized and 1 label is Uninfected.

This model achieves 95.25% accuracy during training and 91.25% on validation or testing.

GUI for Malaria Cell Image Classification

The system that was made previously only came to conducting training and testing on the Convolutional Neural Network architectural model that was created. The accuracy of the training and testing produced is also good. This can be seen from the Confusion Matrix displayed at the end of the test. But the system created still has shortcomings. Yes, Graphical User Interface. The system that has been made is quite suitable where this system has successfully carried out the test with an accuracy of 0.9125or 91.25%. However, this system can only be run by a Machine Learning Engineer who implements the calculations to perform classification/prediction into a program. Of course, the system created is not only for use by Machine Learning Engineers. Remember, the goal is to help health workers make it easier to diagnose. This means that this system must be run or operated by the user (in this case, the user in question is a health worker) to detect or classify which cell images are infected with malaria and which are not infected. For this reason, we need to create a GUI so that it is easier for health workers or users to operate the system created.

Streamlit

Streamlit turns data scripts into shareable web apps in minutes. All in Python. All for free. No front-end experience is required. That’s Streamlit. We will create web apps using Streamlit to make it easier for users to operate the malaria cell image classification system we created using deep learning techniques. With Streamlit, it’s not only users who find it easier to operate the system; programmers or developers also find it easier to create web apps. With streamlit, developers also don’t need experience developing a web or working as a front-end developer. Streamlit is trusted by more than 50% of Fortune 50 companies, such as IBM, UBER, TESLA, WALMART, Intel, Apple, and many more. Streamlit is also used in the world’s top data science groups. Streamlit Compatible with libraries in python such as matplotlib, bokeh, plotly, scikit-learn, pandas, numpy, hard, tensorflow, opencv, and even more, with Streamlit Components

Get Started in Under a Minute

Streamlit’s open-source app framework is a breeze to get started with. It’s just a matter of:

pip install streamlit

Building Web Applications

Import library. Required libraries:

Streamlit. Streamlit is the main library needed to create web apps.

Tensorflow. Why Tensorflow? First, tensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. Second, The deep learning model we created using the hard library from tensorflow. We have also saved the built model, and this model will be loaded in this web application.

Numpy. Numpy is very necessary to process the arrangement of images that we load.

PIL. PIL or Pillow is Python Imaging Library. This library is very useful for loading images from operating system and performing operations on loaded images (image processing).

import streamlit as st #we can call this library as st in the code that we will create.
import tensorflow as tf #we can call this library as tf in the code.
import numpy as np #we can call this library as np in the code.
from PIL import Image, ImageOps #we can call this library for load images and performing operations on load images.

Creat webpage header:

st.write("""
# Malaria Cell Classification
"""
)

Create a sidebar for uploading files with the name or title on the sidebar ‘Upload Cell Image’ and type png:

upload_file = st.sidebar.file_uploader("Upload Cell Images", type="png")

Create a button on the sidebar to make predictions:

Generate_pred=st.sidebar.button("Predict")

Loads the created CNN model:

model=tf.keras.models.load_model('malaria_cell.h5') #The model created, saved under the name malaria_cell.h5.

Create functions for import and prediction/classification of images:

def import_n_pred(image_data, model):
    size = (150,150)
    image = ImageOps.fit(image_data, size, Image.ANTIALIAS)
    img = np.asarray(image)
    reshape=img[np.newaxis,...]
    pred = model.predict(reshape)
    return pred

In the function created above, we set the image size (150, 150) then perform operations on the image by calling the ImageOps mode. In operations using ImageOps, we also add ANTIALIAS. When ANTIALIAS was initially added, it was the only high-quality filter based on convolutions. Next, create an image into a numpy array and reshape the array, and perform predictions or classifications by calling the model using the model.predict syntax from tensorflow. The return is the pred variable, which is the prediction result.

Displays the results when the user performs the action pressing the predict button in the sidebar:

if Generate_pred:
    image=Image.open(upload_file)
    with st.beta_expander('Cell Image', expanded = True):
        st.image(image, use_column_width=True)
    pred=import_n_pred(image, model)
    labels = ['Parasitized', 'Uninfected']
    st.title("Prediction of image is {}".format(labels[np.argmax(pred)]))

The simple explanation, if the user presses the predict button, then he also act an action on the Generate pred variable and will perform actions such as, the system will open the image file to be uploaded, then it will call the import function and the prediction is made and perform the prediction then it will show the label ‘Parasitized’ ‘ or ‘Uninfected’ of the image.

Full Code of GUI

This is the entire coding of the codes above:

"""
@author: Abdiel W. Goni
"""
import streamlit as st
import tensorflow as tf
import numpy as np
from PIL import Image, ImageOps
st.write("""
          # Malaria Cell Classification
          """
          )
upload_file = st.sidebar.file_uploader("Upload Cell Images", type="png")
Generate_pred=st.sidebar.button("Predict")
model=tf.keras.models.load_model('malaria_cell.h5')
def import_n_pred(image_data, model):
    size = (150,150)
    image = ImageOps.fit(image_data, size, Image.ANTIALIAS)
    img = np.asarray(image)
    reshape=img[np.newaxis,...]
    pred = model.predict(reshape)
    return pred
if Generate_pred:
    image=Image.open(upload_file)
    with st.beta_expander('Cell Image', expanded = True):
        st.image(image, use_column_width=True)
    pred=import_n_pred(image, model)
    labels = ['Parasitized', 'Uninfected']
    st.title("Prediction of image is {}".format(labels[np.argmax(pred)]))

To run this code, type: streamlit run /your path and .py name in your command prompt.

GUI Display

Before uploading files and making predictions:

Browse File:

GUI Display | Malaria Cell Image Classification

Prediction Result:

Conclusion

Malaria cell image classification uses deep learning faster than most traditional machine learning techniques. The final result of GUI creation is very satisfactory, where the system prediction results or sort of cell images are not in doubt. Display GUI is very user-friendly. With Streamlit, we can save time creating web apps and don’t need front-end capabilities. That’s it. See you at another time.

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

Abdiel Goni 20 Dec 2021

Beginner Classification Deep Learning Python