Covid Detection using X-Ray

Parth Dave 01 Jun, 2021 • 5 min read

This article was published as a part of the Data Science Blogathon

Introduction

Covid is a deadly disease, which affects the respiratory system. It is very important to detect whether or not a person has covid. In this blog, we are going to identify whether or not a person has covid. Let’s begin this blog without any ado.

Importing

Let’s import the required tools for carrying out our task.

import pandas as pd
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import ResNet50
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Flatten, GlobalAveragePooling2D
from tensorflow.keras.applications import ResNet50
import tensorflow as tf
import matplotlib.pyplot as plt

Reading Data

train_data=pd.read_csv("Training_set_covid.csv")
print(train_data)

	filename	label
0	Image_1.jpg	1
1	Image_2.jpg	0
2	Image_3.jpg	0
3	Image_4.jpg	0
4	Image_5.jpg	0
…	…	…
3474	Image_3475.jpg	0
3475	Image_3476.jpg	0
3476	Image_3477.jpg	0
3477	Image_3478.jpg	1
3478	Image_3479.jpg	0

Since we have just the name of the image file and their label, let’s add a column consisting of their file path. With whose help we can load images using ImageDataGenerator.

train_data["filepath"]="train/"+train_data["filename"]
print(train_data)

	filename	label	file-path
0	Image_1.jpg	1	file-path
1	Image_2.jpg	0	file-path
2	Image_3.jpg	0	file-path
3	Image_4.jpg	0	file-path
4	Image_5.jpg	0	file-path
…	…	…	…
3474	Image_3475.jpg	0	file-path
3475	Image_3476.jpg	0	file-path
3476	Image_3477.jpg	0	file-path
3477	Image_3478.jpg	1	file-path
3478	Image_3479.jpg	0	file-path

I have replaced the actual file path with “file-path”.

Data Augmentation

Since we have very little data, I am augmenting my data and also creating a variable where training and validation image will be stored

train_datagen=ImageDataGenerator(validation_split=0.2,zoom_range=0.2,rescale=1./255.,horizontal_flip=True)

Here, we set the zoom range to 0.2 and rescaled images to 1/255 size, and made a validation set of 20% of our training image.

train_data["label"]=train_data["label"].astype(str)

The above code will convert our int labels into string types.

Loading Images

train_images=train_datagen.flow_from_dataframe(train_data,x_col="filepath",batch_size=8,target_size=(255,255),class_mode="binary",shuffle=True,subset='training',y_col="label")
valid_images=train_datagen.flow_from_dataframe(train_data,x_col="filepath",batch_size=8,target_size=(255,255),class_mode="binary",shuffle=True,subset='validation',y_col="label")

Here, we are creating 2 data streams, 1 training set, and another validation set. For which we have set some parameters. Parameters like:

1) Batch Size –> 8
2) Target Size (Image size) –> (255,255)
3) Shuffle –> True
4) Class Mode –> Binary

As we have only 2 target attributes we have set our class mode Binary. Resnet50 takes (255,255) as input shape, hence our image size. Shuffle is set to be True, to shuffle images well enough so that both sets have various images.

RESNET50

Let’s load our ResNet50 Model. But before going into the code path, visit this site to understand what ResNet50 is and how it works.

base_model = ResNet50(input_shape=(225, 225,3), include_top=False, weights="imagenet")
for layer in base_model.layers:
    layer.trainable = False

base_model = Sequential()
base_model.add(ResNet50(include_top=False, weights='imagenet', pooling='max'))
base_model.add(Dense(1, activation='sigmoid'))
base_model.compile(optimizer = tf.keras.optimizers.SGD(lr=0.0001), loss = 'binary_crossentropy', metrics = ['acc'])
base_model.summary()

Let’s understand this code block.

First, we will load a base model with our input shape, and weights. Here, we have used image-net weight. We have removed the top as we will add our output layer.
Then, We set all our base model layers Non-Trainable as we don’t want to overwrite weights that we have imported while loading.
Now, we are creating our base model, where we add a Dense layer of 50 units to create a fully connected layer that is connected to our output dense layer with 1 unit and Sigmoid as an activation function. Then we compiled it with an SGD optimizer and some fine-tuning and binary cross-entropy as loss function.

This is how our model looks:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
resnet50 (Functional)        (None, 2048)              23587712  
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 2049      
=================================================================
Total params: 23,589,761
Trainable params: 23,536,641
Non-trainable params: 53,120
_________________________________________________________________

Training

resnet_history = base_model.fit(train_images, validation_data = valid_images, steps_per_epoch =int(train_images.n/8), epochs = 20)

Let’s train our model with, train images, and validate them using a validation set. With 20 epochs, and steps per epochs are set to the number of train sample/8 to balance, batch size of 8.

Epoch 1/20
348/348 [==============================] - 162s 453ms/step - loss: 0.7703 - acc: 0.8588 - val_loss: 1.2608 - val_acc: 0.8791
Epoch 2/20
348/348 [==============================] - 84s 242ms/step - loss: 0.3550 - acc: 0.9198 - val_loss: 2.8488 - val_acc: 0.4518
Epoch 3/20
348/348 [==============================] - 84s 241ms/step - loss: 0.3501 - acc: 0.9229 - val_loss: 1.0196 - val_acc: 0.7007
Epoch 4/20
348/348 [==============================] - 84s 241ms/step - loss: 0.3151 - acc: 0.9207 - val_loss: 0.5498 - val_acc: 0.8633
Epoch 5/20
348/348 [==============================] - 85s 245ms/step - loss: 0.2376 - acc: 0.9281 - val_loss: 0.4247 - val_acc: 0.9094
Epoch 6/20
348/348 [==============================] - 86s 246ms/step - loss: 0.2273 - acc: 0.9248 - val_loss: 0.3911 - val_acc: 0.8978
Epoch 7/20
348/348 [==============================] - 86s 248ms/step - loss: 0.2166 - acc: 0.9319 - val_loss: 0.2936 - val_acc: 0.9295
Epoch 8/20
348/348 [==============================] - 87s 251ms/step - loss: 0.2016 - acc: 0.9389 - val_loss: 0.3025 - val_acc: 0.9281
Epoch 9/20
348/348 [==============================] - 87s 249ms/step - loss: 0.1557 - acc: 0.9516 - val_loss: 0.2762 - val_acc: 0.9281
Epoch 10/20
348/348 [==============================] - 85s 243ms/step - loss: 0.1802 - acc: 0.9418 - val_loss: 0.3382 - val_acc: 0.9353
Epoch 11/20
348/348 [==============================] - 84s 242ms/step - loss: 0.1430 - acc: 0.9586 - val_loss: 0.3222 - val_acc: 0.9324
Epoch 12/20
348/348 [==============================] - 85s 243ms/step - loss: 0.0977 - acc: 0.9695 - val_loss: 0.2110 - val_acc: 0.9410
Epoch 13/20
348/348 [==============================] - 85s 245ms/step - loss: 0.1227 - acc: 0.9572 - val_loss: 0.2738 - val_acc: 0.9281
Epoch 14/20
348/348 [==============================] - 86s 246ms/step - loss: 0.1396 - acc: 0.9558 - val_loss: 0.2508 - val_acc: 0.9439
Epoch 15/20
348/348 [==============================] - 86s 247ms/step - loss: 0.1173 - acc: 0.9578 - val_loss: 0.2025 - val_acc: 0.9381
Epoch 16/20
348/348 [==============================] - 86s 246ms/step - loss: 0.1038 - acc: 0.9604 - val_loss: 0.2658 - val_acc: 0.9439
Epoch 17/20
348/348 [==============================] - 86s 247ms/step - loss: 0.0881 - acc: 0.9707 - val_loss: 0.2997 - val_acc: 0.9309
Epoch 18/20
348/348 [==============================] - 87s 249ms/step - loss: 0.1036 - acc: 0.9627 - val_loss: 0.2527 - val_acc: 0.9367
Epoch 19/20
348/348 [==============================] - 87s 251ms/step - loss: 0.0848 - acc: 0.9736 - val_loss: 0.2461 - val_acc: 0.9439
Epoch 20/20
348/348 [==============================] - 87s 250ms/step - loss: 0.0742 - acc: 0.9736 - val_loss: 0.2483 - val_acc: 0.9439

Well, we have great results. Let’s see its graph

Performance Graph

Accuracy Graph

plt.plot(resnet_history.history["acc"],label="train")
plt.plot(resnet_history.history["val_acc"],label="val")
plt.title("Training Accuracy and Validation Accuracy")
plt.legend()

Covid Detection using X-Ray 3

We have good training and validation accuracy.

Loss

plt.plot(resnet_history.history["loss"],label="train")
plt.plot(resnet_history.history["val_loss"],label="val")
plt.title("Training Loss and Validation Loss")
plt.legend()

Covid Detection using X-Ray 2