Credit Card Fraud Detection Using Gated Recurrent Unit

SALONI 19 Oct, 2021 • 17 min read

This article was published as a part of the Data Science Blogathon.

Introduction

With the rapid development of Artificial Intelligence, terms like Machine Learning and Deep Learning have become buzzwords in the technology world. Machine Learning is a branch of Artificial Intelligence that automates analytical model building for data analysis, while Deep Learning is an advanced extension of Machine Learning that uses Artificial Neural Networks to make intelligent decisions on its own. In this article, we discuss one of the hot topics in data mining, credit card fraud detection, and implement it using the Gated Recurrent Unit (GRU) deep learning architecture.

In this article we have covered the following topics related to Credit Card Fraud Detection:

Introduction
Need
Application
Challenges
Implementation
Conclusion

Introduction:

A credit card is a thin rectangular plastic card with a magnetic strip, issued by a financial institution to its customers. It lets customers buy items without paying in cash or by cheque. Before issuing the card, the financial institution presets its credit limit based on the customer's monthly income.

Fraud is a wrongful activity carried out by an illegitimate person who misrepresents themselves to gain money or property. In the credit card context, it usually means intruders obtaining confidential information such as passwords, CVV numbers, etc. and misusing it. We therefore need credit card fraud detection techniques to protect cardholders from such activity.

India is on its way to becoming a developed country. To achieve this, the Government of India (GoI) has launched several initiatives, one of which is the Digital India campaign. The main intention of this initiative is to digitally empower the nation. One of its main tasks is promoting a cashless economy, in which payments are made by debit card, credit card, net banking, UPI, etc. rather than by cash or cheque.

GoI and the Reserve Bank of India (RBI) have focused heavily on digitalizing transactions. Digital payments have come in handy during crises such as the demonetization by GoI in 2016 and the ongoing COVID-19 pandemic. The government and other financial institutions recommend digital transactions because of their several advantages, one of the most important being that they save customers' time.

Customers no longer have to visit ATMs and stand in queues to withdraw money. To make a payment, they just swipe the card and enter the PIN, or provide an OTP when shopping online. Another important reason for promoting electronic transactions is to trace the flow of black money and penalize tax defaulters.

This transformative technology has some disadvantages too. Cybersecurity is one of its biggest challenges today. Online transactions require sharing sensitive information, so any data breach can result in huge losses for both the service provider and the customer. This is one of the major issues of the contemporary world, where intruders exploit the slightest loophole in a system to carry out fraudulent transactions. We therefore urgently need techniques that identify such loopholes and detect the fraudsters who exploit them. This threat is recognized globally and can take forms such as skimming, phishing, and stolen credit cards.

Credit card fraud can originate from different sources: the customer, the bank/credit card service provider, or a third party. A customer who makes a payment with a credit card and then fails to repay the amount falls into the first category. Bank/credit card service providers can create fraudulent transactions by charging the customer for crossing limits, late payments, or cash withdrawals. But the major threat comes from third parties: if a third party successfully obtains the cardholder's sensitive data, the consequences can be disastrous.

According to the Reserve Bank of India, the number of fraud cases registered during 2015-2016, 2016-2017, 2017-2018, and 2018-2019 was 1192, 1372, 2059, and 921 respectively. Because credit cards are so widely accepted, the potential for their misuse is huge, and credit card fraud detection helps individuals protect themselves from such illegal acts.

The core problem in credit card fraud detection is classifying each transaction as fraudulent or non-fraudulent. This is harder than it sounds because the transactional datasets used for the task are highly imbalanced: fraudulent transactions make up only a tiny fraction of the data, which is another challenge that needs attention.

Credit card fraud usually affects both the issuing companies and the cardholders, and it can be committed in many ways. The main types of credit card fraud are:

Merchant Related Fraud
Other kinds of Fraud

Merchant Related Fraud: These types of fraud are carried out by the merchant organizations themselves. They include:

Merchant Collusion
Triangulation
Site Cloning
False Merchant Site
Credit Card Generator

Merchant Collusion: This type of fraud occurs when someone within the merchant organization leaks information about the card and the cardholder to fraudsters.

Triangulation: This type of fraud occurs through websites. When shopping online, customers usually enter their card's confidential details; when fraudsters receive this information, they use it to carry out fraudulent activity.

Site Cloning: This is a form of phishing. The fraudster creates a clone of a website that users mistake for the genuine one, and then illegally uses the information users submit for fraudulent activity.

False Merchant Site: Some websites ask users to enter personal information such as name and age and, supposedly to verify it, also ask for their credit card details. These websites then illegally sell the information to third parties.

Credit Card Generator: Using mathematical algorithms and combinations, fraudsters can generate valid-looking card numbers for any kind of card, in any format.
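The article does not say which algorithms card generators rely on. One well-known example is the Luhn (mod 10) checksum carried by the last digit of a card number: a generator only has to append the right check digit for a made-up number to pass this format check. A minimal sketch, assuming the standard Luhn rule; the function names are illustrative and not from any library:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit; every second digit is doubled
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def luhn_check_digit(prefix: str) -> str:
    """Return the single digit that makes prefix + digit Luhn-valid."""
    for candidate in "0123456789":
        if luhn_valid(prefix + candidate):
            return candidate
    raise ValueError("no check digit found")  # should be unreachable


print(luhn_valid("4111111111111111"))       # True (a widely used test number)
print(luhn_valid("4111111111111112"))       # False
print(luhn_check_digit("411111111111111"))  # 1
```

Passing the checksum only means the number is well-formed; it says nothing about whether a real account exists, which is why a Luhn-valid number alone is not evidence of a genuine card.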

Other Kinds of Fraud: These include stolen or lost credit cards, card-not-present fraud, erasing magnetic strips, etc.

Moreover, credit card fraud affects several parties: the cardholders, the merchants, and the banks.

The Cardholders: They are the least affected by credit card fraud. When fraud occurs, the cardholder can inform the card-issuing organization, which then investigates whether an illegal transaction has taken place. If it finds fraudulent activity, the organization refunds the lost amount to the customer through a chargeback.

The Merchants: They are the most affected by fraud. If they fail to provide evidence to challenge the issuer, they can suffer huge losses.

The Banks: As per the applicable norms, banks must recover the spent amount from the users, either directly or indirectly. They also need to spend heavily on developing technology that can handle fraudulent activity.

Need for Credit Card Fraud Detection

As we know, India is on the way to becoming a digitally empowered nation, and the Government of India has launched several initiatives to accomplish this. Due to digitization, most people now prefer online shopping, which requires payment by credit card, debit card, or net banking rather than the regular modes.

In online payments, the only requirement is sensitive information such as passwords, CVV numbers, OTPs, etc.; no physical card is needed. If this sensitive information is compromised, it can lead to huge losses for both the service providers and the customer. In such a scenario, a credit card fraud detection technique is required to tackle the challenges faced by cardholders.

Moreover, tracing fraudulent transactions in real time remains difficult. Many credit card companies and banks analyze transactions that have already happened and then apply machine learning or deep learning algorithms to predict whether each transaction is fraudulent or non-fraudulent, which is a time-consuming procedure. Even when they succeed in identifying the class label of a transaction, it may be too late to compensate for the loss.

Also, the fraudster may commit many more illegitimate transactions before being detected. So, to protect both financial institutions and cardholders, such a technique is the need of the hour.

Application of Credit Card Fraud Detection

Credit cards have been gaining popularity across diverse fields, including:

Warehouse stores: It is used in the in-store or online purchasing of an item.
Telecom service: It is used for mobile recharge or gaining other service benefits.
Ride-sharing platforms: It is used to pay for rides.
Grocery stores: It is used for paying the bills for purchasing grocery items.
Restaurant: It is used in restaurants for paying the bills for our meals.
Online Shopping: It is used in purchasing items from online platforms.
Healthcare: It is used in healthcare sectors to pay all the medical bills.
Educational Institution: It is used in the educational sector for tuition fee payments.

In all these fields where credit cards are accepted, the risk of credit card fraud is high. Wherever credit cards are used, we need a solid validation structure, i.e., a fraud detection technique that can catch fraudulent transactions.

Challenges in Credit Card Fraud Detection


Apart from its need and various applications, credit card fraud detection faces several challenges. Some of them are discussed below:

Unavailability of real datasets: One of the toughest challenges is the non-availability of real datasets. Due to confidentiality constraints, financial institutions are unwilling to reveal information about their customers.
Unbalanced datasets: The next big challenge is data imbalance. Most transactions in such datasets are genuine, and very few are fraudulent. This can be addressed with various oversampling and undersampling techniques.
Huge dataset size: The number of credit card transactions carried out every day is increasing tremendously, which generates very large datasets. Analyzing them requires considerable computational resources and time, which is another challenge for researchers.
Selecting appropriate evaluation metrics: Because the datasets are highly imbalanced, accuracy can be misleading: a model that labels every transaction as genuine still achieves very high accuracy while detecting no fraud at all. Metrics such as precision, recall, F1-score, and the Matthews correlation coefficient are more informative.
Frequently changing behavior of fraudsters: Another major challenge is the dynamic behavior of fraudsters, who keep changing the way they carry out fraudulent transactions so that they cannot be traced easily.
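To make the accuracy problem concrete, the sketch below scores a trivial classifier that labels every transaction as genuine, using the class counts reported for the dataset in the implementation (284,315 genuine, 492 fraudulent). The labels are synthetic and no model is involved:

```python
import numpy as np

# Synthetic labels with the dataset's class counts: 284,315 genuine (0), 492 fraud (1)
y_true = np.concatenate([np.zeros(284_315, dtype=int), np.ones(492, dtype=int)])
# A "classifier" that labels every transaction as genuine
y_pred_all_genuine = np.zeros_like(y_true)

accuracy = (y_pred_all_genuine == y_true).mean()
fraud_recall = (y_pred_all_genuine[y_true == 1] == 1).mean()  # share of frauds caught
print(f"accuracy = {accuracy:.6f}")          # accuracy = 0.998273
print(f"fraud recall = {fraud_recall:.1f}")  # fraud recall = 0.0
```

The baseline reaches about 99.83% accuracy while catching none of the frauds, so any model on this data should be judged on fraud-class metrics rather than accuracy alone.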

Implementation

We extend our discussion by implementing a credit card fraud detection method using a deep learning architecture. All experiments were conducted in Python 3.7.7, in Jupyter Notebook 6.0.3 from the Anaconda platform. We implemented the Gated Recurrent Unit (GRU) architecture with the Keras library, using TensorFlow as the backend. Other libraries used are NumPy, SciPy, pandas, Matplotlib, seaborn, scikit-learn (sklearn), and imbalanced-learn (imblearn). The dataset was split into training and test sets in the ratio 10:90 (test_size=0.9). The model stacks two GRU layers (22 and 10 units) with sigmoid activation, each followed by 0.2 dropout, and ends in a single sigmoid output unit; it was trained for 100 epochs.
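The article uses GRU layers without describing what happens inside them. As a rough illustration, the NumPy sketch below runs one GRU layer over a single input sequence. It assumes the common formulation with an update gate z, a reset gate r, and a candidate state, with the reset gate applied before the recurrent matrix product (Keras' reset_after=False) and Keras' interpolation h_t = z * h_prev + (1 - z) * candidate. The weights are random, not taken from the trained model. In Keras, the activation argument sets the candidate activation (tanh by default), so activation='sigmoid' in the article's model applies there; the gates use a logistic sigmoid in this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, W, U, b, activation=np.tanh):
    """One GRU time step with the reset gate applied before the recurrent product.

    x_t: (input_dim,), h_prev: (units,)
    W: (input_dim, 3*units), U: (units, 3*units), b: (3*units,)
    Column blocks are ordered [update z | reset r | candidate].
    """
    n = h_prev.shape[0]
    z = sigmoid(x_t @ W[:, :n] + h_prev @ U[:, :n] + b[:n])                 # update gate
    r = sigmoid(x_t @ W[:, n:2 * n] + h_prev @ U[:, n:2 * n] + b[n:2 * n])  # reset gate
    cand = activation(x_t @ W[:, 2 * n:] + (r * h_prev) @ U[:, 2 * n:] + b[2 * n:])
    return z * h_prev + (1.0 - z) * cand  # keep part of the old state, mix in the candidate


def gru_layer(seq, units, activation=np.tanh, seed=0):
    """Run a randomly initialised GRU over seq (timesteps, input_dim); return the last state."""
    rng = np.random.default_rng(seed)
    input_dim = seq.shape[1]
    W = rng.normal(scale=0.1, size=(input_dim, 3 * units))
    U = rng.normal(scale=0.1, size=(units, 3 * units))
    b = np.zeros(3 * units)
    h = np.zeros(units)
    for x_t in seq:  # one time step at a time
        h = gru_step(x_t, h, W, U, b, activation)
    return h


# One transaction: 22 selected features fed as 22 time steps of one value each
seq = np.random.default_rng(42).normal(size=(22, 1))
h_last = gru_layer(seq, units=10, activation=sigmoid)  # sigmoid candidate, as in the article's model
print(h_last.shape)  # (10,)
```

Feeding the 22 selected features as 22 one-value time steps mirrors the (samples, 22, 1) input shape the implementation passes to its first GRU layer, and returning only the final state corresponds to return_sequences=False.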

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# Pandas options
pd.set_option('display.max_colwidth', 1000, 'display.max_rows', None, 'display.max_columns', None)
# Plotting options (%matplotlib inline is a Jupyter magic; drop it outside a notebook)
%matplotlib inline
mpl.style.use('ggplot')
sns.set(style='whitegrid')
# TensorFlow is the backend for the Keras layers used below
import tensorflow as tf
# 284,807 transactions with Time, V1-V28, Amount and Class columns; this appears to be
# the public Kaggle "Credit Card Fraud Detection" dataset (creditcard.csv)
transactions = pd.read_csv('desktopcreditcard.csv')
transactions.shape
(284807, 31)
transactions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     284807 non-null  float64
 22  V22     284807 non-null  float64
 23  V23     284807 non-null  float64
 24  V24     284807 non-null  float64
 25  V25     284807 non-null  float64
 26  V26     284807 non-null  float64
 27  V27     284807 non-null  float64
 28  V28     284807 non-null  float64
 29  Amount  284807 non-null  float64
 30  Class   284807 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
transactions.isnull().any().any()
False
transactions['Class'].value_counts()
0    284315
1       492
Name: Class, dtype: int64
transactions['Class'].value_counts(normalize=True)
0    0.998273
1    0.001727
Name: Class, dtype: float64
X = transactions.drop(labels='Class', axis=1) # Features
y = transactions.loc[:,'Class']#response
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.9, random_state=1, stratify=y)
X_train.shape
(28480, 30)
X_test.shape
(256327, 30)
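Because fraud is so rare, stratify=y matters here: it keeps the fraud ratio the same in both splits. A quick check with labels that reproduce only the class counts of the dataset (features are omitted, and a row-index array stands in for X):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the dataset's class counts: 284,315 genuine (0) and 492 fraudulent (1)
y_demo = np.array([0] * 284_315 + [1] * 492)
idx = np.arange(len(y_demo))  # row indices stand in for the feature matrix

idx_tr, idx_te, y_tr, y_te = train_test_split(
    idx, y_demo, test_size=0.9, random_state=1, stratify=y_demo
)
print(len(y_tr), y_tr.sum())  # 28480 49
print(len(y_te), y_te.sum())  # 256327 443
```

The 443 frauds left in the test split match the support of class 1 in the classification report, which also means only 49 fraudulent examples are available for training.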
from sklearn.feature_selection import SelectPercentile
# Keep the top 75% of features ranked by the default score_func, f_classif (ANOVA F-value)
select = SelectPercentile(percentile=75)
select.fit(X_train,y_train)
X_train_selected=select.transform(X_train)
X_test_selected=select.transform(X_test)
print('X_train.shape is :{}'.format(X_train.shape))
X_train.shape is :(28480, 30)
print('X_train_selected.shape is :{}'.format(X_train_selected.shape))
X_train_selected.shape is :(28480, 22)
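SelectPercentile scores every feature (by default with f_classif, the ANOVA F-value) and keeps roughly those scoring above the (100 - percentile)th percentile of all scores, so percentile=75 on 30 columns keeps 22, as printed above. A sketch on synthetic data (the DataFrame and the shifted V14 column are hypothetical) that also shows how to recover the names of the kept columns with get_support():

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, f_classif

rng = np.random.default_rng(0)
# Hypothetical stand-in with the same 30 feature names as the dataset
cols = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
y_demo = rng.integers(0, 2, size=1000)
X_demo = pd.DataFrame(rng.normal(size=(1000, 30)), columns=cols)
X_demo["V14"] += 2.0 * y_demo  # make one column clearly informative

select_demo = SelectPercentile(score_func=f_classif, percentile=75).fit(X_demo, y_demo)
kept = X_demo.columns[select_demo.get_support()]  # boolean mask -> column names
print(len(kept))      # 22
print("V14" in kept)  # True
```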
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=2)
# fit_resample is the current API name; the older fit_sample alias is gone in recent imbalanced-learn releases
X_train_res, y_train_res = sm.fit_resample(X_train, np.ravel(y_train))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))
After OverSampling, the shape of train_y: (56862,)
print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1)))
After OverSampling, counts of label '1': 28431
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))
After OverSampling, counts of label '0': 28431
# Note: X_train_res / y_train_res are not used later; the GRU below is trained on the
# original, imbalanced X_train_selected / y_train
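SMOTE does not simply duplicate fraud rows; it creates synthetic ones by interpolating between a minority sample and one of its k nearest minority-class neighbours. The NumPy sketch below illustrates only that idea. It is a simplified version under stated assumptions (brute-force distances, a uniformly chosen neighbour, random stand-in data), not imbalanced-learn's implementation; smote_sketch is a hypothetical helper:

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    # Brute-force pairwise Euclidean distances between minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, len(X_min), size=n_new)            # random minority sample
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]    # one of its neighbours
    gap = rng.random((n_new, 1))                               # position on the segment
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


rng = np.random.default_rng(2)
X_fraud = rng.normal(size=(49, 30))  # 49 minority rows, as in the training split
n_genuine = 28_431
X_synth = smote_sketch(X_fraud, n_new=n_genuine - len(X_fraud))
print(X_synth.shape)  # (28382, 30)
```

Every synthetic row lies on a line segment between two real fraud rows, so the new samples stay inside the region the minority class already occupies.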
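from sklearn.preprocessing import StandardScaler
stdscaler=StandardScaler()
# Note: this rescales the full feature matrix X after the split; the scaled X is not used later,
# so the GRU is trained on the unscaled X_train_selected / X_test_selected
X=stdscaler.fit_transform(X)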
X
array([[-1.99658302, -0.69424232, -0.04407492, ...,  0.33089162,
        -0.06378115,  0.24496426],
       [-1.99658302,  0.60849633,  0.16117592, ..., -0.02225568,
         0.04460752, -0.34247454],
       [-1.99656197, -0.69350046, -0.81157783, ..., -0.13713686,
        -0.18102083,  1.16068593],
       ...,
       [ 1.6419735 ,  0.98002374, -0.18243372, ...,  0.01103672,
        -0.0804672 , -0.0818393 ],
       [ 1.6419735 , -0.12275539,  0.32125034, ...,  0.26960398,
         0.31668678, -0.31324853],
       [ 1.64205773, -0.27233093, -0.11489898, ..., -0.00598394,
         0.04134999,  0.51435531]])
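y_train=y_train.to_numpy()
y_test=y_test.to_numpy()
# GRU layers expect 3-D input (samples, timesteps, features): each of the 22 selected
# features is fed as one time step holding a single value
X_train_selected=X_train_selected.reshape(X_train_selected.shape[0],X_train_selected.shape[1],1)
X_test_selected=X_test_selected.reshape(X_test_selected.shape[0],X_test_selected.shape[1],1)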
X_train_selected.shape
(28480, 22, 1)
X_test_selected.shape
(256327, 22, 1)
y_train.shape
(28480,)
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import GRU
from keras.layers import Dropout
model= Sequential()
model.add(GRU(units=22,return_sequences=True,input_shape=(X_train_selected.shape[1],1),activation='sigmoid'))
model.add(Dropout(0.2))
model.add(GRU(units=10,return_sequences=False,activation='sigmoid'))
model.add(Dropout(0.2))
model.add(Dense(units=1,activation='sigmoid'))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_1 (GRU)                  (None, 22, 22)            1584      
_________________________________________________________________
dropout_1 (Dropout)          (None, 22, 22)            0         
_________________________________________________________________
gru_2 (GRU)                  (None, 10)                990       
_________________________________________________________________
dropout_2 (Dropout)          (None, 10)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 11        
=================================================================
Total params: 2,585
Trainable params: 2,585
Non-trainable params: 0
_________________________________________________________________
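The parameter counts in this summary can be checked by hand. Each GRU layer has three blocks (update gate, reset gate, candidate state), each with an input kernel, a recurrent kernel, and a bias; the figures above match reset_after=False, the default in standalone Keras 2.x. A small sketch (the helper names are hypothetical):

```python
def gru_params(units, input_dim, reset_after=False):
    """Trainable parameters of a Keras-style GRU layer."""
    # 3 blocks (update z, reset r, candidate h), each with an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias;
    # reset_after=True keeps a second bias vector per block.
    bias = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + bias)


def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return units * input_dim + units


gru_1 = gru_params(22, 1)      # 22 units over 1 feature per time step
gru_2 = gru_params(10, 22)     # 10 units over the 22-dim outputs of gru_1
dense_1 = dense_params(1, 10)  # single sigmoid output unit
print(gru_1, gru_2, dense_1, gru_1 + gru_2 + dense_1)  # 1584 990 11 2585
print(gru_params(22, 1, reset_after=True))             # 1650
```

Under tf.keras in TensorFlow 2.x, where reset_after=True adds a second bias vector per block, the first layer would report 1,650 parameters instead, so a rerun there will not reproduce this summary exactly.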
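# The optimizer is passed by name ('adam'), so no separate optimizer import is needed
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
# The test set doubles as validation data here, so the val_* values below are test-set scores
history= model.fit(X_train_selected,y_train,epochs=100,validation_data=(X_test_selected,y_test),verbose=1)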
Train on 28480 samples, validate on 256327 samples Epoch 1/100 28480/28480 [==============================] - 88s 3ms/step - loss: 0.2219 - accuracy: 0.9447 - val_loss: 0.0618 - val_accuracy: 0.9983 Epoch 2/100 28480/28480 [==============================] - 73s 3ms/step - loss: 0.0548 - accuracy: 0.9983 - val_loss: 0.0264 - val_accuracy: 0.9983 Epoch 3/100 28480/28480 [==============================] - 53s 2ms/step - loss: 0.0325 - accuracy: 0.9983 - val_loss: 0.0171 - val_accuracy: 0.9983 Epoch 4/100 28480/28480 [==============================] - 49s 2ms/step - loss: 0.0236 - accuracy: 0.9983 - val_loss: 0.0139 - val_accuracy: 0.9983 Epoch 5/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0186 - accuracy: 0.9983 - val_loss: 0.0127 - val_accuracy: 0.9983 Epoch 6/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0160 - accuracy: 0.9983 - val_loss: 0.0128 - val_accuracy: 0.9983 Epoch 7/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0156 - accuracy: 0.9983 - val_loss: 0.0120 - val_accuracy: 0.9983 Epoch 8/100 28480/28480 [==============================] - 43s 2ms/step - loss: 0.0141 - accuracy: 0.9983 - val_loss: 0.0117 - val_accuracy: 0.9983 Epoch 9/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0135 - accuracy: 0.9983 - val_loss: 0.0108 - val_accuracy: 0.9983 Epoch 10/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0112 - accuracy: 0.9983 - val_loss: 0.0077 - val_accuracy: 0.9983 Epoch 11/100 28480/28480 [==============================] - 46s 2ms/step - loss: 0.0083 - accuracy: 0.9983 - val_loss: 0.0059 - val_accuracy: 0.9983 Epoch 12/100 28480/28480 [==============================] - 46s 2ms/step - loss: 0.0068 - accuracy: 0.9983 - val_loss: 0.0052 - val_accuracy: 0.9983 Epoch 13/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0062 - accuracy: 0.9984 - val_loss: 0.0048 - val_accuracy: 0.9989 Epoch 14/100 
28480/28480 [==============================] - 46s 2ms/step - loss: 0.0057 - accuracy: 0.9988 - val_loss: 0.0047 - val_accuracy: 0.9990 Epoch 15/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0054 - accuracy: 0.9988 - val_loss: 0.0044 - val_accuracy: 0.9991 Epoch 16/100 28480/28480 [==============================] - 43s 2ms/step - loss: 0.0053 - accuracy: 0.9991 - val_loss: 0.0042 - val_accuracy: 0.9992 Epoch 17/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0050 - accuracy: 0.9991 - val_loss: 0.0041 - val_accuracy: 0.9993 Epoch 18/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0047 - accuracy: 0.9990 - val_loss: 0.0041 - val_accuracy: 0.9993 Epoch 19/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0047 - accuracy: 0.9991 - val_loss: 0.0042 - val_accuracy: 0.9993 Epoch 20/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0050 - accuracy: 0.9991 - val_loss: 0.0041 - val_accuracy: 0.9993 Epoch 21/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0047 - accuracy: 0.9993 - val_loss: 0.0042 - val_accuracy: 0.9993 Epoch 22/100 28480/28480 [==============================] - 46s 2ms/step - loss: 0.0047 - accuracy: 0.9992 - val_loss: 0.0039 - val_accuracy: 0.9994 Epoch 23/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0045 - accuracy: 0.9993 - val_loss: 0.0040 - val_accuracy: 0.9994 Epoch 24/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0046 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9994 Epoch 25/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0044 - accuracy: 0.9992 - val_loss: 0.0040 - val_accuracy: 0.9994 Epoch 26/100 28480/28480 [==============================] - 43s 2ms/step - loss: 0.0046 - accuracy: 0.9991 - val_loss: 0.0040 - val_accuracy: 0.9994 Epoch 27/100 28480/28480 [==============================] - 47s 2ms/step - 
loss: 0.0045 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9994 Epoch 28/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0043 - accuracy: 0.9992 - val_loss: 0.0039 - val_accuracy: 0.9994 Epoch 29/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0045 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9994 Epoch 30/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0041 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9994 Epoch 31/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0046 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9994 Epoch 32/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0045 - accuracy: 0.9992 - val_loss: 0.0038 - val_accuracy: 0.9994 Epoch 33/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0044 - accuracy: 0.9994 - val_loss: 0.0041 - val_accuracy: 0.9993 Epoch 34/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0042 - accuracy: 0.9993 - val_loss: 0.0041 - val_accuracy: 0.9992 Epoch 35/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0041 - accuracy: 0.9994 - val_loss: 0.0040 - val_accuracy: 0.9993 Epoch 36/100 28480/28480 [==============================] - 49s 2ms/step - loss: 0.0037 - accuracy: 0.9995 - val_loss: 0.0039 - val_accuracy: 0.9994 Epoch 37/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0042 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9993 Epoch 38/100 28480/28480 [==============================] - 46s 2ms/step - loss: 0.0041 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9994 Epoch 39/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0041 - accuracy: 0.9992 - val_loss: 0.0038 - val_accuracy: 0.9994 Epoch 40/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0041 - accuracy: 0.9993 - val_loss: 0.0040 - 
val_accuracy: 0.9993 Epoch 41/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0041 - accuracy: 0.9992 - val_loss: 0.0036 - val_accuracy: 0.9994 Epoch 42/100 28480/28480 [==============================] - 49s 2ms/step - loss: 0.0039 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9993 Epoch 43/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0039 - accuracy: 0.9994 - val_loss: 0.0035 - val_accuracy: 0.9993 Epoch 44/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0039 - accuracy: 0.9993 - val_loss: 0.0036 - val_accuracy: 0.9994 Epoch 45/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0037 - accuracy: 0.9993 - val_loss: 0.0035 - val_accuracy: 0.9994 Epoch 46/100 28480/28480 [==============================] - 43s 2ms/step - loss: 0.0039 - accuracy: 0.9993 - val_loss: 0.0035 - val_accuracy: 0.9993 Epoch 47/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0036 - accuracy: 0.9994 - val_loss: 0.0035 - val_accuracy: 0.9994 Epoch 48/100 28480/28480 [==============================] - 53s 2ms/step - loss: 0.0036 - accuracy: 0.9994 - val_loss: 0.0036 - val_accuracy: 0.9993 Epoch 49/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0036 - accuracy: 0.9994 - val_loss: 0.0035 - val_accuracy: 0.9994 Epoch 50/100 28480/28480 [==============================] - 44s 2ms/step - loss: 0.0038 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9993 Epoch 51/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0038 - accuracy: 0.9993 - val_loss: 0.0035 - val_accuracy: 0.9993 Epoch 52/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0036 - accuracy: 0.9994 - val_loss: 0.0036 - val_accuracy: 0.9993 Epoch 53/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0038 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9993 Epoch 54/100 28480/28480 
[==============================] - 45s 2ms/step - loss: 0.0036 - accuracy: 0.9994 - val_loss: 0.0035 - val_accuracy: 0.9993 Epoch 55/100 28480/28480 [==============================] - 49s 2ms/step - loss: 0.0038 - accuracy: 0.9994 - val_loss: 0.0035 - val_accuracy: 0.9993 Epoch 56/100 28480/28480 [==============================] - 49s 2ms/step - loss: 0.0035 - accuracy: 0.9995 - val_loss: 0.0036 - val_accuracy: 0.9993 Epoch 57/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0037 - accuracy: 0.9994 - val_loss: 0.0036 - val_accuracy: 0.9993 Epoch 58/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0035 - accuracy: 0.9994 - val_loss: 0.0035 - val_accuracy: 0.9993 Epoch 59/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0035 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9992 Epoch 60/100 28480/28480 [==============================] - 45s 2ms/step - loss: 0.0036 - accuracy: 0.9993 - val_loss: 0.0036 - val_accuracy: 0.9993 Epoch 61/100 28480/28480 [==============================] - 43s 2ms/step - loss: 0.0035 - accuracy: 0.9993 - val_loss: 0.0037 - val_accuracy: 0.9993 Epoch 62/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0036 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9992 Epoch 63/100 28480/28480 [==============================] - 56s 2ms/step - loss: 0.0033 - accuracy: 0.9995 - val_loss: 0.0037 - val_accuracy: 0.9993s - loss: 0 Epoch 64/100 28480/28480 [==============================] - 47s 2ms/step - loss: 0.0035 - accuracy: 0.9993 - val_loss: 0.0037 - val_accuracy: 0.9993 Epoch 65/100 28480/28480 [==============================] - 48s 2ms/step - loss: 0.0032 - accuracy: 0.9994 - val_loss: 0.0039 - val_accuracy: 0.9992 Epoch 66/100 28480/28480 [==============================] - 51s 2ms/step - loss: 0.0033 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9992 Epoch 67/100 28480/28480 [==============================] - 45s 2ms/step - 
loss: 0.0032 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9992
Epoch 68/100
28480/28480 [==============================] - 45s 2ms/step - loss: 0.0032 - accuracy: 0.9995 - val_loss: 0.0039 - val_accuracy: 0.9992
Epoch 69/100
28480/28480 [==============================] - 49s 2ms/step - loss: 0.0032 - accuracy: 0.9995 - val_loss: 0.0044 - val_accuracy: 0.9991
Epoch 70/100
28480/28480 [==============================] - 53s 2ms/step - loss: 0.0033 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9992
Epoch 71/100
28480/28480 [==============================] - 51s 2ms/step - loss: 0.0028 - accuracy: 0.9995 - val_loss: 0.0036 - val_accuracy: 0.9993
Epoch 72/100
28480/28480 [==============================] - 50s 2ms/step - loss: 0.0034 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9992
Epoch 73/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0034 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9992
Epoch 74/100
28480/28480 [==============================] - 44s 2ms/step - loss: 0.0031 - accuracy: 0.9995 - val_loss: 0.0038 - val_accuracy: 0.9992
Epoch 75/100
28480/28480 [==============================] - 44s 2ms/step - loss: 0.0033 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9993
Epoch 76/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0033 - accuracy: 0.9995 - val_loss: 0.0039 - val_accuracy: 0.9992
Epoch 77/100
28480/28480 [==============================] - 46s 2ms/step - loss: 0.0031 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9992
Epoch 78/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0033 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9992
Epoch 79/100
28480/28480 [==============================] - 47s 2ms/step - loss: 0.0030 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9993
Epoch 80/100
28480/28480 [==============================] - 50s 2ms/step - loss: 0.0029 - accuracy: 0.9995 - val_loss: 0.0037 - val_accuracy: 0.9993
Epoch 81/100
28480/28480 [==============================] - 47s 2ms/step - loss: 0.0031 - accuracy: 0.9995 - val_loss: 0.0044 - val_accuracy: 0.9992
Epoch 82/100
28480/28480 [==============================] - 49s 2ms/step - loss: 0.0029 - accuracy: 0.9995 - val_loss: 0.0037 - val_accuracy: 0.9993
Epoch 83/100
28480/28480 [==============================] - 50s 2ms/step - loss: 0.0033 - accuracy: 0.9993 - val_loss: 0.0046 - val_accuracy: 0.9991
Epoch 84/100
28480/28480 [==============================] - 49s 2ms/step - loss: 0.0032 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 85/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0028 - accuracy: 0.9994 - val_loss: 0.0039 - val_accuracy: 0.9993
Epoch 86/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0029 - accuracy: 0.9994 - val_loss: 0.0056 - val_accuracy: 0.9986
Epoch 87/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0027 - accuracy: 0.9995 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 88/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0027 - accuracy: 0.9994 - val_loss: 0.0040 - val_accuracy: 0.9991
Epoch 89/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0030 - accuracy: 0.9995 - val_loss: 0.0038 - val_accuracy: 0.9992
Epoch 90/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0031 - accuracy: 0.9992 - val_loss: 0.0039 - val_accuracy: 0.9992
Epoch 91/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0031 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 92/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0027 - accuracy: 0.9995 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 93/100
28480/28480 [==============================] - 49s 2ms/step - loss: 0.0033 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 94/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0028 - accuracy: 0.9995 - val_loss: 0.0039 - val_accuracy: 0.9993
Epoch 95/100
28480/28480 [==============================] - 49s 2ms/step - loss: 0.0027 - accuracy: 0.9996 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 96/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0026 - accuracy: 0.9996 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 97/100
28480/28480 [==============================] - 50s 2ms/step - loss: 0.0026 - accuracy: 0.9996 - val_loss: 0.0039 - val_accuracy: 0.9993
Epoch 98/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0027 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9993
Epoch 99/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0029 - accuracy: 0.9993 - val_loss: 0.0040 - val_accuracy: 0.9992
Epoch 100/100
28480/28480 [==============================] - 48s 2ms/step - loss: 0.0026 - accuracy: 0.9996 - val_loss: 0.0039 - val_accuracy: 0.9993
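A log this long is easier to inspect programmatically. As an illustrative sketch only (it is not part of the original notebook), the snippet below parses Keras progress output with regular expressions and reports the epoch with the lowest validation loss. The `LOG` string is a hypothetical excerpt holding four epochs copied from the run above, and the regexes assume Keras's `metric: value` progress-line format; the names `parse_keras_log`, `EPOCH_RE` and `METRIC_RE` are our own.

```python
import re

# Hypothetical excerpt: epochs 68-71 copied from the training log above.
LOG = """
Epoch 68/100
28480/28480 [==============================] - 45s 2ms/step - loss: 0.0032 - accuracy: 0.9995 - val_loss: 0.0039 - val_accuracy: 0.9992
Epoch 69/100
28480/28480 [==============================] - 49s 2ms/step - loss: 0.0032 - accuracy: 0.9995 - val_loss: 0.0044 - val_accuracy: 0.9991
Epoch 70/100
28480/28480 [==============================] - 53s 2ms/step - loss: 0.0033 - accuracy: 0.9994 - val_loss: 0.0037 - val_accuracy: 0.9992
Epoch 71/100
28480/28480 [==============================] - 51s 2ms/step - loss: 0.0028 - accuracy: 0.9995 - val_loss: 0.0036 - val_accuracy: 0.9993
"""

# One match per epoch: the epoch number, then everything up to its val_accuracy value.
EPOCH_RE = re.compile(r"Epoch (\d+)/\d+(.*?val_accuracy: [\d.]+)", re.DOTALL)
# "name: value" pairs such as "loss: 0.0032" or "val_loss: 0.0039".
METRIC_RE = re.compile(r"(\w+): ([\d.]+)")


def parse_keras_log(text):
    """Return a list of (epoch, {metric: value}) tuples in log order."""
    records = []
    for match in EPOCH_RE.finditer(text):
        metrics = {name: float(value) for name, value in METRIC_RE.findall(match.group(2))}
        records.append((int(match.group(1)), metrics))
    return records


records = parse_keras_log(LOG)
# Pick the epoch whose validation loss is smallest.
best_epoch, best_metrics = min(records, key=lambda rec: rec[1]["val_loss"])
print(f"lowest val_loss {best_metrics['val_loss']} at epoch {best_epoch}")
```

On this excerpt it selects epoch 71 (val_loss 0.0036). When the training code is at hand, parsing is unnecessary: `model.fit` returns a Keras `History` object whose `history` attribute is a dict of per-epoch lists such as `history.history['val_loss']`.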
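# Predicting the Test set results
y_pred = model.predict(X_test_selected)  # fraud probabilities from the sigmoid output
y_pred = (y_pred > 0.5)                  # threshold at 0.5 to get boolean class labels
# evaluate() returns [loss, accuracy], the loss and metric the model was compiled with
score = model.evaluate(X_test_selected, y_test)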
256327/256327 [==============================] - 36s 140us/step
score
[0.0039123187033642, 0.9992704391479492]
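The 99.93% test accuracy needs context, because the test set is extremely imbalanced. A small worked calculation (our own, using the class supports from the classification report below) compares it with a trivial model that labels every transaction as legitimate:

```python
# Class supports from this run's classification report (class 0 = legitimate, 1 = fraud).
n_legit, n_fraud = 255884, 443
n_total = n_legit + n_fraud  # 256327 test transactions

# A "model" that never predicts fraud is right on every legitimate transaction.
baseline_acc = n_legit / n_total
# Accuracy returned by model.evaluate() above.
gru_acc = 0.9992704391479492

print(f"always-legitimate baseline accuracy: {baseline_acc:.4%}")
print(f"GRU accuracy:                        {gru_acc:.4%}")
```

The do-nothing baseline already scores about 99.83% while catching zero frauds, so the 0.1 percentage-point gap in accuracy hides the real difference between the two. Precision, recall, F1 and MCC on the fraud class are far more informative here.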
# Let's see how our model performed
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support
           0       1.00      1.00      1.00    255884
           1       0.81      0.75      0.78       443
    accuracy                           1.00    256327
   macro avg       0.91      0.88      0.89    256327
weighted avg       1.00      1.00      1.00    256327
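The two summary rows of the report average the per-class scores differently, which is why they disagree so sharply here. The sketch below is our own arithmetic using the rounded per-class F1 values and supports printed above, so it only approximates scikit-learn's computation on unrounded scores, but it reproduces both rows for F1:

```python
# Per-class F1 scores and supports as printed in the report (rounded to 2 d.p.).
f1 = {0: 1.00, 1: 0.78}
support = {0: 255884, 1: 443}

# Macro average: every class counts equally, so the rare fraud class pulls it down.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: classes are weighted by support, so class 0 dominates.
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"macro F1    = {macro_f1:.2f}")
print(f"weighted F1 = {weighted_f1:.2f}")
```

Because legitimate transactions make up more than 99.8% of the support, the weighted average rounds to 1.00 and says almost nothing about fraud detection; the macro average (0.89) at least gives the fraud class equal weight.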
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)
array([[255807,     77],
       [   110,    333]], dtype=int64)
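scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, in sorted label order, so for labels 0/1 the array reads [[TN, FP], [FN, TP]]. Out of 443 frauds, 333 are caught and 110 are missed, while 77 legitimate transactions are wrongly flagged. A minimal sketch that hard-codes the counts printed above, unpacks them with `ravel()` and applies the standard formulas:

```python
import numpy as np

# Confusion matrix printed above: rows are true labels, columns are predictions,
# in sorted label order [0 (legitimate), 1 (fraud)].
cm = np.array([[255807, 77],
               [110, 333]])

# For binary labels the four cells unpack as TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of flagged transactions that really were fraud
recall = tp / (tp + fn)     # share of actual frauds that were flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"fraud class: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

These values match the `1` row of the classification report (0.81, 0.75, 0.78).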
from sklearn.metrics import matthews_corrcoef
MCC = matthews_corrcoef(y_test, y_pred)
print("Matthews correlation coefficient is {}".format(MCC))
Matthews correlation coefficient is 0.7809955296478702
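The coefficient reported above can be checked by hand. For a binary problem, MCC = (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), which ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Because it uses all four cells of the confusion matrix, it is not inflated by the huge number of true negatives. A minimal sketch that plugs in the confusion-matrix counts from this run:

```python
import math

# Cell counts from the confusion matrix above.
tn, fp, fn, tp = 255807, 77, 110, 333

numerator = tp * tn - fp * fn
denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = numerator / denominator

print(f"MCC = {mcc:.4f}")
```

Up to floating-point rounding, this agrees with the value returned by `matthews_corrcoef`.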
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
# Use the raw probabilities (not the thresholded y_pred) so the ROC curve can sweep every threshold
y_pred_keras = model.predict(X_test_selected).ravel()
fpr_keras, tpr_keras, thresholds_keras = roc_curve(y_test, y_pred_keras, pos_label=1)
auc_keras = auc(fpr_keras, tpr_keras)
plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')  # diagonal: a classifier that guesses at random
plt.plot(fpr_keras, tpr_keras, label='Keras (area = {:.3f})'.format(auc_keras))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
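To make the two library calls above less of a black box: `roc_curve` ranks the transactions by predicted fraud probability and records the false and true positive rates at each distinct threshold, and `auc` integrates the resulting curve with the trapezoidal rule. The sketch below is our own from-scratch version for illustration only (the function names `roc_points` and `trapezoid_auc` are hypothetical, and it assumes both classes are present), run on a four-sample toy input rather than on this article's data:

```python
import numpy as np


def roc_points(y_true, scores):
    """FPR and TPR at every distinct score threshold, highest threshold first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # sort by score, descending
    y_sorted = y_true[order]
    s_sorted = scores[order]
    # Cumulative true/false positives when everything at or above a score is flagged.
    tps = np.cumsum(y_sorted == 1)
    fps = np.cumsum(y_sorted == 0)
    # Keep only the last position of each run of tied scores.
    distinct = np.r_[np.where(np.diff(s_sorted) != 0)[0], len(s_sorted) - 1]
    # Prepend (0, 0): the point where nothing is flagged.
    tpr = np.r_[0.0, tps[distinct] / tps[-1]]
    fpr = np.r_[0.0, fps[distinct] / fps[-1]]
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_toy, s_toy)
print(fpr, tpr, trapezoid_auc(fpr, tpr))
```

For the toy input the area is 0.75. scikit-learn's `roc_curve` additionally drops collinear intermediate points by default (`drop_intermediate=True`), which shortens the arrays without changing the area.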

Conclusion:

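In this article, we have covered various aspects of credit card fraud detection and implemented a fraud detector using a Gated Recurrent Unit (GRU) network. We used several evaluation metrics, but because only 443 of the 256,327 test transactions are fraudulent, accuracy (99.93%) says little about how well frauds are caught, so our main focus is on the F1 score of the fraud class. After evaluation, our model achieved an F1 score of 0.78 on that class (precision 0.81, recall 0.75), with a Matthews correlation coefficient of 0.78.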

References:

https://github.com/saloni151/credit-card-fraud-detection-using-gru/blob/main/GRU%20european%20sigmoid%201%20hl%2010nn.ipynb

For further queries, contact us:

Saloni and Ritesh


The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

