There are many ways to teach a machine to generate output for data it has never seen, and the pace of technological progress across sectors has been startling. Deep learning and neural networks are now powerful enough to generate, from training data alone, a realistic human face that belongs to no real person. The technique behind this is the Generative Adversarial Network (GAN), which is the topic of this article.
In this article, you will explore the fascinating world of Generative Adversarial Networks (GANs), including how GAN AI generators work, their diverse applications, and the impact of GAN image generation in various industries.
This article was published as a part of the Data Science Blogathon
Generative Adversarial Networks (GANs) were introduced in 2014 by Ian Goodfellow and his colleagues. A GAN is an approach to generative modeling that produces new data resembling the data it was trained on. GANs have two main blocks (two neural networks) that compete with each other and, in doing so, learn to capture and reproduce the variations in a dataset.
The two models are usually called the Generator and the Discriminator, which we cover below when discussing the components of GANs.
The generator network takes random input (typically noise) and generates samples resembling the training data, such as images, text, or audio. Its goal is to produce samples indistinguishable from real data.
The discriminator network, on the other hand, distinguishes between real and generated samples, classifying real data as real and generated data as fake.
The training process involves an adversarial game where the generator tries to fool the discriminator, and the discriminator aims to improve its ability to distinguish real from generated data.
Over time, the generator becomes better at creating realistic samples, and the discriminator becomes more skilled at identifying them. This process ideally leads to the generator producing high-quality samples that are hard to distinguish from real data.
GANs have shown impressive results in various domains like image synthesis, text generation, and video generation, enhancing the field of generative modeling and enabling new creative applications in artificial intelligence.
Machine learning algorithms and neural networks can easily be fooled into misclassifying inputs by adding a small amount of noise to the data. The more noise is added, the higher the chance that an image is misclassified.
This raised a question: can we build something that lets a neural network learn to produce new patterns that look like the training samples? GANs were built to do this, generating new fake results that resemble the original data.
The generator network’s purpose is to generate new data samples that resemble the training data. It takes random noise as input and produces various types of data samples, such as images, text, or audio. The primary objective of the generator is to fool the discriminator by creating data samples that are realistic enough to be indistinguishable from real data.
The discriminator network’s purpose is to distinguish between real data and data generated by the generator. It receives both real data from the training set and fake data from the generator as input. The output is a probability indicating whether the input data is real or fake. The primary objective of the discriminator is to correctly identify real versus generated data.
The adversarial training process aims to improve both the generator and the discriminator through a competitive dynamic. The generator’s goal is to enhance its ability to create realistic data that can fool the discriminator. Conversely, the discriminator’s goal is to improve its capability to distinguish between real and fake data. Through iterative improvement, both networks continuously advance as they learn from each other’s feedback.
Generator loss measures how well the generator fools the discriminator, typically aiming to minimize the discriminator’s ability to detect fake data. Discriminator loss measures how well the discriminator differentiates between real and fake data, aiming to maximize correct classifications of real and fake samples.
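As a rough illustration of these two losses, the sketch below computes them with binary cross-entropy in NumPy. The discriminator outputs are made-up numbers and `binary_cross_entropy` is a helper written for this example, not a library function. The generator loss shown here is the common "label the fakes as real" variant, which is also what the implementation later in this article uses.

```python
import numpy as np

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

# Hypothetical discriminator outputs (probability that a sample is real)
d_out_real = np.array([0.90, 0.80, 0.95])  # on real samples
d_out_fake = np.array([0.10, 0.30, 0.20])  # on generated samples

ones = np.ones_like(d_out_real)
zeros = np.zeros_like(d_out_fake)

# Discriminator: real samples should score 1, generated samples 0
d_loss = 0.5 * (binary_cross_entropy(d_out_real, ones)
                + binary_cross_entropy(d_out_fake, zeros))

# Generator: it wants its samples to be scored as real (label 1)
g_loss = binary_cross_entropy(d_out_fake, ones)

print(f"d_loss={d_loss:.3f}, g_loss={g_loss:.3f}")
```

With these numbers the discriminator is doing well, so its loss is small while the generator loss is large. As the generator improves, the two move in opposite directions.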
Random noise serves as the input to the generator network. It is usually a vector of random values sampled from a uniform or normal distribution. This noise provides the generator with a diverse set of inputs, enabling it to produce varied data samples.
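A minimal sketch of how such noise vectors might be sampled with NumPy is shown below. The batch size of 32 and latent size of 100 match the values used later in this article; the seed and the uniform range of -1 to 1 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only to make the example repeatable
batch_size, latent_dim = 32, 100

# One latent vector per sample, drawn from a standard normal distribution
z_normal = rng.standard_normal((batch_size, latent_dim))

# Alternative: latent vectors drawn uniformly from [-1, 1)
z_uniform = rng.uniform(-1.0, 1.0, size=(batch_size, latent_dim))

print(z_normal.shape, z_uniform.shape)
```

Each row is one input to the generator, so a batch of 32 rows yields 32 generated samples.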
Real data consists of genuine data samples from the domain of interest. Fake data refers to the samples generated by the generator during training.
The two major components of a GAN are the Generator and the Discriminator. The generator acts like a thief: it produces fake samples modeled on the original samples and tries to make the discriminator accept them as real. The discriminator acts like the police: it looks for abnormalities in the samples created by the generator and classifies each one as fake or real. This competition continues until the generator produces samples that the discriminator can no longer tell apart from real data.
Let us now look at each component in turn to build an intuition for how a GAN is trained.
The discriminator follows a supervised approach: it is a simple classifier that predicts whether data is fake or real. It is trained on real data and on samples from the generator, and it provides feedback to the generator.
The generator follows an unsupervised learning approach. Its aim is to generate fake images based on the discriminator’s feedback so that the discriminator can no longer tell them from real ones. Once the generator reliably fools the discriminator, training can stop and we have a trained GAN model.
Here the generative model captures the distribution of the data and is trained to generate new samples that maximize the probability of the discriminator making a mistake (that is, maximize the discriminator’s loss). The discriminator, on the other hand, estimates the probability that the sample it receives comes from the training data rather than from the generator, and tries to classify it accurately (that is, minimize its own loss). Hence the GAN is formulated as a minimax game in which the discriminator tries to maximize the value function V(D, G) while the generator tries to minimize it.
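To make the game concrete, the sketch below evaluates the standard value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] on made-up discriminator outputs. The function name `value_function` and the sample numbers are assumptions for illustration.

```python
import math
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) on real samples, d_fake: D(G(z)) on generated samples.
    """
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A confident, correct discriminator pushes V up towards 0
v_confident = value_function(np.full(4, 0.99), np.full(4, 0.01))

# A fully fooled discriminator outputs 0.5 everywhere, giving V = -log(4)
v_fooled = value_function(np.full(4, 0.5), np.full(4, 0.5))

print(v_confident, v_fooled, -math.log(4))
```

The discriminator prefers the first situation (larger V), while the generator pushes towards the second (smaller V).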
You might now be wondering what the actual architecture of a GAN looks like, how the two neural networks are built, and how training and prediction are done. To simplify it, consider the general architecture of a GAN described below.
Both components are neural networks. The generator’s output is connected directly to the input of the discriminator. The discriminator makes a prediction on that output, and through backpropagation the generator receives a feedback signal that it uses to update its weights and improve. The discriminator is a feed-forward neural network.
Now that we have an intuition for how a GAN works, let us look at how it is trained. This section covers the training of the generator and the discriminator separately.
The problem statement is key to the success of the project, so the first step is to define your problem. GANs can be applied to many kinds of problems, so you need to define what you are generating, for example audio, poems, text, or images.
There are many different types of GAN architecture, so the next step is to decide which one to use.
Next, the discriminator is trained for n epochs. The real examples come straight from the dataset, without noise, and are labeled as real. The fake examples are instances created by the generator and are used as negative examples. During this phase the generator’s weights are held fixed; only the discriminator’s weights are updated, by backpropagating the discriminator’s classification loss.
To train the generator, feed it random noise as input and let it produce fake outputs. While the generator is being trained the discriminator is idle, and while the discriminator is being trained the generator is idle. The generator tries to transform whatever random noise it receives into meaningful data; getting meaningful output takes time and many epochs. The steps to train a generator are listed below.
The samples produced by the generator are passed to the discriminator, which predicts whether each one is fake or real and provides feedback to the generator.
The generator is then trained on the feedback given by the discriminator and tries to improve its performance.
This is an iterative process that continues until the generator succeeds in fooling the discriminator.
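The alternating steps above can be sketched on a deliberately tiny problem. The toy below is an assumption-laden illustration, not the MNIST model built later. The "real data" is numbers drawn from a normal distribution with mean 4, the generator is the line G(z) = a*z + b, and the discriminator is a single sigmoid unit D(x) = sigmoid(w*x + c). Gradients are derived by hand, and all names are hypothetical. It only shows how the two updates take turns; a model this small is not guaranteed to converge.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_toy_gan(steps=200, batch_size=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator G(z) = a * z + b
    w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w * x + c)
    eps = 1e-7        # keeps log() away from zero
    history = []
    for _ in range(steps):
        # --- Discriminator step: generator parameters stay fixed ---
        real = rng.normal(4.0, 1.0, size=batch_size)
        fake = a * rng.standard_normal(batch_size) + b
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
        # For cross-entropy on a sigmoid, d(loss)/d(logit) = prediction - label
        grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # --- Generator step: discriminator parameters stay fixed ---
        z = rng.standard_normal(batch_size)
        d_fake = sigmoid(w * (a * z + b) + c)
        g_loss = -np.mean(np.log(d_fake + eps))  # fakes are labeled as real
        # Chain rule through the frozen discriminator: d(logit)/d(sample) = w
        grad_sample = (d_fake - 1.0) * w
        a -= lr * np.mean(grad_sample * z)
        b -= lr * np.mean(grad_sample)

        history.append((float(d_loss), float(g_loss)))
    return (a, b, w, c), history

params, history = train_toy_gan()
print("final (a, b, w, c):", params)
print("last (d_loss, g_loss):", history[-1])
```

The same structure, one discriminator update followed by one generator update inside a loop, reappears in the Keras training loop later in this article.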
Now that the working of a GAN is clear, let us look at the loss function that is minimized and maximized during this iterative process. The generator tries to minimize the value function V(D, G) while the discriminator tries to maximize it, which makes this a minimax game.
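For reference, the value function of the standard GAN formulation can be written as below. The symbols p_data (the distribution of real data) and p_z (the distribution of the input noise) are notation introduced here and not used elsewhere in this article.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards the discriminator for scoring real samples close to 1. The second rewards it for scoring generated samples close to 0, which is exactly what the generator works against.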
We will now follow the steps above and get our hands dirty by implementing a GAN on a very popular dataset, MNIST.
MNIST is a popular dataset of grayscale images of handwritten digits from 0 to 9, each of size 28*28. Its training set contains 60000 of these small square images.
Our aim is to train the discriminator on MNIST images together with generated samples, and to train the generator to turn random noise into images that look like original MNIST digits but are in fact produced by the generator. Let’s get started by importing the libraries we need.
The imported libraries help with preprocessing, transforming the data, and creating the neural network models.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LeakyReLU, Dropout, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD, Adam
Built-in datasets usually come already split into a training set and a test set, so we load the data into those two parts.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale the inputs in range of (-1, +1) for better training
x_train, x_test = x_train / 255.0 * 2 - 1, x_test / 255.0 * 2 - 1
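To see what this scaling does, here is a small check on a few hand-picked pixel values (the values are arbitrary). It also shows the inverse mapping back to the 0 to 1 range, which is what the plotting helper later in this article applies before displaying generated images.

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)  # raw grayscale values

# Same formula as above: [0, 255] -> [-1, 1]
scaled = pixels / 255.0 * 2 - 1

# Inverse used before plotting: [-1, 1] -> [0, 1]
restored = 0.5 * scaled + 0.5

print(scaled)
print(restored * 255)
```

The range -1 to 1 matches the range of the tanh activation on the generator’s output layer, so real and generated images live on the same scale.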
If you want to plot some example images from the dataset then you can simply plot it from the training dataset using matplotlib.
for i in range(49):
    plt.subplot(7, 7, i+1)
    plt.axis("off")
    #plot raw pixel data
    plt.imshow(x_train[i])
plt.show()
If you print the shape of the dataset then train data has 60000 images of 28*28 size and test data has 10000 images of 28*28 size.
Each split is a 3-dimensional array, so we flatten it to 2 dimensions: each 28*28 image becomes a vector of 784 values, and the training set becomes a 60000 by 784 array.
N, H, W = x_train.shape #number, height, width
D = H * W #flattened dimension (28 * 28 = 784)
x_train = x_train.reshape(-1, D)
x_test = x_test.reshape(-1, D)
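The effect of this reshape can be checked on a small stand-in array, so the example runs without downloading MNIST. The batch of five blank images is an assumption for illustration.

```python
import numpy as np

batch = np.zeros((5, 28, 28), dtype=np.float32)  # stand-in for a few MNIST images

n, h, w = batch.shape
flat = batch.reshape(-1, h * w)   # one row of 784 values per image
back = flat.reshape(-1, h, w)     # undo the flattening, e.g. for plotting

print(flat.shape, back.shape)
```

Using -1 lets NumPy infer the number of images, so the same line works for both the training and the test set.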
Here we define a function that builds the generator as a fully connected neural network. The latent dimension is the number of inputs to the model. We define an input layer, three hidden layers that use Leaky ReLU as the activation function and are each followed by batch normalization, and an output layer with tanh as the activation function because the image pixels lie between -1 and +1.
# Defining Generator Model
latent_dim = 100
def build_generator(latent_dim):
    i = Input(shape=(latent_dim,))
    x = Dense(256, activation=LeakyReLU(alpha=0.2))(i)
    x = BatchNormalization(momentum=0.7)(x)
    x = Dense(512, activation=LeakyReLU(alpha=0.2))(x)
    x = BatchNormalization(momentum=0.7)(x)
    x = Dense(1024, activation=LeakyReLU(alpha=0.2))(x)
    x = BatchNormalization(momentum=0.7)(x)
    x = Dense(D, activation='tanh')(x) #because Image pixel is between -1 to 1.
    model = Model(i, x) #i is input x is output layer
    return model
Why do we use Leaky ReLU?
Leaky ReLU helps gradients flow through the network, because it keeps a small non-zero slope for negative inputs instead of zeroing them out.
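A plain NumPy sketch of the activation and its gradient is shown below, using the same slope of 0.2 as the model above. The function names are written for this example and are not part of Keras.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Pass positive values through, scale negative values by alpha."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.2):
    """Slope is 1 for positive inputs and alpha otherwise, so it is never zero."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))
print(leaky_relu_grad(x))
```

With a plain ReLU the gradient for the negative inputs would be 0, which can stop those units from learning.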
Why Batch Normalization?
It has the effect of stabilizing the training process by standardizing activations from the prior layer to have zero mean and unit variance. Batch normalization has become a staple when training deep networks, and GANs are no exception.
Applying batch norm directly to all layers can, however, result in sample oscillation and model instability, which is why the generator’s output layer here does not use it.
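The core standardization step can be sketched in NumPy as follows. This is a simplification: the Keras BatchNormalization layer additionally learns a scale and a shift, and keeps moving averages of the statistics for use at inference time. The activations here are random stand-ins.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Standardize each feature (column) using the statistics of the batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
acts = rng.normal(5.0, 3.0, size=(64, 4))  # stand-in activations: 64 samples, 4 units

normed = batch_norm(acts)
print(normed.mean(axis=0))
print(normed.std(axis=0))
```

After the step, every unit has a mean close to 0 and a standard deviation close to 1, regardless of the scale it started with.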
Here we develop a simple feed-forward neural network for the discriminator, which takes the flattened image size as its input dimension. The hidden layers use Leaky ReLU for the reason given above, and the output layer uses a sigmoid because classifying images as real or fake is a binary classification problem.
def build_discriminator(img_size):
    i = Input(shape=(img_size,))
    x = Dense(512, activation=LeakyReLU(alpha=0.2))(i)
    x = Dense(256, activation=LeakyReLU(alpha=0.2))(x)
    x = Dense(1, activation='sigmoid')(x)
    model = Model(i, x)
    return model
Now it’s time to build both components of the GAN and compile the discriminator.
# Build and compile the discriminator
discriminator = build_discriminator(D)
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5), metrics=['accuracy'])
# Build the generator (it is compiled later as part of the combined model)
generator = build_generator(latent_dim)
Now we create an input that represents noise samples from the latent space and pass this noise to the generator to generate an image. We then pass the generated image to the discriminator, which predicts whether it is fake or real. Inside this combined model we do not want the discriminator to be trained, so we freeze its weights; only the generator should learn from the prediction.
## Create an input to represent noise sample from latent space
z = Input(shape=(latent_dim,))
## Pass noise through a generator to get an Image
img = generator(z)
discriminator.trainable = False
fake_pred = discriminator(img)
It’s time to create a combined Generator model with noise input and feedback of discriminator that helps the generator to improve its performance.
combined_model_gen = Model(z, fake_pred) #first is noise and 2nd is fake prediction
# Compile the combined model
combined_model_gen.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))
Define the number of epochs, the batch size, and a sample period, which is the number of steps between generated sample grids. After this, we define the batch labels as ones and zeros: one means the image is real and zero means the image is fake. We also create two empty lists to store the generator and discriminator losses. Finally, and importantly, we create an empty folder in the working directory where the images produced by the generator will be saved.
batch_size = 32
epochs = 12000
sample_period = 200
ones = np.ones(batch_size)
zeros = np.zeros(batch_size)
#store generator and discriminator loss in each step or each epoch
d_losses = []
g_losses = []
#create a folder in which the generated images will be saved
if not os.path.exists('gan_images'):
    os.makedirs('gan_images')
Create a function that generates a grid of random samples from the generator and saves it to a file. In simple words, it saves a grid of generated images at selected epochs. We set both the number of rows and the number of columns to 5, so each saved file contains 25 images.
def sample_images(epoch):
    rows, cols = 5, 5
    noise = np.random.randn(rows * cols, latent_dim)
    imgs = generator.predict(noise)
    # Rescale images 0 - 1
    imgs = 0.5 * imgs + 0.5
    fig, axs = plt.subplots(rows, cols) #fig to plot img and axis to store
    idx = 0
    for i in range(rows): #5*5 loop means on page 25 imgs will be there
        for j in range(cols):
            axs[i,j].imshow(imgs[idx].reshape(H, W), cmap='gray')
            axs[i,j].axis('off')
            idx += 1
    fig.savefig("gan_images/%d.png" % epoch)
    plt.close()
Now let’s train the GAN, starting with the discriminator. We pass it real images from the MNIST dataset as well as fake images so that it learns to classify them. To get the fake images, we create random noise and pass it to the generator. We then compute the loss of both models. When training the generator, we label the generated images as one (real), so that the generator is rewarded whenever the discriminator is fooled into believing them.
#FIRST we will train Discriminator(with real imgs and fake imgs)
# Main training loop
for epoch in range(epochs):
    ###########################
    ### Train discriminator ###
    ###########################
    # Select a random batch of images
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_imgs = x_train[idx] #MNIST dataset
    # Generate fake images
    noise = np.random.randn(batch_size, latent_dim) #generator to generate fake imgs
    fake_imgs = generator.predict(noise)
    # Train the discriminator
    # both loss and accuracy are returned
    d_loss_real, d_acc_real = discriminator.train_on_batch(real_imgs, ones) #belong to positive class(real imgs)
    d_loss_fake, d_acc_fake = discriminator.train_on_batch(fake_imgs, zeros) #fake imgs
    d_loss = 0.5 * (d_loss_real + d_loss_fake)
    d_acc = 0.5 * (d_acc_real + d_acc_fake)
    #######################
    ### Train generator ###
    #######################
    noise = np.random.randn(batch_size, latent_dim)
    g_loss = combined_model_gen.train_on_batch(noise, ones)
    #Now we are trying to fool the discriminator that generate imgs are real that's why we are providing label as 1
    # do it again!
    noise = np.random.randn(batch_size, latent_dim)
    g_loss = combined_model_gen.train_on_batch(noise, ones)
    # Save the losses
    d_losses.append(d_loss) #save the loss at each epoch
    g_losses.append(g_loss)
    if epoch % 100 == 0:
        # the message is split into two f-strings so that each line is valid Python
        print(f"epoch: {epoch+1}/{epochs}, d_loss: {d_loss:.2f}, "
              f"d_acc: {d_acc:.2f}, g_loss: {g_loss:.2f}")
    if epoch % sample_period == 0:
        sample_images(epoch)
We trained for 12000 epochs; you can train for more. This takes some time, so it is better to use a Kaggle or Google Colab GPU. The generated images are saved in the gan_images directory, each file named after its epoch number.
We have finished training the GAN, so let’s plot the generator and discriminator losses to see how well the generator manages to fool the discriminator.
plt.plot(g_losses, label='g_losses')
plt.plot(d_losses, label='d_losses')
plt.legend()
Let’s plot the generated images at different epochs to see after how many epochs the generator starts to extract some information.
Plot the image generated at epoch zero
from skimage.io import imread
a = imread('gan_images/0.png')
plt.imshow(a)
Let’s see what is generated at the initial epoch.
At this point the generator has not yet extracted any information from the data, so the discriminator can easily identify its output as fake.
Plot the image generated after training for 1000 epochs
from skimage.io import imread
a = imread('gan_images/1000.png')
plt.imshow(a)
Now the generator is slowly becoming able to extract some information, which can be observed in the image.
Plot the image generated after training for 10000 epochs
Now the generator can produce images that look like those in the MNIST dataset, and there is a high chance of the discriminator being fooled.
Generative Adversarial Networks (GANs) represent a powerful paradigm in the field of machine learning, offering diverse applications and functionalities. This article covered what GANs are, their components, how they are trained, the loss function they optimize, and a step-by-step implementation. GANs have demonstrated remarkable capabilities in generating realistic data, enhancing image processing, and facilitating creative applications. Despite their effectiveness, challenges such as mode collapse and training instability persist, necessitating ongoing research efforts. Nevertheless, with proper understanding and implementation, GANs hold immense potential to revolutionize various domains, as exemplified by their practical use on datasets like MNIST.
A. A generative adversarial network (GAN) is a type of artificial intelligence model composed of two neural networks, the generator and the discriminator, which compete against each other in a game-like fashion. The generator creates new data samples that resemble real data, while the discriminator tries to distinguish between real and generated data.
A. The purpose of GAN is to generate realistic data samples, such as images, text, or audio, that are indistinguishable from real data. It aims to improve the quality of generated data over time through adversarial training between the generator and discriminator networks.
A. One example of a GAN is the creation of photorealistic images of human faces. Given a dataset of human faces, a GAN can generate new images of faces that look convincingly real, even though they are entirely generated by the model.
A. CNN stands for Convolutional Neural Network, a type of neural network commonly used in tasks involving image processing and classification. GAN, on the other hand, stands for Generative Adversarial Network, a type of model used for generating new data samples. While CNN focuses on analyzing and classifying data, GAN focuses on generating new data samples that resemble real data.