MNIST Image Reconstruction Using an Autoencoder

ANURAG SINGH CHOUDHARY 20 Jul, 2023


Introduction

With so much information on the Internet, researchers and scientists are trying to develop more efficient and secure data transfer methods. Autoencoders have emerged as valuable tools for this purpose because of their simple and intuitive architecture. Typically, after the autoencoder is trained, the encoder weights are sent to the sender and the decoder weights to the receiver. The sender can then transmit data in encoded (compressed) form, saving time and cost, and the receiver uses the decoder to reconstruct it. This article explores the application of autoencoders to image reconstruction, using the MNIST handwritten digit dataset and the TensorFlow (Keras) framework in Python.

Learning Objectives

This article focuses on building a TensorFlow autoencoder capable of encoding and reconstructing MNIST images.
We will implement functions to load and preprocess the dataset and to add random noise to the data points.
We will build an encoder-decoder autoencoder and prepare both real and noisy versions of the images.
Explore the importance of autoencoders in deep learning, their working principles, and their potential to improve model performance.

This article was published as a part of the Data Science Blogathon.

The Architecture of Autoencoders

Autoencoders can be divided into three main components:

Encoder: this module takes the input data from the training, validation, or test set and compresses it into an encoded representation. Typically, the encoded data is smaller than the input data.

Bottleneck: the bottleneck module holds the compressed knowledge representation, which makes it the most critical part of the network. It is the point where the data has its smallest dimensionality.

Decoder: the decoder module restores the data to its original form by “decompressing” the encoded representation. The decoder’s output is then compared with either the ground truth or the original input data.
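To make the three components concrete, here is a minimal NumPy sketch of a forward pass through an encoder, a bottleneck, and a decoder. It assumes fully connected layers and hypothetical sizes (784 → 128 → 32 → 128 → 784), and the weights are random and untrained. It only shows how the data shrinks to a small code and is expanded back to the input size. The model built later in this article uses convolutional layers instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    # Hypothetical helper: small random weights and zero bias (untrained).
    return rng.normal(0.0, 0.05, size=(in_dim, out_dim)), np.zeros(out_dim)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: 784 input pixels -> 128 hidden units -> 32-value code.
enc1, enc2 = dense(784, 128), dense(128, 32)
dec1, dec2 = dense(32, 128), dense(128, 784)

def encoder(x):
    h = relu(x @ enc1[0] + enc1[1])
    return relu(h @ enc2[0] + enc2[1])  # bottleneck code

def decoder(code):
    h = relu(code @ dec1[0] + dec1[1])
    return sigmoid(h @ dec2[0] + dec2[1])  # pixel values in (0, 1)

x = rng.random((5, 784))   # a batch of 5 flattened 28x28 "images"
code = encoder(x)
x_hat = decoder(code)
print(code.shape, x_hat.shape)  # (5, 32) (5, 784)
```

Training would adjust these weights so that `x_hat` moves as close as possible to `x`.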

The Relationship Among the Encoder, Bottleneck, and Decoder

Encoder

The encoder plays a significant role in compressing the input data through convolutional blocks and pooling modules. This compression produces a compact representation, called the bottleneck or code.

The decoder comes next. It consists of upsampling modules that map the compressed features back to the original image format. In basic autoencoders, the decoder aims to reconstruct an output that is as similar to the input as possible (with the noise removed, in the case of denoising autoencoders).

However, in variational autoencoders, the output is not simply a reconstruction of the input. Instead, the model can create an entirely new image based on the input data it was given. This difference gives variational autoencoders some control over the generated image and lets them produce varied results.

Bottleneck

Although the bottleneck is the smallest part of the neural network, it is very important. It limits the data flowing from the encoder to the decoder, allowing only the most important information to pass through. By limiting this flow, the bottleneck ensures that crucial properties are preserved and used in the reconstruction.

Designing the bottleneck this way forces the network to capture as much information from the image as possible in a compact form. The encoder-decoder structure thus extracts valuable information from images and creates meaningful connections between the various inputs to the network.

This compressed form of processing helps prevent the network from memorizing the input and overfitting. As a general guideline, the smaller the bottleneck, the lower the risk of overfitting.

However, a very small bottleneck limits the amount of information that can be stored. This increases the likelihood that essential information is lost through the encoder’s pooling layers.
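The trade-off between bottleneck size and information loss can be seen in a linear stand-in for an autoencoder. A linear autoencoder trained with MSE loss learns the same subspace as PCA, so this sketch uses a truncated SVD in place of a trained network. It keeps `k` directions as the "code" and measures the reconstruction error. The data is synthetic: an assumed 8 latent factors plus a little noise, not MNIST.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for flattened images: 200 samples, 64 features,
# generated from 8 latent factors plus a little noise (an assumption).
latent = rng.normal(size=(200, 8))
mixing = rng.normal(size=(8, 64))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 64))
X = X - X.mean(axis=0)  # center the data, as PCA does

# The top right singular vectors give the optimal linear encoder/decoder.
_, _, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruction_mse(k):
    W = Vt[:k].T          # 64 x k: the encoder projects onto k directions
    code = X @ W          # bottleneck of size k
    X_hat = code @ W.T    # the decoder maps the code back to 64 features
    return np.mean((X - X_hat) ** 2)

for k in (1, 2, 4, 8, 16, 32):
    print(k, round(reconstruction_mse(k), 4))
```

The error drops sharply until `k` reaches the number of underlying factors. After that it only shrinks slowly, as the extra dimensions spend capacity on noise.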

Decoder

The decoder consists of upsampling and convolutional blocks that reconstruct the output from the bottleneck.

Once the compressed representation reaches the decoder, the decoder acts as a “decompressor”. Its role is to reconstruct the image from the latent (hidden) features extracted by the encoder. Using these latent features, the decoder effectively reverses the compression performed by the encoder.

How to Train Autoencoders?

Before training an autoencoder, we need to set four important hyperparameters:

Code size: the code size, also known as the bottleneck size, is an essential hyperparameter when tuning an autoencoder. It specifies the level of compression. The code size can also act as a regularization term.
Number of layers: as with other neural networks, the depth of the encoder and decoder is a vital autoencoder hyperparameter. Increasing the depth adds complexity to the model, while decreasing the depth increases processing speed.
Number of nodes per layer: the number of nodes in each layer determines the number of weights in that layer. Typically, the number of nodes decreases with each successive encoder layer as the input is compressed, and increases again in the decoder.
Reconstruction loss: the choice of loss function for training the autoencoder depends on the type of input and output we want the model to adapt to. When working with image data, popular reconstruction losses include mean squared error (MSE) loss and L1 loss. Binary cross-entropy can also be used as a reconstruction loss if the inputs and outputs are in the range [0, 1], as with MNIST.
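The three reconstruction losses above can be sketched in a few lines of NumPy. This is an illustration, not the Keras implementation. The clipping epsilon of 1e-7 is an assumption, chosen only so that `log()` never receives exactly 0 or 1.

```python
import numpy as np

def mse_loss(x, x_hat):
    # Mean squared error: penalizes large pixel errors heavily.
    return np.mean((x - x_hat) ** 2)

def l1_loss(x, x_hat):
    # Mean absolute error: less sensitive to a few large errors.
    return np.mean(np.abs(x - x_hat))

def bce_loss(x, x_hat, eps=1e-7):
    # Binary cross-entropy; requires targets and predictions in [0, 1].
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

x = np.array([0.0, 0.5, 1.0])       # target pixels in [0, 1]
x_hat = np.array([0.1, 0.5, 0.8])   # reconstructed pixels
print(mse_loss(x, x_hat), l1_loss(x, x_hat), bce_loss(x, x_hat))
```

BCE reaches its minimum when `x_hat` equals `x`. For fractional targets such as 0.5, that minimum is not zero, so BCE values are best compared relative to each other rather than read as absolute errors.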

Requirements

We need the following libraries and helper functions to create an autoencoder in TensorFlow.

TensorFlow: to begin, we import the TensorFlow library and all the components needed to build a model that can read and generate MNIST images.

NumPy: next, we import NumPy, a powerful numerical library, which we will use to preprocess and reshape the dataset.

Matplotlib: we will use the Matplotlib plotting library to visualize and evaluate the model’s performance.

The data_proc(dat) helper function takes an array of images, scales the pixel values to [0, 1], and reshapes the array to the size required by the model.
The gen_noise(dat) helper function accepts an array as input, adds Gaussian noise, and guarantees that the resulting values fall within the range [0, 1].
The display(dat1, dat2) helper function takes an array of input images and an array of predicted images and plots them in two rows.

Building the AutoEncoder

In the next part, we will learn how to create a simple Autoencoder using TensorFlow and train it using MNIST images. First, we will outline the steps to load and process MNIST data to meet our requirements. Once the data is properly formatted, we build and train the model.

The network architecture consists of three main components: the encoder, the bottleneck, and the decoder. The encoder compresses the input image while preserving valuable information. The bottleneck determines which features are essential enough to pass on to the decoder. Finally, the decoder uses the bottleneck output to reconstruct the image. Through this reconstruction process, the autoencoder learns a latent representation of the data.

To build a model that reads and reconstructs MNIST images, we first import some libraries and write a few helper functions. We import TensorFlow along with the Keras components we need, plus the NumPy numerical library and the Matplotlib plotting library. These libraries let us preprocess the data and visualize the results.

Import Libraries

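import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Import only the layers this model uses instead of a wildcard import.
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model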

Next, we implement some auxiliary functions. The first, data_proc, receives an array of images as input, scales the pixel values to [0, 1], and reshapes the array to the size required by the model.

def data_proc(dat):
    # Scale uint8 pixels (0-255) to float32 in [0, 1] and add a channel axis.
    larr = len(dat)
    return np.reshape(dat.astype("float32") / 255.0, (larr, 28, 28, 1))
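A quick sanity check can confirm what data_proc produces. This sketch repeats the same function on a small fake batch of random uint8 images, an assumed stand-in for the output of mnist.load_data(), so it runs without downloading the dataset.

```python
import numpy as np

def data_proc(dat):
    # Same logic as above: scale to [0, 1] and add a channel axis.
    larr = len(dat)
    return np.reshape(dat.astype("float32") / 255.0, (larr, 28, 28, 1))

# Fake batch standing in for MNIST images (uint8 values 0-255).
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)
out = data_proc(fake)
print(out.shape, out.dtype, out.min(), out.max())
```

The trailing channel axis is what the Conv2D layers below expect for single-channel (grayscale) input.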

We also need a second helper function that operates on an array. It adds Gaussian noise to the array and ensures that the resulting values stay between 0 and 1.

def gen_noise(dat):
    # Add Gaussian noise scaled by 0.4, then clip back into the valid pixel range.
    return np.clip(dat + 0.4 * np.random.normal(loc=0.0, scale=1.0, size=dat.shape), 0.0, 1.0)
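The noise level in gen_noise is fixed at 0.4. The following sketch uses a hypothetical variant, gen_noise_seeded, that makes the noise factor a parameter and uses a seeded NumPy generator so experiments are reproducible. It measures how far the noisy images drift from a flat gray test image as the factor grows.

```python
import numpy as np

def gen_noise_seeded(dat, noise_factor=0.4, seed=None):
    # Hypothetical variant of gen_noise: configurable noise level and a
    # seeded generator for reproducible experiments.
    rng = np.random.default_rng(seed)
    noisy = dat + noise_factor * rng.normal(loc=0.0, scale=1.0, size=dat.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.full((2, 28, 28, 1), 0.5, dtype="float32")
distortion = []
for factor in (0.1, 0.4, 0.8):
    noisy = gen_noise_seeded(clean, noise_factor=factor, seed=0)
    distortion.append(float(np.mean(np.abs(noisy - clean))))
    print(factor, distortion[-1])
```

Larger factors produce stronger distortion, which makes reconstruction harder. Clipping caps how far any single pixel can move.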

Evaluate the Performance of the Model

To evaluate the performance of our model, it is important to visualize a number of images. For this purpose, we define a display function that takes two arrays (for example, original and predicted images), selects ten random indices, and plots the corresponding images in two rows.

def display(dat1, dat2):
    # Show n random image pairs: dat1 in the top row, dat2 in the bottom row.
    n = 10
    ind = np.random.randint(len(dat1), size=n)
    im1 = dat1[ind, :]
    im2 = dat2[ind, :]
    plt.figure(figsize=(20, 4))
    for i, (a, b) in enumerate(zip(im1, im2)):
        plt_axis = plt.subplot(2, n, i + 1)
        plt.imshow(a.reshape(28, 28))
        plt.gray()
        plt_axis.get_xaxis().set_visible(False)
        plt_axis.get_yaxis().set_visible(False)

        plt_axis = plt.subplot(2, n, i + 1 + n)
        plt.imshow(b.reshape(28, 28))
        plt.gray()
        plt_axis.get_xaxis().set_visible(False)
        plt_axis.get_yaxis().set_visible(False)
    plt.show()

Dataset Preparation

The MNIST dataset is bundled with TensorFlow and comes split into training and test sets. We can load it directly and apply the preprocessing function defined earlier. We also generate noisy versions of the training and test images using the gen_noise function. Note that the noise level controls how distorted the images become, and stronger noise makes reconstruction harder for the model. Finally, we visualize the original and noisy images.

(ds_train, _), (ds_test, _) = mnist.load_data()
ds_train,ds_test = data_proc(ds_train), data_proc(ds_test)
noisy_ds_train, noisy_ds_test = gen_noise(ds_train), gen_noise(ds_test)
display(ds_train, noisy_ds_train)

Encoder Definition

The encoder part of the network uses convolutional and max pooling layers with ReLU activation. The goal is to compress the input data before passing it through the rest of the network, so the output of this step is a compressed version of the original data. Since each MNIST image is 28x28x1, we create an Input layer with that shape.

inps = Input(shape=(28, 28, 1))

# 28x28x1 -> 28x28x32 -> 14x14x32
x = Conv2D(32, (3, 3), activation="relu", padding="same")(inps)
x = MaxPooling2D((2, 2), padding="same")(x)
# 14x14x32 -> 14x14x32 -> 7x7x32
x = Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = MaxPooling2D((2, 2), padding="same")(x)
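The spatial shapes in the encoder follow from one rule: with padding="same", a layer's output length is the input length divided by the stride, rounded up. Conv2D uses stride 1 here, and MaxPooling2D((2, 2)) defaults to stride 2. The following plain-Python sketch traces the shapes under those assumptions.

```python
import math

def same_output_size(size, stride):
    # With padding="same", output length is ceil(input / stride).
    return math.ceil(size / stride)

h = w = 28
channels = 1
layers = [("Conv2D 3x3, 32 filters", 1, 32), ("MaxPooling2D 2x2", 2, None),
          ("Conv2D 3x3, 32 filters", 1, 32), ("MaxPooling2D 2x2", 2, None)]
for name, stride, filters in layers:
    h, w = same_output_size(h, stride), same_output_size(w, stride)
    channels = filters or channels  # pooling keeps the channel count
    print(f"{name:<24} -> {h}x{w}x{channels}")
print("values in bottleneck:", h * w * channels, "vs input pixels:", 28 * 28)
```

The final 7x7x32 output holds 1,568 values, which is more than the 784 input pixels. In this model the compression is spatial (28x28 down to 7x7) rather than a reduction in the total number of values.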

Bottleneck Definition

Unlike the other components, the bottleneck does not need to be coded explicitly. The encoder’s final MaxPooling layer yields a spatially condensed output (7x7 feature maps with 32 channels), and the decoder is trained to reconstruct the image from this compressed representation. In a more elaborate autoencoder, the bottleneck architecture can be customized.

Decoder Definition

The decoder consists of transposed convolutions with a stride of 2, followed by a final 2D convolution with a sigmoid activation function. This component reconstructs images from the compressed representation. The transposed convolutions perform the upsampling. With a stride of 2, each one doubles the spatial size of its input, so fewer layers are needed to bring the images back to full resolution.

# 7x7x32 -> 14x14x32
x = Conv2DTranspose(32, (3, 3), activation="relu", padding="same", strides=2)(x)
# 14x14x32 -> 28x28x32
x = Conv2DTranspose(32, (3, 3), activation="relu", padding="same", strides=2)(x)
# 28x28x32 -> 28x28x1; the sigmoid keeps pixel values in (0, 1)
x = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
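Transposed convolution can look opaque, so here is a one-dimensional NumPy sketch of the core idea. Each input value "stamps" a scaled copy of the kernel onto the output, and consecutive stamps are placed `stride` positions apart. This is a simplified illustration with one channel, no padding, and a hand-picked kernel. Keras's Conv2DTranspose with padding="same" additionally crops the result so the output is exactly stride times the input size (7 → 14 → 28 above). That cropping is not reproduced here.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    # Scatter-add form: input i adds value * kernel starting at i * stride.
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        out[i * stride : i * stride + k] += value * kernel
    return out

x = np.array([1.0, 2.0])
kernel = np.array([1.0, 1.0, 1.0])
print(transposed_conv1d(x, kernel, stride=2))  # [1. 1. 3. 2. 2.]
```

Where neighboring stamps overlap (position 2 here), their contributions add up. During training, the network learns kernel values that turn these overlapping stamps into sharp upsampled features.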

Model Training

After defining the model, it must be configured with the optimizer and loss functions. In this article, we will use the Adam Optimizer and select the Binary Cross Entropy Loss function for training.


conv_autoenc_model = Model(inps, x)
conv_autoenc_model.compile(optimizer="adam", loss="binary_crossentropy")
conv_autoenc_model.summary()
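The parameter count reported by summary() can be checked by hand. Each convolutional or transposed convolutional layer has kernel_height × kernel_width × input_channels × filters weights plus one bias per filter, and the pooling layers have no trainable parameters. If the architecture matches the code above, the total below should match the summary.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    # Weights (kh * kw * in * out) plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters

layers = {
    "Conv2D (encoder 1)": conv_params(3, 3, 1, 32),
    "Conv2D (encoder 2)": conv_params(3, 3, 32, 32),
    "Conv2DTranspose (decoder 1)": conv_params(3, 3, 32, 32),
    "Conv2DTranspose (decoder 2)": conv_params(3, 3, 32, 32),
    "Conv2D (output)": conv_params(3, 3, 32, 1),
}
for name, count in layers.items():
    print(f"{name:<28} {count:>6}")
print("total:", sum(layers.values()))  # total: 28353
```

This is a small model by modern standards, which is one reason it trains quickly on MNIST.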


Once the model is built, we can train it on the preprocessed MNIST images. Because this is a plain reconstruction task, the clean images serve as both the input (x) and the target (y). We train for 50 epochs with a batch size of 128. We also pass the test set as validation data to monitor reconstruction loss on unseen images.

conv_autoenc_model.fit(
    x=ds_train,
    y=ds_train,
    epochs=50,
    batch_size=128,
    shuffle=True,
    validation_data=(ds_test, ds_test),
)
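To see what epochs=50 and batch_size=128 mean in practice, the following sketch does the arithmetic for MNIST's 60,000 training images. It also mimics shuffled batching with NumPy. The shuffling part only illustrates the idea and is not how Keras implements it internally.

```python
import math
import numpy as np

num_train = 60_000   # MNIST training images
batch_size = 128
epochs = 50

steps_per_epoch = math.ceil(num_train / batch_size)
print(steps_per_epoch, steps_per_epoch * epochs)  # 469 23450

# shuffle=True reorders the samples each epoch before batching,
# roughly like this (illustration only, not Keras internals):
rng = np.random.default_rng(0)
order = rng.permutation(num_train)
batches = [order[i : i + batch_size] for i in range(0, num_train, batch_size)]
print(len(batches), len(batches[-1]))  # 469 96
```

The run therefore performs about 23,450 weight updates, and the last batch of each epoch is smaller (96 images) because 60,000 is not a multiple of 128.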

Reconstructing Images

Once the model is trained, we can generate predictions and reconstruct images. We then use the display function defined earlier to compare the original test images with their reconstructions.

preds = conv_autoenc_model.predict(ds_test)
display(ds_test, preds)
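Random samples give a visual impression, but a per-image error makes it easier to find the worst reconstructions. This NumPy sketch uses a hypothetical helper, per_image_mse, that computes the mean squared error of each image. It runs on synthetic arrays standing in for ds_test and preds, with one image deliberately corrupted.

```python
import numpy as np

def per_image_mse(originals, reconstructions):
    # Average squared error over all pixels of each image (every axis but 0).
    diff = originals.astype("float64") - reconstructions.astype("float64")
    return np.mean(diff ** 2, axis=tuple(range(1, diff.ndim)))

# Synthetic stand-ins for ds_test and preds, shape (N, 28, 28, 1).
rng = np.random.default_rng(0)
originals = rng.random((4, 28, 28, 1))
reconstructions = originals.copy()
reconstructions[2] += 0.3                 # make image 2 a poor reconstruction
errors = per_image_mse(originals, reconstructions)
worst_first = np.argsort(errors)[::-1]
print(worst_first[0])                     # 2
```

With the real arrays, per_image_mse(ds_test, preds) would rank the test digits by reconstruction quality, which helps reveal the kinds of digits the model struggles with.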

Conclusion

An autoencoder is an artificial neural network that learns data encodings in an unsupervised way. Its main goal is to obtain a low-dimensional representation (an encoding) of high-dimensional data, reducing its dimensionality. This gives an efficient representation for analysis that captures the input image’s most important features.

Key Takeaways

Autoencoders are an unsupervised learning technique based on neural networks. They are designed to learn efficient data representations (encodings), for example by training the network to ignore unwanted signal noise.
Autoencoders have a variety of applications, including image denoising, image compression, and in some cases even image generation.
Although autoencoders seem straightforward at first glance due to their simple theoretical basis, training them to learn meaningful representations of input data can be challenging.
Like principal component analysis (PCA), autoencoders can be used for dimensionality reduction, and they apply to many other tasks as well.

Frequently Asked Questions

Q1. What are Autoencoders?

Answer: An autoencoder is a technique that learns to encode data automatically. It trains a neural network to compress data, especially images, into compact representations. From this encoded representation, the autoencoder tries to reconstruct the original data as faithfully as possible.

Q2. When should we not use autoencoders?

Answer: Autoencoders may perform poorly on inputs that differ from those in the training set, or may fail to capture key relationships between variables, which can lead to inaccurate reconstructions. There is also a risk of losing important information from the input data during compression and reconstruction.

Q3. Is autoencoder better than PCA?

Answer: When comparing autoencoders and PCA (principal component analysis) for dimensionality reduction on a large dataset such as MNIST, the autoencoder model performs better than the PCA model. This can be attributed to the size of the MNIST dataset and the non-linear structure of its data, which suit the non-linear capacity of an autoencoder better than PCA’s linear projections.

Q4. Explain the limitations of autoencoders.

Answer: Autoencoders are very sensitive to errors in the input and can be outperformed by simpler, manually designed approaches. When output quality and speed are constrained, there may be no significant advantage to using an autoencoder. Implementing an autoencoder also adds complexity and overhead that may not be necessary in some situations.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.