Ever wondered how machines can recognize your face in photos or translate languages in real-time? It’s the magic of neural networks! These networks form the brain of any large language model (LLM), giving them the ability to find patterns, understand language, and think. In this article, we will learn all about neural networks and how they work. We will also explore some of the most popular neural networks in deep learning including RNNs, CNNs, ANNs, LSTMs, Transformers, and more. Towards the end, I’ll even tell you how deep learning is different from machine learning, and why the former is more popular. So, buckle up and get ready to explore the fascinating world of neural networks!
A neural network is a computational model inspired by the structure and functioning of the human brain. It consists of interconnected nodes, called neurons, organized in layers. Information is processed through these layers, with each neuron receiving inputs, applying a mathematical operation to them, and producing an output. Through a process called training, neural networks can learn to recognize patterns and relationships in data, making them powerful tools for tasks like image and speech recognition, natural language processing, and more.
Learn More: An Introductory Guide to Deep Learning and Neural Networks
Here’s a simplified explanation of how neural networks work:
In a typical dense network, each neuron in a layer is connected to every neuron in the adjacent layers. The neural network actively adjusts the weights associated with these connections during training to optimize its performance.
Each neuron applies an activation function to the weighted sum of its inputs. This function introduces non-linearity into the network, allowing it to learn complex patterns in the data.
Neural networks learn from data through a process called training. During training, the network is fed with input data along with the correct outputs (labels). It adjusts the weights of connections between neurons in order to minimize the difference between its predicted outputs and the true outputs. This process typically involves an optimization algorithm like gradient descent.
Once trained, the neural network can make predictions on new, unseen data by passing it through the network and obtaining the output from the final layer.
In essence, a neural network learns to recognize patterns in data by adjusting its internal parameters (weights) based on examples provided during training, allowing it to generalize and make predictions on new data.
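To make this concrete, here is a minimal NumPy sketch of that training loop: a tiny two-layer network trained with plain gradient descent on the XOR toy problem. The layer sizes, learning rate, and number of steps are purely illustrative choices, not a definitive recipe.

```python
import numpy as np

# Toy dataset: XOR, a classic problem that is not linearly separable (chosen for illustration)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 8))   # input -> hidden weights
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0  # learning rate (an assumption for this toy setup)
for step in range(10000):
    # Forward pass: each layer computes a weighted sum plus bias, then an activation
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Mean squared error between predictions and the true labels
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradients of the loss with respect to each weight matrix
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_hid = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_hid
    db1 = d_hid.sum(axis=0, keepdims=True)

    # Gradient descent: nudge every weight against its gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(y_hat, 2))  # predictions should move toward [0, 1, 1, 0] as training progresses
```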
Learn More: A Comprehensive Guide on Neural Networks
This article focuses on the most important types of neural networks, which form the basis for most pre-trained models in deep learning.
Let’s discuss each neural network in detail.
The perceptron is a fundamental type of neural network used for binary classification tasks. It consists of a single layer of artificial neurons (also known as perceptrons) that take input values, apply weights, and generate an output. A single perceptron (or neuron) can be imagined as a logistic regression: it performs a weighted sum of inputs, adds a bias, and passes the result through an activation function. With a sigmoid activation, its output is a probability between 0 and 1, just like in logistic regression.
The perceptron is typically used for linearly separable data, where it learns to classify inputs into two categories based on a decision boundary. It finds applications in simple pattern recognition and classification tasks. However, the perceptron has limitations in handling complex data that is not linearly separable.
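Here is a minimal sketch of a single perceptron with a sigmoid activation. The weights and bias are hand-picked illustrative values that make it behave like a logical AND; in practice they would be learned from data.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))     # probability between 0 and 1

# Illustrative weights for an AND-like decision boundary (values are assumptions)
w = np.array([2.0, 2.0])
b = -3.0

for x in [np.array([0, 0]), np.array([0, 1]), np.array([1, 0]), np.array([1, 1])]:
    p = perceptron_predict(x, w, b)
    print(x, round(p, 2), "-> class", int(p > 0.5))
```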
Learn More: Perceptron: Building Block of Artificial Neural Network
An MLP is a class of feedforward artificial neural network. It consists of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function.
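As a quick illustration, here is a minimal PyTorch sketch of an MLP with an input layer, two hidden layers, and an output layer. The layer sizes are arbitrary choices for demonstration.

```python
import torch
from torch import nn

# A minimal MLP sketch: input layer, two hidden layers with nonlinear
# activations, and an output layer (layer sizes are assumptions).
mlp = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),    # input (20 features) -> hidden layer 1
    nn.Linear(64, 32), nn.ReLU(),    # hidden layer 2
    nn.Linear(32, 1), nn.Sigmoid(),  # output layer for binary classification
)

x = torch.randn(8, 20)               # a batch of 8 dummy examples
print(mlp(x).shape)                  # torch.Size([8, 1])
```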
Now, let us see how to overcome the limitations of MLP using two different architectures – Recurrent Neural Networks (RNN) and Convolution Neural Networks (CNN).
Learn More: A Simple overview of Multilayer Perceptron (MLP)
A Feedforward Neural Network is the simplest type of ANN architecture. In this network, the information moves in only one direction – forward: from the input nodes, through the hidden nodes (if any), to the output nodes. There are no cycles or loops in the network.
Artificial Neural Network, or ANN, is a group of multiple perceptrons/neurons at each layer. It consists of 3 layers – Input, Hidden and Output. The input layer accepts the inputs, the hidden layer processes the inputs, and the output layer produces the result. Essentially, each layer tries to learn certain weights.
ANN can be used to solve problems related to tabular data, images, and textual data.
Learn More: Artificial Neural Networks – Better Understanding
An Artificial Neural Network is capable of approximating any nonlinear function. Hence, these networks are popularly known as Universal Function Approximators. ANNs have the capacity to learn weights that map any input to the corresponding output.
One of the main reasons behind universal approximation is the activation function. Activation functions introduce nonlinear properties to the network. This helps the network learn any complex relationship between input and output.
The output at each neuron is an activation function applied to a weighted sum of its inputs. But wait – what happens if there is no activation function? The network can then only learn a linear function and can never capture complex relationships. That's why the activation function is the powerhouse of an ANN!
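Here is a small sketch that demonstrates the point: two Linear layers stacked without an activation in between collapse into a single equivalent linear map, so the extra layer adds no expressive power. The layer sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two stacked Linear layers WITHOUT an activation in between...
linear_stack = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 3))

# ...collapse into one equivalent linear map: W = W2 @ W1, b = W2 @ b1 + b2
W1, b1 = linear_stack[0].weight, linear_stack[0].bias
W2, b2 = linear_stack[1].weight, linear_stack[1].bias
W, b = W2 @ W1, W2 @ b1 + b2

x = torch.randn(5, 4)
print(torch.allclose(linear_stack(x), x @ W.T + b, atol=1e-5))  # True
```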
While solving an image classification problem using an ANN, the first step is to convert the 2-dimensional image into a 1-dimensional vector before training the model. This has two drawbacks: the number of trainable parameters grows drastically with the size of the image, and flattening discards the arrangement of the pixels, so the network loses the spatial features of the image.
Yet another challenge arises during backward propagation. In a very deep neural network (one with a large number of hidden layers), the gradient can vanish or explode as it propagates backward – the well-known vanishing and exploding gradient problem.
Let us first try to understand the difference between an RNN and an ANN from the architecture perspective. Simply put, a looping constraint on the hidden layer of an ANN turns it into an RNN.
An RNN has a recurrent connection on the hidden state. This looping constraint ensures that the sequential information in the input data is captured.
Learn More: Fundamentals of Deep Learning – Introduction to Recurrent Neural Networks
We can use recurrent neural networks to solve problems involving sequential data such as time series, text, and audio.
The output (o1, o2, o3, o4) at each time step depends not only on the current word but also on the previous words. RNNs share the same parameters across all time steps. This is popularly known as Parameter Sharing. It results in fewer parameters to train and decreases the computational cost.
Three weight matrices – U (input to hidden), W (hidden to hidden), and V (hidden to output) – are shared across all the time steps.
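Here is a minimal sketch of a vanilla RNN cell that makes the parameter sharing explicit: the same U, W, and V matrices are reused at every time step. The dimensions are illustrative assumptions.

```python
import torch

# A minimal "vanilla" RNN cell sketch with the three shared weight matrices
# U (input -> hidden), W (hidden -> hidden), V (hidden -> output).
n_in, n_hidden, n_out, seq_len = 10, 16, 5, 4

torch.manual_seed(0)
U = torch.randn(n_hidden, n_in) * 0.1
W = torch.randn(n_hidden, n_hidden) * 0.1
V = torch.randn(n_out, n_hidden) * 0.1

x = torch.randn(seq_len, n_in)        # one sequence of 4 time steps
h = torch.zeros(n_hidden)             # initial hidden state

outputs = []
for t in range(seq_len):
    # The SAME U, W, V are reused at every time step (parameter sharing)
    h = torch.tanh(U @ x[t] + W @ h)
    outputs.append(V @ h)

print(len(outputs), outputs[0].shape)  # 4 outputs, each of size 5
```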
The gradient computed at the last time step vanishes by the time it reaches the initial time steps, which makes it hard for a plain RNN to learn long-range dependencies.
Learn More: Fundamentals of RNN forward Propagation in Deep Learning
LSTM networks are a type of recurrent neural network (RNN) designed to capture long-term dependencies in sequential data. Unlike plain RNNs, LSTM networks have memory cells and gates that allow them to selectively retain or forget information over time. This makes LSTMs effective in speech recognition, natural language processing, time series analysis, and translation.
However, the challenge with LSTM networks lies in selecting the appropriate architecture and parameters and dealing with vanishing or exploding gradients during training.
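For a feel of how this looks in practice, here is a minimal PyTorch sketch of running an LSTM layer over a batch of sequences. The input size, hidden size, and sequence length are illustrative assumptions.

```python
import torch
from torch import nn

# A minimal sketch of applying an LSTM layer to a batch of sequences
lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)

x = torch.randn(8, 20, 10)            # batch of 8 sequences, 20 steps, 10 features each
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([8, 20, 32]) - hidden state at every time step
print(h_n.shape)     # torch.Size([1, 8, 32])  - final hidden state
print(c_n.shape)     # torch.Size([1, 8, 32])  - final cell (memory) state
```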
Learn More: What is LSTM? Introduction to Long Short-Term Memory
Transformer networks have become one of the most important architectures in deep learning. They are especially useful in the field of NLP and machine translation. Introduced in the 2017 paper “Attention is All You Need” by Vaswani et al., the Transformer model revolutionized the way machines process sequences of data.
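At the heart of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Here is a minimal sketch of that single operation; the toy shapes are assumptions, and a real Transformer adds multi-head projections, positional encodings, and feedforward layers on top.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """The core operation from "Attention Is All You Need": softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)   # how much each position attends to every other
    return weights @ v

# Toy shapes (assumptions): a sequence of 6 tokens with 8-dimensional queries/keys/values
q = torch.randn(6, 8)
k = torch.randn(6, 8)
v = torch.randn(6, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([6, 8])
```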
Learn More: Understanding Transformers: A Deep Dive into NLP’s Core Technology
Convolutional neural networks (CNN) are all the rage in the deep learning community right now. Various applications and domains use these CNN models, and they are especially prevalent in image and video processing projects.
The building blocks of CNNs are filters a.k.a. kernels. Kernels are used to extract the relevant features from the input using the convolution operation. Let’s try to grasp the importance of filters using images as input data. Convolving an image with filters results in a feature map:
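Here is a minimal PyTorch sketch of that idea: a single 3x3 kernel (a simple hand-crafted edge detector, chosen purely for illustration) is convolved over an image to produce a feature map.

```python
import torch
from torch import nn

# A hand-crafted 3x3 kernel that responds to vertical edges (values are assumptions)
edge_kernel = torch.tensor([[[[-1., 0., 1.],
                              [-1., 0., 1.],
                              [-1., 0., 1.]]]])

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
with torch.no_grad():
    conv.weight.copy_(edge_kernel)       # in a real CNN these weights are learned

image = torch.randn(1, 1, 28, 28)        # one grayscale 28x28 image
feature_map = conv(image)
print(feature_map.shape)                 # torch.Size([1, 1, 26, 26])
```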
Though convolutional neural networks were introduced to solve problems related to image data, they perform impressively on sequential inputs as well.
Learn More: Demystifying the Mathematics Behind Convolutional Neural Networks (CNNs)
Enrol in this free course on CNN to learn more about them: Convolutional Neural Networks from Scratch
Deconvolutional Neural Networks, also known as transposed convolutional networks or upconvolutional networks, are used to perform upsampling operations. Essentially, they reverse the process of convolution by transforming lower-resolution feature maps back into higher-resolution representations. This is particularly useful in tasks where it’s necessary to generate high-resolution data from a compressed version, such as in generative models.
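Here is a minimal sketch of a transposed convolution upsampling a small feature map back to a higher resolution. The channel counts and spatial sizes are illustrative assumptions.

```python
import torch
from torch import nn

# Transposed convolution: upsamples a compressed feature map
upconv = nn.ConvTranspose2d(in_channels=16, out_channels=8,
                            kernel_size=2, stride=2)

low_res = torch.randn(1, 16, 7, 7)       # compressed feature map
high_res = upconv(low_res)
print(high_res.shape)                    # torch.Size([1, 8, 14, 14])
```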
An Autoencoder is a type of neural network used to learn efficient codings of unlabeled data. It consists of two parts: an encoder that compresses the input into a latent-space representation, and a decoder that reconstructs the input from this representation.
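Here is a minimal sketch of that encoder-decoder structure for flattened 28x28 images; the layer and latent sizes are illustrative assumptions.

```python
import torch
from torch import nn

# Encoder compresses 784 pixels down to a 32-dimensional latent vector;
# decoder reconstructs the 784 pixels from that vector.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(16, 784)                             # a batch of 16 flattened images
reconstruction = decoder(encoder(x))
loss = nn.functional.mse_loss(reconstruction, x)    # reconstruction error to minimize
print(reconstruction.shape, loss.item())
```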
Learn More: An introduction to Autoencoders for Beginners
A Generative Adversarial Network consists of two neural networks, the generator and the discriminator, which contest with each other in a game. The generator creates fake data, while the discriminator evaluates its authenticity. This adversarial process continues until the generator produces data indistinguishable from real data.
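Here is a minimal sketch of the two players and a single forward pass. The network sizes are illustrative assumptions, and a full GAN would alternate optimization steps for the two networks.

```python
import torch
from torch import nn

# Generator: maps random noise to fake samples. Discriminator: scores how "real" a sample looks.
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

noise = torch.randn(16, 64)                  # batch of random noise vectors
fake = generator(noise)                      # generator produces fake data
score = discriminator(fake)                  # discriminator judges authenticity

# During training, the generator is pushed to make these scores approach "real" (1),
# while the discriminator is pushed to score fakes as "fake" (0) and real data as "real".
print(fake.shape, score.shape)               # torch.Size([16, 784]) torch.Size([16, 1])
```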
Learn More: Generative Adversarial Networks (GANs): End-to-End Introduction
The RBF neural network is a feedforward neural network that uses radial basis functions as activation functions. RBF networks consist of multiple layers, including an input layer, one or more hidden layers with radial basis activation functions, and an output layer. RBF networks excel in pattern recognition, function approximation, and time series prediction.
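Here is a minimal NumPy sketch of the hidden layer of an RBF network, where each hidden unit responds according to how close the input is to its centre. The centres and gamma value are illustrative assumptions.

```python
import numpy as np

def rbf_layer(x, centers, gamma):
    """Gaussian radial basis activations: phi_j(x) = exp(-gamma * ||x - c_j||^2)."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return np.exp(-gamma * dists ** 2)

# Toy setup (assumptions): 5 two-dimensional inputs, 3 RBF centres
x = np.random.default_rng(0).normal(size=(5, 2))
centers = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])

hidden = rbf_layer(x, centers, gamma=0.5)    # hidden layer activations
print(hidden.shape)                          # (5, 3)

# The output layer is typically a linear combination of these activations,
# with weights fitted by least squares or gradient descent.
```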
Here, I have summarized some of the differences among different types of neural networks:
| Neural Network Architecture | Data Type | Recurrent Connections | Parameter Sharing | Spatial Relationship | Vanishing & Exploding Gradient |
|---|---|---|---|---|---|
| Single Layer Perceptron | Tabular data | No | No | No | Yes |
| Multilayer Perceptrons (MLPs) | Tabular data | No | No | No | Yes |
| Feedforward Neural Networks (FNNs) | Tabular data | No | No | No | Yes |
| Artificial Neural Network (ANN) | Tabular data | No | No | No | Yes |
| Recurrent Neural Network (RNN) | Sequence data (Time Series, Text, Audio) | Yes | Yes | No | Yes |
| Long Short-Term Memory (LSTM) Networks | Sequence data (Time Series, Text, Audio) | Yes | Yes | No | Reduced |
| Transformer Networks | Sequence data, Image, Text | No | Yes | Yes | Mitigated |
| Convolutional Neural Network (CNN) | Image data | No | Yes | Yes | Yes |
| Deconvolutional Neural Networks | Image data | No | Yes | Yes | Yes |
| Autoencoders | Tabular/Image/Text | No | Yes | Yes (CNN-based) | Yes |
| Generative Adversarial Networks (GANs) | Image/Text/Audio generation | No | Yes | Yes | Yes |
| Radial Basis Function (RBF) Neural Network | Tabular data | No | No | No | Yes |
It’s a pertinent question. There is no shortage of machine learning algorithms so why should a data scientist gravitate towards deep learning algorithms? What do neural networks offer that traditional machine learning algorithms don’t?
Another common question I see floating around – neural networks require a ton of computing power, so is it really worth using them? While that question is laced with nuance, here’s the short answer – yes!
The different types of neural networks in deep learning, such as convolutional neural networks (CNN), recurrent neural networks (RNN), artificial neural networks (ANN), etc. are changing the way we interact with the world. These different types of neural networks are at the core of the deep learning revolution, powering applications like unmanned aerial vehicles, self-driving cars, speech recognition, etc.
Although now we understand where and why deep learning is used, it's natural to wonder – can't machine learning algorithms do the same? Well, here are two key reasons why researchers and experts tend to prefer deep learning over machine learning: the decision boundary and feature engineering.
Curious? Good – let me explain.
Every machine learning algorithm learns a mapping from input to output. In the case of parametric models, the algorithm learns a function defined by a set of weights:
Input -> f(w1, w2, …, wn) -> Output
In the case of classification problems, the algorithm learns the function that separates two classes – this is known as a decision boundary. A decision boundary helps us determine whether a given data point belongs to the positive class or the negative class.
For example, logistic regression passes a weighted sum of the inputs through a sigmoid function to separate the two classes.
The logistic regression algorithm therefore learns a linear decision boundary. It cannot learn the decision boundary for nonlinear data, such as two classes arranged in interleaving half-moons or concentric circles.
Similarly, no single machine learning algorithm is capable of learning all functions, which limits the problems such algorithms can solve when the underlying relationship is complex. Deep learning models, being universal function approximators, can capture far more complex patterns than classical machine learning models and can learn whatever decision boundary the data requires.
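Here is a small scikit-learn sketch of that contrast: on a toy dataset that is not linearly separable, logistic regression is held back by its linear boundary, while even a small neural network can fit the curved one. The dataset and hyperparameters are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# "Two moons": a toy dataset whose classes cannot be separated by a straight line
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

log_reg = LogisticRegression().fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0).fit(X, y)

print("Logistic regression accuracy:", round(log_reg.score(X, y), 2))  # limited by its linear boundary
print("Small neural network accuracy:", round(mlp.score(X, y), 2))     # can fit the curved boundary
```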
Feature engineering is a key step in the model building process. It is a two-step process:
In feature extraction, we extract all the required features for our problem statement and in feature selection, we select the important features that improve the performance of our machine learning or deep learning model.
Consider an image classification problem. Extracting features manually from an image needs strong knowledge of the subject as well as the domain. It is an extremely time-consuming process. Thanks to Deep Learning, we can automate the process of Feature Engineering!
In this article, I have discussed the importance of deep learning and the differences among different types of neural networks. I strongly believe that knowledge sharing is the ultimate form of learning, and I am looking forward to hearing about a few more differences from you! I hope this article helped you understand the different types of neural networks, how they perform, and the impact they are creating.