A Brief Overview of Recurrent Neural Networks (RNN)

Debasish Kalita 29 May, 2024 • 9 min read

Introduction

Ever wonder how chatbots understand your questions or how apps like Siri and voice search can decipher your spoken requests? The secret weapon behind these impressive feats is a type of artificial intelligence called Recurrent Neural Networks (RNNs).

Recurrent Neural networks

Unlike standard neural networks that excel at tasks like image recognition, RNNs boast a unique superpower – memory! This internal memory allows them to analyze sequential data, where the order of information is crucial. Imagine having a conversation – you need to remember what was said earlier to understand the current flow. Similarly, RNNs can analyze sequences like speech or text, making them perfect for tasks like machine translation and voice recognition. Although RNNs have been around since the 1980s, recent advancements like Long Short-Term Memory (LSTM) and the explosion of big data have unleashed their true potential.

This article was published as a part of the Data Science Blogathon.

What is a Recurrent Neural Network (RNN)?

Neural networks imitate the function of the human brain in the fields of Data science, Artificial intelligence, machine learning, and deep learning, allowing computer programs to recognize patterns and solve common issues.

RNNs are a type of neural network that can be used to model sequence data. RNNs, which are formed from feedforward networks, are similar to human brains in their behaviour. Simply said, recurrent neural networks can anticipate sequential data in a way that other algorithms can’t.

Recurrent Neural Networks

All of the inputs and outputs in standard neural networks are independent of one another, however in some circumstances, such as when predicting the next word of a phrase, the prior words are necessary, and so the previous words must be remembered. As a result, RNN was created, which used a Hidden Layer to overcome the problem. The most important component of RNN is the Hidden state, which remembers specific information about a sequence.

RNNs have a Memory that stores all information about the calculations. It employs the same settings for each input since it produces the same outcome by performing the same task on all inputs or hidden layers.

Also Read: Introduction to Autoencoders | Encoders and decoders for Data Science Enthusiasts

What Makes RNN Special?

Recurrent neural networks (RNNs) set themselves apart from other neural networks with their unique capabilities:

  • Internal Memory: This is the key feature of RNNs. It allows them to remember past inputs and use that context when processing new information.
  • Sequential Data Processing: Because of their memory, RNNs are exceptional at handling sequential data where the order of elements matters. This makes them ideal for tasks like speech recognition, machine translation, natural language processing(nlp) and text generation.
  • Contextual Understanding: RNNs can analyze the current input in relation to what they’ve “seen” before. This contextual understanding is crucial for tasks where meaning depends on prior information.
  • Dynamic Processing: RNNs can continuously update their internal memory as they process new data. This allows them to adapt to changing patterns within a sequence.

Also Read: Deep Learning for Computer Vision – Introduction to Convolution Neural Networks

RNN Architecture

RNNs are a type of neural network that has hidden states and allows past outputs to be used as inputs. They usually go like this:

Here’s a breakdown of its key components:

  • Input Layer: This layer receives the initial element of the sequence data. For example, in a sentence, it might receive the first word as a vector representation.
  • Hidden Layer: The heart of the RNN, the hidden layer contains a set of interconnected neurons. Each neuron processes the current input along with the information from the previous hidden layer’s state. This “state” captures the network’s memory of past inputs, allowing it to understand the current element in context.
  • Activation Function: This function introduces non-linearity into the network, enabling it to learn complex patterns. It transforms the combined input from the current input layer and the previous hidden layer state before passing it on.
  • Output Layer: The output layer generates the network’s prediction based on the processed information. In a language model, it might predict the next word in the sequence.
  • Recurrent Connection: A key distinction of RNNs is the recurrent connection within the hidden layer. This connection allows the network to pass the hidden state information (the network’s memory) to the next time step. It’s like passing a baton in a relay race, carrying information about previous inputs forward

The Architecture of a Traditional RNN

RNNs are a type of neural network that has hidden states and allows past outputs to be used as inputs. They usually go like this:

Recurrent Neural Networks

 

Source: Standford.edu
Recurrent Neural Networks
Recurrent Neural Networks

 

Source: Standford.edu

RNN architecture can vary depending on the problem you’re trying to solve. From those with a single input and output to those with many (with variations between).

Below are some examples of RNN architectures that can help you better understand this.

  • One To One: There is only one pair here. A one-to-one architecture is used in traditional neural networks.
  • One To Many: A single input in a one-to-many network might result in numerous outputs. One too many networks are used in the production of music, for example.
  • Many To One:  In this scenario, a single output is produced by combining many inputs from distinct time steps. Sentiment analysis and emotion identification use such networks, in which the class label is determined by a sequence of words.
  • Many To Many: For many to many, there are numerous options. Two inputs yield three outputs. Machine translation systems, such as English to French or vice versa translation systems, use many to many networks.

How does Recurrent Neural Networks work?

The information in recurrent neural networks cycles through a loop to the middle hidden layer.

recurrent neural networks

The input layer x receives and processes the neural network’s input before passing it on to the middle layer.

Multiple hidden layers can be found in the middle layer h, each with its own activation functions, weights, and biases. You can utilize a recurrent neural network if the various parameters of different hidden layers are not impacted by the preceding layer, i.e. There is no memory in the neural network.

The different activation functions, weights, and biases will be standardized by the Recurrent Neural Network, ensuring that each hidden layer has the same characteristics. Rather than constructing numerous hidden layers, it will create only one and loop over it as many times as necessary.

Common Activation Functions

A neuron’s activation function dictates whether it should be turned on or off. Nonlinear functions usually transform a neuron’s output to a number between 0 and 1 or -1 and 1.

recurrent neural networks

The following are some of the most commonly utilized functions:

  • Sigmoid: The formula g(z) = 1/(1 + e^-z) is used to express this.
  • Tanh: The formula g(z) = (e^-z – e^-z)/(e^-z + e^-z) is used to express this.
  • Relu: The formula g(z) = max(0 , z) is used to express this.

Advantages and disadvantages of RNN

Advantages of RNNs:

  • Handle sequential data effectively, including text, speech, and time series.
  • Process inputs of any length, unlike feedforward neural networks.
  • Share weights across time steps, enhancing training efficiency.

Disadvantages of RNNs:

  • Prone to vanishing and exploding gradient problems, hindering learning.
  • Training can be challenging, especially for long sequences.
  • Computationally slower than other neural network architectures.

Recurrent Neural Network Vs Feedforward Neural Network

A feed-forward neural network has only one route of information flow: from the input layer to the output layer, passing through the hidden layers. The data flows across the network in a straight route, never going through the same node twice.

The information flow between an RNN and a feed-forward neural network is depicted in the two figures below.

Recurrent Neural network

Feed-forward neural networks are poor predictions of what will happen next because they have no memory of the information they receive. Because it simply analyses the current input, a feed-forward network has no idea of temporal order. Apart from its training, it has no memory of what transpired in the past.

The information is in an RNN cycle via a loop. Before making a judgment, it evaluates the current input as well as what it has learned from past inputs. A recurrent neural network, on the other hand, may recall due to internal memory. It produces output, copies it, and then returns it to the network.

Backpropagation Through Time (BPTT)

When we apply a Backpropagation algorithm to a Recurrent Neural Network with time series data as its input, we call it backpropagation through time.

A single input is sent into the network at a time in a normal RNN, and a single output is obtained. Backpropagation, on the other hand, uses both the current and prior inputs as input. This is referred to as a timestep, and one timestep will consist of multiple time series data points entering the RNN at the same time.

Backpropagation Through Time (BPTT)

 

Source: Medium.com

The output of the neural network is used to calculate and collect the errors once it has trained on a time set and given you an output. The network is then rolled back up, and weights are recalculated and adjusted to account for the faults.

Two issues of Standard RNNs

There are two key challenges that RNNs have had to overcome, but in order to comprehend them, one must first grasp what a gradient is.

Standard RNNs

 

Source: GreatLearning.com

With regard to its inputs, a gradient is a partial derivative. If you’re not sure what that implies, consider this: a gradient quantifies how much the output of a function varies when the inputs are changed slightly.

A function’s slope is also known as its gradient. The steeper the slope, the faster a model can learn, the higher the gradient. The model, on the other hand, will stop learning if the slope is zero. A gradient is used to measure the change in all weights in relation to the change in error.

  • Exploding Gradients: Exploding gradients occur when the algorithm gives the weights an absurdly high priority for no apparent reason. Fortunately, truncating or squashing the gradients is a simple solution to this problem.
  • Vanishing Gradients: Vanishing gradients occur when the gradient values are too small, causing the model to stop learning or take far too long. This was a big issue in the 1990s, and it was far more difficult to address than the exploding gradients. Fortunately, Sepp Hochreiter and Juergen Schmidhuber’s LSTM concept solved the problem.

RNN Applications

Recurrent Neural Networks are used to tackle a variety of problems involving sequence data. There are many different types of sequence data, but the following are the most common: Audio, Text, Video, Biological sequences.

Using RNN models and sequence datasets, you may tackle a variety of problems, including :

  • Speech recognition
  • Generation of music
  • Automated Translations
  • Analysis of video action
  • Sequence study of the genome and DNA

Basic Python Implementation (RNN with Keras)

Import the required libraries

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Here’s a simple Sequential model that processes integer sequences, embeds each integer into a 64-dimensional vector, and then uses an LSTM layer to handle the sequence of vectors.

model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))
model.add(layers.LSTM(128))
model.add(layers.Dense(10))
model.summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 64)          64000     
_________________________________________________________________
lstm (LSTM)                  (None, 128)               98816     
_________________________________________________________________
dense (Dense)                (None, 10)                1290      
=================================================================
Total params: 164,106
Trainable params: 164,106
Non-trainable params: 0

Conclusion

  • Recurrent Neural Networks are a versatile tool that can be used in a variety of situations. They’re employed in a variety of methods for language modelling and text generators. They’re also employed in voice recognition.
  • This type of neural network is used to create labels for images that aren’t tagged when paired with Convolutional Neural Networks. It’s incredible how well this combination works.
  • However, there is one flaw with recurrent neural networks. They have trouble learning long-range dependencies, which means they don’t comprehend relationships between data that are separated by several steps.
  • When anticipating words, for example, we may require more context than simply one prior word. This is known as the vanishing gradient problem, and it is solved using a special type of Recurrent Neural Network called Long-Short Term Memory Networks (LSTM), which is a larger topic that will be discussed in future articles.

Frequently Asked Questions

Q1. What are recurrent neural networks?

A. Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequential data, such as time series or natural language. They have feedback connections that allow them to retain information from previous time steps, enabling them to capture temporal dependencies. This makes RNNs well-suited for tasks like language modeling, speech recognition, and sequential data analysis.

Q2. How does a recurrent neural network work?

A. A recurrent neural network (RNN) works by processing sequential data step-by-step. It maintains a hidden state that acts as a memory, which is updated at each time step using the input data and the previous hidden state. The hidden state allows the network to capture information from past inputs, making it suitable for sequential tasks. RNNs use the same set of weights across all time steps, allowing them to share information throughout the sequence. However, traditional RNNs suffer from vanishing and exploding gradient problems, which can hinder their ability to capture long-term dependencies.

Read more articles on our blog. Head onto it now!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion. 

Debasish Kalita 29 May 2024

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear

Akanji wasiu
Akanji wasiu 27 Apr, 2023

I want to present a seminar paper on Optimization of deep learning-based models for vulnerability detection in digital transactions. I need assistance.

Related Courses

Natural Language Processing
Become a full stack data scientist