A visual guide to Recurrent Neural Networks

Bai Last Updated : 26 Jun, 2021

This article was published as a part of the Data Science Blogathon

Have you ever used Google Translate or Grammarly, or wondered while typing in Gmail how it knows so precisely which word you want to type next? The answer is a recurrent neural network (RNN) or, to be precise, a modification of the RNN. The full list of RNN use cases could fill an article of its own and is easy to find on the internet. In this article (and the articles following it), I will try to give you an intuition for the inner workings of recurrent neural networks.

Prerequisites:

Basic understanding of Neural Networks
Awareness of basic NLP terminology such as corpus, one-hot encoding, etc.

What is a Recurrent Neural Network (RNN)?

RNNs are a variety of neural network designed to work on sequential data. Sequential data is data in which the order of the elements matters. Text, speech, and time-series data are a few examples of sequential data.

What is the need for RNN when we have simple neural networks?

We will understand the need with the help of an example. Suppose we have a few restaurant reviews and our task is to predict whether each review is positive or negative. To feed data into any neural network, we first have to represent the text in a machine-understandable form.

Let us divide each sentence into separate tokens by splitting on whitespace, and one-hot encode each token.

Image source: Author

Sentence 1: ‘Delightful place to have dinner’

For sentence 1, we see that Tx (the total number of words or tokens in the sentence) = 5.

Image source: Author

Sentence 2: ‘Food was nice but service wasn’t’

For sentence 2, Tx = 6. But the input layer of a DNN (deep neural network) has a fixed size, so there is no direct way to feed inputs of varying length into the network.
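The tokenization and one-hot encoding of the two example sentences can be sketched as follows. This is a minimal illustration, assuming a toy vocabulary built only from these two sentences and lowercased tokens; the helper name `one_hot_encode` is made up for this example.

```python
import numpy as np

def one_hot_encode(sentence, vocab):
    """Split on whitespace and return a (Tx, vocab_size) one-hot matrix."""
    tokens = sentence.lower().split()
    encoded = np.zeros((len(tokens), len(vocab)), dtype=np.float32)
    for t, token in enumerate(tokens):
        encoded[t, vocab[token]] = 1.0
    return encoded

sentences = [
    "Delightful place to have dinner",
    "Food was nice but service wasn't",
]

# Toy vocabulary built from the two sentences only (an assumption;
# a real corpus would contain thousands of words).
words = sorted({w for s in sentences for w in s.lower().split()})
vocab = {word: index for index, word in enumerate(words)}

x1 = one_hot_encode(sentences[0], vocab)
x2 = one_hot_encode(sentences[1], vocab)
print(x1.shape)  # (5, 11): Tx = 5 tokens, 11 words in the toy vocabulary
print(x2.shape)  # (6, 11): Tx = 6 tokens
```

The two matrices have a different number of rows, which is exactly why they cannot be fed into a fixed-size input layer as they are.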

Okay, you might ask: why don't we pad every sentence with zeros so that its length equals that of the longest sentence? That would solve the problem of varying input lengths, but another problem arises: the number of parameters becomes astronomically high.

Suppose the maximum sentence length is 20, which is pretty small for most available data sets.

Let the number of words in the corpus be 20k (a very small corpus).

Then each input becomes 400k-dimensional (20 × 20k), and with just 10 neurons in the hidden layer, the number of parameters becomes 4 million! A properly sized network would push this into the hundreds of millions or even billions of parameters. To overcome this, we need a network with weight-sharing capabilities.
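The arithmetic behind these figures can be checked in a few lines. The variable names are illustrative; the numbers are the ones used in the text.

```python
max_len = 20          # longest sentence, in tokens
vocab_size = 20_000   # words in the corpus
hidden_units = 10     # neurons in the hidden layer

input_dim = max_len * vocab_size     # every padded sentence, flattened
weights = input_dim * hidden_units   # input-to-hidden weights only

print(input_dim)  # 400000
print(weights)    # 4000000
```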

Now we know why a new kind of architecture is needed. Let’s see how an RNN solves this problem.

Visualizing the RNN

Image source: Author

Recurrent means repeating, and the idea of an RNN is to have layers that repeat over time. Given a sentence, we first tokenize it and one-hot encode each word. Then we feed the tokens into the RNN one at a time.

Image source: Author

In the above image, we can see the RNN being unrolled over time. Unrolling means that we use the same RNN block at each time step, and the hidden state from the previous time step becomes an additional input to the current time step. Let us visualize what an RNN block for a single time step looks like.

Image source: Author

The hidden state stores information about all the previous inputs in a weighted manner: more recent inputs have a stronger influence, and older inputs a weaker one. The hidden state of the previous time step is concatenated with the input of the current time step, multiplied by the shared weights, and passed through the tanh activation. The tanh activation squashes all values into the range -1 to 1, and the result becomes the hidden state of the current time step. Depending on the type of RNN, if we want to predict an output at each step, this hidden state is fed into a softmax layer to produce the output for the current time step. The current hidden state then becomes an input to the RNN block of the next time step. This process goes on until the end of the sentence.
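A single RNN block as described above might look like this in NumPy. It is a sketch under stated assumptions rather than a reference implementation: the sizes, the random weights, and the names `rnn_step`, `W`, `b`, `W_out` and `b_out` are all chosen for illustration.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, b):
    """One vanilla RNN step: h_t = tanh(W @ [h_prev; x_t] + b)."""
    concat = np.concatenate([h_prev, x_t])
    return np.tanh(W @ concat + b)

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
input_dim, hidden_dim, num_classes = 11, 4, 2   # illustrative sizes

W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
b = np.zeros(hidden_dim)
W_out = rng.normal(scale=0.1, size=(num_classes, hidden_dim))
b_out = np.zeros(num_classes)

x_t = np.zeros(input_dim)
x_t[3] = 1.0                     # one-hot vector for the current token
h_prev = np.zeros(hidden_dim)    # hidden state from the previous step

h_t = rnn_step(x_t, h_prev, W, b)     # new hidden state, values in (-1, 1)
y_t = softmax(W_out @ h_t + b_out)    # optional per-step prediction
```

`h_t` is what gets passed on to the next time step as `h_prev`.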

Image source: Author

Now we have a prediction at each time step. We compare each prediction with the actual label and compute the loss for that time step. This concludes a single forward propagation pass through the RNN.
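The full forward pass over one sequence, with a loss at every time step, can be sketched as below. The article does not name a loss function, so cross-entropy is an assumption here, as are the random token ids, labels, and layer sizes.

```python
import numpy as np

def forward(xs, labels, W, b, W_out, b_out):
    """Run the RNN over one sequence; return per-step losses and the final state."""
    h = np.zeros(b.shape[0])                  # no previous state at the first step
    losses = []
    for x_t, label in zip(xs, labels):
        h = np.tanh(W @ np.concatenate([h, x_t]) + b)
        logits = W_out @ h + b_out
        logits = logits - logits.max()        # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        losses.append(-log_probs[label])      # cross-entropy for this step (assumed loss)
    return np.array(losses), h

rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim, num_classes = 5, 11, 4, 3   # illustrative sizes

W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
b = np.zeros(hidden_dim)
W_out = rng.normal(scale=0.1, size=(num_classes, hidden_dim))
b_out = np.zeros(num_classes)

token_ids = rng.integers(0, input_dim, size=seq_len)
xs = np.eye(input_dim)[token_ids]             # (seq_len, input_dim) one-hot rows
labels = rng.integers(0, num_classes, size=seq_len)

losses, h_last = forward(xs, labels, W, b, W_out, b_out)
total_loss = losses.sum()                     # summed over time steps
```

Note that the same `W`, `b`, `W_out` and `b_out` are reused inside the loop at every step.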

Backpropagation in an RNN proceeds in the exact reverse order of forward propagation, from the last time step back to the first.

Image source: Author

The crux of the idea is that the same weights are used at every time step. Hence the number of parameters is independent of the input length, and each weight is updated during backpropagation through time.
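To make the effect of weight sharing concrete, here is a rough parameter count for a single recurrent block, using the vocabulary size and hidden-layer size from the earlier example. The function name is made up for illustration, and the count assumes one weight matrix over the concatenated hidden state and input plus one bias per hidden unit, with no output layer.

```python
def rnn_param_count(input_dim, hidden_dim):
    """Weights of shape (hidden_dim, hidden_dim + input_dim) plus one bias per unit."""
    return hidden_dim * (hidden_dim + input_dim) + hidden_dim

vocab_size, hidden_units = 20_000, 10   # figures from the example in the text

for seq_len in (5, 20, 100):
    # seq_len never enters the formula: the same weights serve every time step
    print(seq_len, rnn_param_count(vocab_size, hidden_units))
```

Under these assumptions the count stays at 200,110 whether the sentence has 5 tokens or 100, compared with 4 million for the padded fully connected network at length 20.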

I hope that gave you some idea of the inner workings of RNNs. Let’s look at the various types of RNN along with their uses.

Types of RNN

Image source: Author

Many to many (equal input-output): Named entity recognition, part-of-speech tagging.

Many to one: Sentiment classification, movie rating prediction.

One to many: Novel sequence generation, image captioning.

Many to many (unequal input-output): Machine translation, speech recognition.
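Two of these types differ only in which hidden states are read out. The sketch below, with made-up sizes and weight names, uses every hidden state for a many-to-many (equal length) task such as tagging, and only the last hidden state for a many-to-one task such as sentiment classification. The one-to-many and unequal many-to-many cases need a decoding loop and are not shown.

```python
import numpy as np

def run_rnn(xs, W, b):
    """Return the hidden state at every time step, shape (Tx, hidden_dim)."""
    h = np.zeros(b.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W @ np.concatenate([h, x_t]) + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim = 6, 11, 4
num_tags, num_sentiments = 5, 2               # illustrative label counts

W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
b = np.zeros(hidden_dim)
W_tag = rng.normal(scale=0.1, size=(num_tags, hidden_dim))
W_sent = rng.normal(scale=0.1, size=(num_sentiments, hidden_dim))

xs = np.eye(input_dim)[rng.integers(0, input_dim, size=seq_len)]
states = run_rnn(xs, W, b)

tag_scores = states @ W_tag.T            # many to many: one score vector per token
sentiment_scores = W_sent @ states[-1]   # many to one: one score vector per sentence

print(tag_scores.shape)        # (6, 5)
print(sentiment_scores.shape)  # (2,)
```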

Limitations of RNN

Image source: imgflip.com

Recurrent neural networks suffer from short-term memory. If a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones. So if you are trying to process a paragraph of text to make predictions, RNNs may leave out important information from the beginning.
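One way to see this effect numerically is to track how strongly the very first hidden state still influences the hidden state after many steps. The sketch below is an illustration under assumptions, not a proof: the recurrent weights are random and rescaled so that their largest singular value is 0.9, and the input contribution at each step is replaced by random noise. It writes the update as `W_hh @ h + x_term`, which is equivalent to multiplying one weight matrix with the concatenated hidden state and input.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, steps = 8, 50

W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hh *= 0.9 / np.linalg.norm(W_hh, 2)    # rescale to spectral norm 0.9 (assumption)

h = np.zeros(hidden_dim)
jacobian = np.eye(hidden_dim)            # d h_t / d h_0, identity before any step
norms = []
for _ in range(steps):
    x_term = rng.normal(scale=0.5, size=hidden_dim)   # stands in for the input term
    h = np.tanh(W_hh @ h + x_term)
    # Chain rule: d h_t / d h_{t-1} = diag(1 - h_t**2) @ W_hh
    jacobian = (np.diag(1.0 - h ** 2) @ W_hh) @ jacobian
    norms.append(np.linalg.norm(jacobian, 2))

print(norms[0], norms[9], norms[-1])     # sensitivity to h_0 after 1, 10 and 50 steps
```

Because the derivative of tanh never exceeds 1, the sensitivity after t steps is bounded by 0.9 to the power t in this setup, so the contribution of the earliest inputs fades quickly as the sequence gets longer.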

Image source: Author

The next article will look at how LSTMs and GRUs tackle these problems. Before I end this article, here are a few implementation details about building an RNN.

For the first RNN block, since we have no previous hidden state available, it is common to input a vector of zeros.
When inputting texts of varying lengths, we pad each text so that all sequences have equal length.
To mark the start and end of a sentence, we can add special start and end tokens.
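These three details can be sketched together as follows. The token strings `<s>`, `</s>` and `<pad>` and the helper `prepare_batch` are hypothetical choices for this example; the article does not prescribe particular names.

```python
PAD, START, END = "<pad>", "<s>", "</s>"   # hypothetical token names

def prepare_batch(sentences):
    """Add start/end tokens, then pad every sequence to the longest one."""
    tokenized = [[START] + s.split() + [END] for s in sentences]
    max_len = max(len(tokens) for tokens in tokenized)
    return [tokens + [PAD] * (max_len - len(tokens)) for tokens in tokenized]

batch = prepare_batch([
    "Delightful place to have dinner",
    "Food was nice but service wasn't",
])
for row in batch:
    print(row)

hidden_dim = 4                  # illustrative size
h0 = [0.0] * hidden_dim         # vector of zeros fed to the first RNN block
```

Both rows end up with 8 tokens: the shorter sentence receives one padding token after its end token.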

I hope reading this has cleared up some of your doubts about RNNs. Do reach out in the comments if you still have any.

You can connect with me at: https://www.linkedin.com/in/baivab-dash

Have a nice day.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
