A Must-Read Introduction to Sequence Modelling (with use cases)

Tavish | Last Updated: 31 May, 2020


Introduction

Artificial Neural Networks (ANNs) were supposed to replicate the architecture of the human brain, yet until about a decade ago, the only thing ANNs shared with our brain was the naming of their entities (for instance, the neuron). These neural networks were almost useless, as they had very low predictive power and few practical applications.

But thanks to rapid advances in technology over the last decade, the gap has narrowed to the point that these ANN architectures have become extremely useful across industries.

In this article, we will look at the two main advances in the field of artificial neural networks that have made ANNs more like the human brain. Here is what we will cover:

Two Main Advances in the Field of ANN
Thought Experiment
Practical Applications of Sequence Modelling
Sequence Generators
Sequence to Sequence NLP Models
A Few More Sequence to Sequence Models that Go Beyond Text

Two Main Advances in the Field of ANN

1. GPUs have immensely improved our computational power, which now lets us vastly increase the depth and breadth of neural networks. However, we are still far from the number of neurons in the human brain.
2. ANNs can now process sequence data as both input and output. This is closer to how our brain works: it does not solve a binary classification problem to understand complex ideas. We formulate “Thoughts” based on a sequence of information given to us, and our brain then expresses each “Thought” as an understandable sequence of words.

Can we introduce this concept of “Thought” in an ANN? The answer is yes, and we will explore more about the idea in this article.
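One common way to give an ANN a “Thought” is to let a recurrent network read a sequence step by step and keep only its final hidden state, a fixed-size vector that summarizes the whole input. The sketch below is a minimal, untrained illustration assuming a vanilla (Elman) RNN with random weights; the names `encode_sequence`, `W_in` and `W_h` are hypothetical. It shows that sequences of very different lengths are compressed into vectors of the same size.

```python
import numpy as np

def encode_sequence(seq, W_in, W_h):
    """Run a vanilla (Elman) RNN over seq and return the final hidden state."""
    h = np.zeros(W_h.shape[0])          # start with an empty "thought"
    for x in seq:                        # one input vector per time step
        h = np.tanh(W_in @ x + W_h @ h)  # mix the new input into the running state
    return h                             # fixed-size summary of the whole sequence

# Random, untrained weights: for illustrating shapes only
rng = np.random.default_rng(42)
input_dim, hidden_dim = 3, 8
W_in = rng.normal(0, 0.5, (hidden_dim, input_dim))
W_h = rng.normal(0, 0.5, (hidden_dim, hidden_dim))

short_seq = rng.normal(size=(4, input_dim))   # 4 time steps
long_seq = rng.normal(size=(50, input_dim))   # 50 time steps
print(encode_sequence(short_seq, W_in, W_h).shape)  # (8,)
print(encode_sequence(long_seq, W_in, W_h).shape)   # (8,)
```

In a trained model, the weights are learned so that this vector keeps the information needed for the task; real systems typically use gated cells (LSTM/GRU) or attention rather than this bare recurrence.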

Sequence models have garnered a lot of attention because much of the data in today’s world comes in the form of sequences: a number sequence, an image pixel sequence, a video frame sequence or an audio sequence.

Over the last 10 years, we have stored thousands of petabytes (more than 10^9 GB) of unstructured sequence data that sat largely unused, because we had no way to extract information from such data formats. Luckily, we now have a new family of neural network architectures, called sequence models, that can turn this data dump into gold mines.

The scope of this article is not to cover the complex mathematics behind sequence modelling or to give you sample code to run (I will save that for later articles), but to give you practical examples of sequence modelling implementations in industry. These will help you identify business problems in your industry that might need this special tool.

To get a better understanding of what this article is about, below is a scenario which I want you to imagine. Put your analytical thinking hats on!

Thought Experiment

Walmrt has appointed you as the head of its new vertical, WalKiosk. The company wants you to lead the development of a self-service (human-less) store where customers interact only with Walmrt’s Kiosk, which is very similar to a vending machine. They want to install this Kiosk in various locations across the United States.

A key difference between this Kiosk and a normal vending machine is that the Kiosk’s display does not show a list of items, only an audio-enabled, Google-like search bar. Customers can simply walk up to a Kiosk and say or type anything after the keyword “OK Walmrt, xxxxxx”. Here is a sample interaction (try to judge whether a human could do a better job than this Kiosk):

The customer says, in any spoken language: “OK Walmrt, I want the shoes which Leonardo DiCaprio wore in the 1st scene of the 1st movie he did with Nolan”.

The idea is for the Kiosk to do a quick search and, if it finds a convincing answer, reply in the same language as the customer’s query with something like: “Leonardo DiCaprio wore black Nike shoes, model xxxxx. Click the link on the kiosk to watch a video clip of the scene you asked me to look at. Great news: we currently have the exact same shoe in the size you are wearing, and its cost is $200. As you are a loyal customer of Walmrt, I have found a steal deal for you! The new price of the shoe, if you buy it immediately, is $150”.

If the customer says “I want to buy it”, the Kiosk dispenses the shoe once the customer makes the payment.

Finally, the Kiosk replies: “Thanks Mr. XYZ for shopping with us today. Please give us your valuable feedback so we can improve our service further.” The customer writes or speaks their feedback on the transaction and leaves.

This simple transaction, which would probably take a good chunk of your time in today’s world, would be resolved in less than 2 minutes (if everything works, that is).

Sounds futuristic? Here’s a spoiler: all the fancy next-gen skills you need to build into this Kiosk can be handled mainly by a single architecture, sequence modelling. Here is a short list of tasks the Kiosk needs to perform:

1. Speech Recognition to understand what the customer is saying
2. Machine Language Translation from the source language to a known language (say English)
3. Named Entity/Subject Extraction to find the main subject of the customer’s query translated in step 2
4. Relation Classification to tag relationships between the entities tagged in step 3
5. Path Query Answering (similar to Google search) on the entity relationships found in steps 3 and 4, using a core knowledge graph
6. Speech Generation to generate answers for the customer with all the relevant information found in step 5
7. Chatbot skill to hold a conversation and engage with customers just like a human
8. Text Summarization of customer feedback to work on key challenges/pain points
9. Product Sales Forecasting to replenish stock
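Step 3 shows how a task that sounds like “understanding” can be framed as a sequence problem: a sequence model reads the tokens and outputs one tag per token, and the entities are then read off the tags. The sketch below assumes the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity); the function name and the tags for the sample query are hypothetical stand-ins for what a trained model would predict.

```python
def bio_to_entities(tokens, tags):
    """Group per-token BIO tags into (entity_text, entity_type) spans."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any open one first
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continuation of the open entity
            current.append(token)
        else:
            # "O" (or a stray I- tag) closes any open entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

# Hypothetical model output for part of the Kiosk query
tokens = ["shoes", "Leonardo", "DiCaprio", "wore", "with", "Nolan"]
tags = ["B-PRODUCT", "B-PERSON", "I-PERSON", "O", "O", "B-PERSON"]
print(bio_to_entities(tokens, tags))
# [('shoes', 'PRODUCT'), ('Leonardo DiCaprio', 'PERSON'), ('Nolan', 'PERSON')]
```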

The skills required to create WalKiosk are not limited to these nine steps, but they are good enough to bring out the core idea. Each of these nine skills can be modeled by a single architecture – Sequence Modelling (but you already knew this).

You can imagine sequence modelling as a black box that stays almost the same; all you need to change is the input and target data for each of the nine skills. Since the model architecture is the same in every step, we can take this a step further and create a single model that takes input in any language and handles the self-service, reporting and inventory management processes all together.
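To make “only the input and target data change” concrete, here is a minimal sketch of how two very different Kiosk skills can both be expressed as lists of (input sequence, target sequence) pairs. The sales figures and the translation sentence are hypothetical, and `make_windows` is an illustrative helper, not part of any library. A sliding window turns one long sales history into many training pairs for forecasting (step 9).

```python
def make_windows(series, input_len, target_len):
    """Slice one long series into (input sequence, target sequence) pairs."""
    pairs = []
    for start in range(len(series) - input_len - target_len + 1):
        x = series[start:start + input_len]                             # past values
        y = series[start + input_len:start + input_len + target_len]    # values to predict
        pairs.append((x, y))
    return pairs

# Hypothetical daily unit sales for one product (step 9: forecasting)
daily_sales = [12, 15, 14, 18, 21, 19, 25]
forecast_pairs = make_windows(daily_sales, input_len=3, target_len=2)
print(forecast_pairs[0])  # ([12, 15, 14], [18, 21])

# Hypothetical translation pair (step 2): same (input, target) shape, different data
translation_pairs = [("quiero los zapatos".split(), "I want the shoes".split())]
```

Both lists have the same structure, so the same sequence-to-sequence training loop could, in principle, consume either one.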

If this was not enough to make you Google all about sequence modelling, let’s look at a broad list of the functions sequence modelling is capable of.

Practical Applications of Sequence Modelling

To make sure we cover most of the possible applications of sequence modelling, we will categorize them based on the type of input and output sequences. Inputs and outputs can each be one of the following: Scalar, Trend, Text, Image, Audio or Video. Since each of these six types can be both an input and an output, we have 6 × 6 = 36 categories in total. However, not all of these pairs have been explored in depth yet.
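The 36 categories are simply every ordered (input type, output type) pair of the six types. The sketch below enumerates them and places a few of this article’s use cases into their category; the placement of each use case is my own reading of the examples, not a table from the original article.

```python
from itertools import product

# The six data types the article lists for inputs and outputs
DATA_TYPES = ["Scalar", "Trend", "Text", "Image", "Audio", "Video"]

# Every ordered (input type, output type) pair: 6 x 6 = 36 categories
categories = list(product(DATA_TYPES, repeat=2))
print(len(categories))  # 36

# Some use cases from this article, placed in an assumed category
examples = {
    ("Scalar", "Text"): "text generation from a random seed",
    ("Text", "Text"): "machine translation, summarization, chatbots",
    ("Audio", "Text"): "speech recognition",
    ("Image", "Text"): "image captioning",
    ("Trend", "Trend"): "product sales forecasting",
}
for (src, tgt), use_case in examples.items():
    assert (src, tgt) in categories  # every example fits one of the 36 pairs
    print(f"{src} -> {tgt}: {use_case}")
```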

Before moving to the list, pause for a moment and create your own list of applications (you can use our thought experiment as a reference).

Here goes the list:

Reading the table is fairly straightforward:

Type is the category of input/target
Elements are the number of elements in input/target series
Use Cases are the possible applications in the category

We will review a few of these use cases to get a grasp of the superpowers our sequence model possesses.

First, let’s talk about the easiest of the lot – Sequence Generators

These generators generally take scalar inputs. The scalar input can be any random seed/number. Following are a few examples of generators:

Note that we can train our model on any specific type of data. For instance, if we train our text generator on a Harry Potter book, it is highly likely that you will get text full of imagination and magic, with Harry Potter as the main character. If you are lucky, you might get a chapter that makes sense, and you can enjoy this privileged chapter that no one else has access to!

Another example: if you train the model on jazz music, you can use it to create new songs in the same genre. Yet another: if you train the model on images of animals, you might see what cross breeds could look like.
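The generator idea (train on some text, then turn a scalar seed into a new sequence) can be sketched without a neural network at all. The example below swaps in a character-level bigram Markov chain, a much simpler technique than the sequence models this article describes, purely to show the seed-in, sequence-out workflow. The tiny corpus is a hypothetical stand-in for a real book, and `train_bigram`/`generate` are illustrative names.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Record, for each character, every character that follows it in the text."""
    followers = defaultdict(list)
    for current, nxt in zip(text, text[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, seed, length=40):
    """Turn a scalar seed into a character sequence sampled from the bigram table."""
    rng = random.Random(seed)                # the scalar input drives all randomness
    char = rng.choice(sorted(followers))     # seed-dependent start character
    out = [char]
    for _ in range(length - 1):
        options = followers.get(char)        # .get avoids creating empty entries
        char = rng.choice(options) if options else rng.choice(sorted(followers))
        out.append(char)
    return "".join(out)

# Hypothetical training text standing in for a full book
corpus = "harry looked at hermione and ron as the owl flew over hogwarts "
model = train_bigram(corpus)
print(generate(model, seed=7))
```

Changing the seed changes the output, while the same seed always reproduces the same text; a trained neural generator behaves the same way, but captures far longer-range structure than pairs of characters.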

Next, let’s talk about the favorites – Sequence to sequence NLP Models

Machine Language Translation has reached new heights and now competes strongly with human translators. Today, you can find real-time translation machines built on the core concept of sequence to sequence models.
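The core concept of a sequence to sequence model is an encoder that compresses the source sentence into a vector and a decoder that emits target tokens one at a time until it produces an end-of-sequence token. The sketch below shows that control flow with greedy decoding, assuming plain tanh RNN cells and random, untrained weights, so its output tokens are meaningless; `TinySeq2Seq`, the vocabulary sizes and the special token ids (`bos=0`, `eos=1`) are all hypothetical. Production translation systems add training, attention and beam search on top of this skeleton.

```python
import numpy as np

class TinySeq2Seq:
    """Untrained RNN encoder-decoder with greedy decoding (illustration only)."""

    def __init__(self, src_vocab, tgt_vocab, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        self.E_src = rng.normal(0, 0.1, (src_vocab, hidden))  # source token embeddings
        self.E_tgt = rng.normal(0, 0.1, (tgt_vocab, hidden))  # target token embeddings
        self.W_enc = rng.normal(0, 0.1, (hidden, hidden))     # encoder recurrence
        self.W_dec = rng.normal(0, 0.1, (hidden, hidden))     # decoder recurrence
        self.W_out = rng.normal(0, 0.1, (hidden, tgt_vocab))  # hidden -> token scores

    def encode(self, src_ids):
        h = np.zeros(self.hidden)
        for t in src_ids:
            h = np.tanh(self.E_src[t] + self.W_enc @ h)
        return h  # the "thought" vector handed to the decoder

    def greedy_decode(self, src_ids, bos=0, eos=1, max_len=10):
        h = self.encode(src_ids)
        token, output = bos, []
        for _ in range(max_len):
            h = np.tanh(self.E_tgt[token] + self.W_dec @ h)  # feed back previous token
            token = int(np.argmax(h @ self.W_out))          # pick the highest-scoring token
            if token == eos:
                break
            output.append(token)
        return output

model = TinySeq2Seq(src_vocab=20, tgt_vocab=30)
print(model.greedy_decode([5, 3, 7]))  # token ids; meaningless until trained
```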

Text summarization is another important use case of sequence models. It can significantly reduce the effort of manually reading lengthy customer complaints, monitoring calls and chats for compliance, and reviewing customer feedback on products.

Chatbots are yet another important application and are now widely used in operations, call centers, chat centers and personal assistants like Siri, Google Home and Alexa.

Finally, we will talk about a few more sequence to sequence models that go beyond text

Speech recognition is currently one of the categories that has attracted the most investment. It is extremely important in tools like personal AI assistants (Alexa, Google Home, etc.) and call center speech recording tools.

Today there are billion-dollar companies whose core competency is speech recognition, and speech recognition also uses sequence to sequence models extensively. Image captioning is one of the hottest research fields, with wide application in the social media industry. Subtitle generation has not reached the production stage yet, but is being actively researched.
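Returning to speech recognition: many speech models output one label per short audio frame, and a common way to turn those frame labels into text is Connectionist Temporal Classification (CTC) greedy decoding, which the article does not name but which is widely used for this step. The rule is to collapse consecutive repeats and then drop the special “blank” label; the blank is what lets a real double letter (the “ll” in “hello”) survive. The frame labels below are hypothetical, and `_` is assumed as the blank symbol.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse consecutive repeated labels, then drop blanks (CTC best-path rule)."""
    decoded, previous = [], None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame and is not blank
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)

# Hypothetical per-frame outputs of an acoustic model (most likely label per frame)
frames = list("hh_e_ll_ll_oo")
print(ctc_greedy_decode(frames))  # hello
```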

End Notes

A lot of data science talent today focuses its effort on solving problems that already exist. An equally important task for any successful data scientist or analyst is to identify and create new problems that can be solved analytically. The latter is a very different exercise and does not need a lot of coding experience or mathematical background. All you need to know is what is and is not possible with a given tool.

Problem identification is a must-have skill for any senior analytics professional. I hope this introductory article on sequence modelling gave you strong motivation to start searching for new problems in your industry that can be solved using this method.

If you have any ideas or suggestions regarding the topic, do let me know in the comments below!


