What is LSTM for Text Classification?

Shraddha Last Updated : 19 Apr, 2024

7 min read

With an emerging field of deep learning, performing complex operations has become faster and easier. As you start exploring the field of deep learning, you are definitely going to come across words like Neural networks, recurrent neural networks, LSTM, GRU, etc. This article explains LSTM Python and its use in Text Classification.

What is LSTM?
Why LSTM?
How does LSTM Work in Python?
LSTM Python for Text Classification
Model Defining
Training the Model for LSTM Python
Prediction
- Accuracy
- Predict
Frequently Asked Questions

This article was published as a part of the Data Science Blogathon.

What is LSTM?

LSTM stands for Long-Short Term Memory. LSTM is a type of recurrent neural network but is better than traditional recurrent neural networks in terms of memory. Having a good hold over memorizing certain patterns LSTMs perform fairly better. As with every other NN, LSTM can have multiple hidden layers and as it passes through every layer, the relevant information is kept and all the irrelevant information gets discarded in every single cell. How does it do the keeping and discarding you ask?

Why LSTM?

Tradition neural networks suffer from short-term memory. Also, a big drawback is the vanishing gradient problem. ( While backpropagation the gradient becomes so small that it tends to 0 and such a neuron is of no use in further processing.) LSTMs efficiently improves performance by memorizing the relevant information that is important and finds the pattern.

How does LSTM Work in Python?

LSTM has 3 main gates:

FORGET Gate
INPUT Gate
OUTPUT Gate

Let’s have a quick look at them one by one.

FORGET Gate

This gate is responsible for deciding which information is kept for calculating the cell state and which is not relevant and can be discarded. The h_t-1 is the information from the previous hidden state (previous cell) and x_t is the information from the current cell. These are the 2 inputs given to the Forget gate. They are passed through a sigmoid function and the ones tending towards 0 are discarded, and others are passed further to calculate the cell state.

INPUT Gate

Input Gate updates the cell state and decides which information is important and which is not. As forget gate helps to discard the information, the input gate helps to find out important information and store certain data in the memory that relevant. h_t-1 and x_t are the inputs that are both passed through sigmoid and tanh functions respectively. tanh function regulates the network and reduces bias.

Cell State

All the information gained is then used to calculate the new cell state. The cell state is first multiplied with the output of the forget gate. This has a possibility of dropping values in the cell state if it gets multiplied by values near 0. Then a pointwise addition with the output from the input gate updates the cell state to new values that the neural network finds relevant.

OUTPUT Gate

The last gate which is the Output gate decides what the next hidden state should be. h_t-1 and x_t are passed to a sigmoid function. Then the newly modified cell state is passed through the tanh function and is multiplied with the sigmoid output to decide what information the hidden state should carry.

LSTM Python for Text Classification

There are many classic classification algorithms like Decision trees, RFR, SVM, that can fairly do a good job, then why to use LSTM for classification?

One good reason to use LSTM is that it is effective in memorizing important information.

If we look and other non-neural network classification techniques they are trained on multiple word as separate inputs that are just word having no actual meaning as a sentence, and while predicting the class it will give the output according to statistics and not according to meaning. That means, every single word is classified into one of the categories.

This is not the same in LSTM. In LSTM we can use a multiple word string to find out the class to which it belongs. This is very helpful while working with Natural language processing. If we use appropriate layers of embedding and encoding in LSTM, the model will be able to find out the actual meaning in input string and will give the most accurate output class. The following code will elaborate the idea on how text classification is done using LSTM.

Model Defining

Defining the LSTM python model to train the data on.

Code

#model



embedding_vector_features=45

model=Sequential()

model.add(Embedding(voc_size,embedding_vector_features,input_length=sent_length))

model.add(LSTM(128,input_shape=(embedded_docs.shape),activation='relu',return_sequences=True))

model.add(Dropout(0.2))

model.add(LSTM(128,activation='relu'))

model.add(Dropout(0.2))

# for units in [128,128,64,32]:

# model.add(Dense(units,activation='relu'))

# model.add(Dropout(0.2))

model.add(Dense(32,activation='relu'))

model.add(Dropout(0.2))

model.add(Dense(4,activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

print(model.summary())

Summary

Model: "sequential_2"

_________________________________________________________________

Layer (type) Output Shape Param #
=================================================================

embedding_2 (Embedding) (None, 35, 45) 360000
_________________________________________________________________

lstm_4 (LSTM) (None, 35, 128) 89088
_________________________________________________________________

dropout_6 (Dropout) (None, 35, 128) 0
_________________________________________________________________

lstm_5 (LSTM) (None, 128) 131584
_________________________________________________________________

dropout_7 (Dropout) (None, 128) 0
_________________________________________________________________

dense_4 (Dense) (None, 32) 4128
_________________________________________________________________

dropout_8 (Dropout) (None, 32) 0
_________________________________________________________________

dense_5 (Dense) (None, 4) 132
=================================================================

Total params: 584,932
Trainable params: 584,932
Non-trainable params: 0
_________________________________________________________________

We define a sequential model and add various layers to it.

Explanation

The first layer is Embedding layer. It representing words using a dense vector representation. The position of a word within the vector space is based on the words that surround the word when it is used. For eg. “king” is placed near “man” and “queen” is placed near “woman”. The vocabulary size is provided.

The next layer is an LSTM layer with 128 neurons. “embedded_docs” is the input list of sentences which is one hot encoded and every sentence is made of the same length. The activation function is rectified linear, which widely used. Any other relevant activation function can be used. “return_sequences=True” this is an important parameter while using multiple LSTM layer as it enables the output of the previous LSTM layer to be used as an input to the next LSTM layer. If it is not set to true, the next LSTM layer will not get the input.

A dropout layer is used for regulating the network and keeping it as away as possible from any bias. Another LSTM layer with 128 cells followed by some dense layers.

The final Dense layer is the output layer which has 4 cells representing the 4 different categories in this case. The number can be changed according to the number of categories.

Compiling the model using adam optimizer and sparse_categorical_crossentropy. Adam optimizer is the current best optimizer for handling sparse gradients and noisy problems. The sparse_categorical_crossentropy is mostly used when the classes are mutually exclusive, ie, when each sample belongs to exactly one class. The Summary explains all the details of the model.

Training the Model for LSTM Python

Now that the model is ready, the data to be trained it split into train data and test data. Here the split is 90-10. X_final and y_final are the independent and dependent datasets.

Code:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_final, y_final, test_size=0.1, random_state=42,stratify=y_final)

The next step is to train the LSTM model using the train data, and the test data is used for validating. Model.fit() is used for this purpose.

Code:

model.fit(X_train,y_train,validation_data=(X_test,y_test),epochs=120,batch_size=64)

The epochs are the number of times the process will be repeated. It can be 20,120, 2000, 20000, any number. The batch size is 64, ie, for every epoch, a batch of 64 inputs will be used to train the model. It mostly depends on how large the dataset is.

Prediction

After training is completed, it’s time to find out the result and predict using the model.

Accuracy

Code:

results = model.evaluate(X_test,y_test)

The model is evaluated and the accuracy of how well the model classifies the data is calculated. “results” will have the accuracy score and the loss. In some cases increasing the number of epochs can increase the accuracy as the model gets trained better.

To use the trained model for predicting, the predict() function is used.

Predict

Code:

The “embedded_docs_pred” is the list is words or sentences that is to be classified and is one-hot encoded and padded to make them of equal length.

y_pred = model.predict(embedded_docs_pred)
y_pred:

array([[2.65598774e-01, 2.36345261e-01, 4.28116888e-01, 6.99390173e-02],
       [2.47113910e-02, 9.56604779e-01, 3.82159604e-03, 1.48621844e-02],
       [1.53286534e-03, 9.94825006e-01, 1.15779374e-04, 3.52645456e-03],
       [5.94987810e-01, 3.00740451e-01, 7.62206316e-02, 2.80511230e-02]])

The output will have a list for every input (can be a word or a sentence). Every list has the predicted score for 4 classes. The maximum of them is the class for that respective input. As discussed above LSTM facilitated us to give a sentence as an input for prediction rather than just one word, which is much more convenient in NLP and makes it more efficient.

Conclusion

In conclusion, LSTM (Long Short-Term Memory) models have proven to be a powerful tool for text classification in Python. With their ability to capture long-term dependencies and handle sequential data, LSTM models offer improved accuracy in classifying text. By implementing LSTM models in Python, researchers and practitioners can leverage the strengths of this architecture to achieve better results in various text classification tasks, opening up new possibilities for natural language processing applications. Sign-up for our Blackbelt program if you want to learn more about it!

Frequently Asked Questions

Q1. Can I use LSTM for text classification?

A. Yes, LSTM (Long Short-Term Memory) networks are commonly used for text classification tasks due to their ability to capture long-range dependencies in sequential data like text.

Q2. Can LSTM be used for text generation?

A. Yes, LSTM can indeed be employed for text generation tasks. Its ability to learn patterns and generate sequences makes it suitable for applications like generating text in various domains.

Q3. Can we use LSTM for NLP?

A. Absolutely, LSTM is widely used in natural language processing (NLP) tasks such as sentiment analysis, machine translation, and named entity recognition due to its effectiveness in handling sequential data like language.

Q4. Is LSTM good for classification?

A. Yes, LSTM can be effective for classification tasks in NLP due to its ability to capture intricate patterns and dependencies in text data, leading to accurate predictions in tasks such as sentiment analysis or document classification.

The media shown in this article on LSTM for Text Classification are not owned by Analytics Vidhya and are used at the Author’s discretion.

Shraddha

Advanced Classification NLP Project Python

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Sarah

Hi Shraddha, Thank you for this great blog. I got an error when I reshape the input for the LSTM, can you advise me why i got this error and how to solve it. When I reshape the trainingg and testing set: # Reshape the input to shape (num_instances, timesteps, num_features) train_data = train_data.reshape(train_data.shape[0], 1, train_data.shape[1]) test_data=test_data.reshape(test_data.shape[0],1,test_data.shape[1]) #Fit the model history = model.fit(train_data, train_labels, epochs=epochs, batch_size=batch_size, validation_data=(test_data, test_labels), callbacks=[callback]) I got this error : ValueError: Shapes (None,) and (None, 1, 5) are incompatible Thank you in Advance

Reading list

Introduction to NLP

Text Pre-processing

NLP Libraries

Regular Expressions

String Similarity

Spelling Correction

Topic Modeling

Text Representation

Information Retrieval System

Word Vectors

Word Senses

Dependency Parsing

Language Modeling

Getting Started with RNN

Different Variants of RNN

Machine Translation and Attention

Self Attention and Transformers

Transfomers and Pretraining

Question Answering

Text Summarization

Named Entity Recognition

Coreference Resolution

Audio Data

ASR

Audio Separation

Chatbot

Auto NLP

What is LSTM for Text Classification?

Table of contents

What is LSTM?

Why LSTM?

How does LSTM Work in Python?

FORGET Gate

INPUT Gate

Cell State

OUTPUT Gate

LSTM Python for Text Classification

Model Defining

Explanation

Training the Model for LSTM Python

Code:

Prediction

Accuracy

Code:

Predict

Code:

Conclusion

Frequently Asked Questions

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC