Text Analysis App Using Spacy, Streamlit, and Hugging Face Spaces

UPPU RAJESH KUMAR Last Updated : 15 Mar, 2022


This article was published as a part of the Data Science Blogathon.

Introduction

Text analysis is the process of extracting meaningful, useful information from unstructured text. It is a rapidly growing area of Natural Language Processing (NLP) with applications in many fields. Its goal is to turn text into machine-readable information that supports data-driven decisions, and it also helps us manage content. Automating the analysis reduces the human error that creeps into manual review and gives more consistent results.

For example, suppose a product manager at an e-commerce website wants to know what customers think of the company's products. Done manually, he or she would have to read every review that customers have posted and then draw a conclusion about the feedback. This is time-consuming and error-prone: if the person misreads or misunderstands some of the reviews, wrong decisions may follow. With text analysis, the same work can be done in far less time and with good accuracy. We can detect sentiment, extract keywords, names, or company information, and categorize surveys or product reviews by sentiment and topic. The most commonly used text analysis techniques are:

Text Classification
Text Extraction
Word Frequency
Collocation
Concordance
Word Sense Disambiguation
Clustering

Text Classification aims to assign a predefined tag or category to the unstructured textual data. Some of the most important text classification tasks are sentiment analysis, topic modeling, language detection, and intent detection.

Text Extraction aims to pull out pieces of information that are already present in the text. Important text extraction tasks include keyword extraction and named entity recognition, both of which help in identifying relevant information.

Word Frequency measures which words occur most often in a given text, either with raw counts or with a weighted score such as TF-IDF. We can use it to find the words customers use most often when chatting with a customer support executive or when writing product reviews.
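To make the difference between raw counts and a TF-IDF weighting concrete, here is a minimal sketch that uses only the Python standard library. The helper names (‘tokenize’, ‘word_counts’, ‘tf_idf’) and the sample reviews are made up for illustration, and the formula is the plain textbook variant; libraries usually apply extra smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Raw word frequency: how often each token occurs in one text."""
    return Counter(tokenize(text))


def tf_idf(documents):
    """Return one {word: score} dict per document.

    tf  = count of the word in the document / number of tokens in it
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty text
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Made-up sample reviews, used only to exercise the functions.
reviews = [
    "The battery is great and the screen is great",
    "The battery died after a week",
    "The screen cracked but the support team was helpful",
]
print(word_counts(reviews[0]).most_common(2))
print(tf_idf(reviews)[0])
```

A word such as "the" appears in every review, so its TF-IDF score is zero, while "great" appears in only one review and scores above zero. That is why TF-IDF is often preferred over raw counts when we want words that characterize a particular text.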

Collocation identifies words that commonly co-occur. Bigrams and trigrams (sequences of two and three adjacent words) are the usual units of collocation and help us find hidden semantic structure.
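As a rough illustration, bigrams and trigrams can be counted with a few lines of standard-library Python. The ‘ngrams’ helper and the sample sentence are hypothetical, and counting is only the simplest approach; collocation tools typically rank pairs with an association measure rather than raw frequency.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample sentence, split on whitespace for simplicity.
tokens = "new york is big and new york is busy".split()
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))
print(bigram_counts.most_common(3))
print(trigram_counts.most_common(1))
```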

Concordance helps us find every instance of a word together with its surrounding context. Word Sense Disambiguation determines which meaning is intended when a word has more than one. Clustering lets us group texts with common attributes. In this way, text analysis helps us uncover the qualitative aspects of a given text.
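To make the idea of a concordance concrete, the sketch below prints each occurrence of a keyword with a few words of context on either side. The ‘concordance’ function, its ‘width’ parameter, and the sample text are assumptions made for this example, and the whitespace tokenization is deliberately naive.

```python
def concordance(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words of context per side."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively and ignore surrounding punctuation.
        if token.lower().strip(".,!?;:") == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Made-up sample text in which "bank" is used in two different senses.
text = "The bank approved the loan. We sat on the river bank and watched the boats."
for line in concordance(text, "bank"):
    print(line)
```

The two context lines also show why Word Sense Disambiguation matters: the same word "bank" refers to a financial institution in one line and to a riverside in the other, and only the context tells them apart.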

In this app, we use Text Classification and Text Extraction techniques to analyze a given sentence. More specifically, we use sentiment analysis, Named Entity Recognition, and subjectivity. Subjectivity measures the extent to which a given sentence is opinionated.

Overview

Spacy
Spacy TextBlob
Streamlit
Hugging Face Spaces
Building the Application
Deployment
Conclusion

Spacy

Spacy is an open-source Python library for Natural Language Processing (NLP) that is widely used in industry. It is designed for production use, so it is fast and scales well. In the app we are going to build, we use Spacy's Named Entity Recognition (NER).

Spacy TextBlob

Spacy TextBlob is a pipeline component for Spacy that adds sentiment analysis. It gives us the sentiment (polarity) of a given sentence as well as its subjectivity, and it uses the TextBlob library under the hood to compute them.
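TextBlob's default sentiment analyzer is lexicon-based, so a rough intuition for a polarity score is an average over the scores of known words. The toy sketch below shows only that intuition: the lexicon, its scores, and the ‘toy_polarity’ function are made up for illustration and are not TextBlob's actual lexicon or algorithm.

```python
# Hypothetical toy lexicon: the words and scores are invented for illustration.
TOY_LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7, "terrible": -1.0}


def toy_polarity(text):
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("The food was great but the service was bad."))
print(toy_polarity("The parcel arrived on Monday."))
```

A sentence with mixed words lands near zero, and a sentence with no opinion words scores exactly zero, which is the case the app later labels as neutral.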

Image-1

Streamlit

Streamlit is an open-source Python library for building web apps. It lets us quickly build ML web apps and data visualization dashboards. It is easy to learn, so anyone can quickly pick up the skills needed to build a user interface for an ML app. We shall use this library to build our web app.

image-2

Hugging Face Spaces

Hugging Face Spaces is a quick way to deploy machine learning web apps. At the time of writing, it hosts apps on its servers free of cost. In this project, we will host our app on Hugging Face Spaces.

image-3

Building the Application

First, we install the necessary libraries, along with the small English model that the code loads:

pip install spacy
pip install spacytextblob
pip install streamlit
python -m spacy download en_core_web_sm

Next, we code our application as follows –

import streamlit as st
import spacy
# Importing SpacyTextBlob registers the 'spacytextblob' pipeline component.
from spacytextblob.spacytextblob import SpacyTextBlob

# Page layout, title, and a short description of the app.
st.set_page_config(layout='wide', initial_sidebar_state='expanded')
st.title('Text Analysis using Spacy Textblob')
st.markdown('Type a sentence in the below text box and choose the desired option in the adjacent menu.')

# Sidebar menu for the three analyses and a text box for the input sentence.
side = st.sidebar.selectbox("Select an option below", ("Sentiment", "Subjectivity", "NER"))
Text = st.text_input("Enter the sentence")


# Note: newer Streamlit releases replace st.cache with
# st.cache_data and st.cache_resource.
@st.cache
def sentiment(text):
    """Label the polarity score (-1 to 1) as Negative, Neutral, or Positive."""
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    if doc._.polarity < 0:
        return "Negative"
    elif doc._.polarity == 0:
        return "Neutral"
    else:
        return "Positive"


@st.cache
def subjectivity(text):
    """Label the subjectivity score (0 to 1) around the 0.5 midpoint."""
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    if doc._.subjectivity > 0.5:
        return "Highly Opinionated sentence"
    elif doc._.subjectivity < 0.5:
        return "Less Opinionated sentence"
    else:
        return "Neutral sentence"


@st.cache
def ner(sentence):
    """Return the named entities as (text, label) pairs."""
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(sentence)
    ents = [(e.text, e.label_) for e in doc.ents]
    return ents


def run():
    """Show the result of the analysis selected in the sidebar."""
    if side == "Sentiment":
        st.write(sentiment(Text))
    if side == "Subjectivity":
        st.write(subjectivity(Text))
    if side == "NER":
        st.write(ner(Text))


if __name__ == '__main__':
    run()

Explanation of the above code –

First, we import the necessary libraries.

Next, we configure the page with ‘st.set_page_config()’. We then set the title of the app with ‘st.title()’ and write a short description of what the app does with ‘st.markdown()’. After that, we create a sidebar with ‘st.sidebar.selectbox()’ that offers three options, one for each text analysis operation: ‘Sentiment’, ‘Subjectivity’, and ‘NER’.

We take text input from the user with ‘st.text_input()’. Then we create three functions, one for each text analysis operation. The first is the sentiment function, which uses Spacy TextBlob to find the sentiment of the given text. Spacy TextBlob returns a polarity score ranging from -1 to 1, so we map that score to a label: a negative score means ‘Negative’ sentiment, a score of zero means ‘Neutral’, and a positive score means ‘Positive’. We cache this function with ‘@st.cache’ so that Streamlit can reuse the stored result when the function is called again with the same input instead of re-running it, which speeds up the app.

Similarly, we define the subjectivity function using Spacy TextBlob. Subjectivity scores range from 0 to 1, so we mark the sentence as highly opinionated if the score is above 0.5, as less opinionated if it is below 0.5, and as neutral if it is exactly 0.5. Next, we create the Named Entity Recognition (NER) function, which uses Spacy to return each named entity together with its label.
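The threshold rules for both scores can also be written as small pure functions, which makes them easy to check without loading a model. This is a sketch of the same rules described above; the function names ‘polarity_label’ and ‘subjectivity_label’ are made up for this example, and the scores passed in are arbitrary test values rather than real model output.

```python
def polarity_label(score):
    """Map a polarity score in [-1, 1] to the labels used in the app."""
    if score < 0:
        return "Negative"
    if score == 0:
        return "Neutral"
    return "Positive"


def subjectivity_label(score):
    """Map a subjectivity score in [0, 1] to the labels used in the app."""
    if score > 0.5:
        return "Highly Opinionated sentence"
    if score < 0.5:
        return "Less Opinionated sentence"
    return "Neutral sentence"


# Arbitrary scores chosen to hit every branch.
for score in (-0.4, 0.0, 0.9):
    print(score, polarity_label(score))
for score in (0.2, 0.5, 0.8):
    print(score, subjectivity_label(score))
```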

Finally, we create the run function, which ties the three functions together. If the user enters text and selects the Sentiment option in the sidebar, the sentiment function runs and the app displays the sentiment. If the user selects the Subjectivity option, the subjectivity function runs and the app displays its result. Similarly, if the user selects the NER option, the ner function runs and the app displays the named entities of that text.
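One possible variation on the run function is to map each sidebar option to its function with a dictionary instead of a chain of if statements. The sketch below shows that pattern with placeholder functions so that it runs without Spacy or Streamlit. The ‘ANALYSES’ dictionary, the ‘analyse’ helper, and the empty-input check are assumptions for illustration and are not part of the app above.

```python
def sentiment(text):
    return f"sentiment of {text!r}"      # placeholder for the real function


def subjectivity(text):
    return f"subjectivity of {text!r}"   # placeholder for the real function


def ner(text):
    return f"entities in {text!r}"       # placeholder for the real function


# Map each sidebar option to the function that handles it.
ANALYSES = {"Sentiment": sentiment, "Subjectivity": subjectivity, "NER": ner}


def analyse(option, text):
    """Run the analysis chosen in the sidebar, or return None for empty input."""
    if not text.strip():
        return None  # nothing typed yet, so there is nothing to analyse
    return ANALYSES[option](text)


print(analyse("NER", "Apple opened an office in Hyderabad"))
print(analyse("Sentiment", "   "))
```

Adding a fourth analysis then only requires one new function and one new dictionary entry, and the empty-input check avoids reporting a sentiment for a sentence the user has not typed yet.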

Deployment

We have created our app, so it is time to deploy it using Hugging Face Spaces. Go to the Hugging Face website and create an account. After creating an account, click ‘create space’. The following pages ask for a name for your app and its tech stack. Enter a name, select an appropriate license, select Streamlit under the SDK option, and click create. You will then see a page with instructions for cloning the Space's Git repository and pushing your code to it. Alternatively, you can create the files directly within Spaces. Either way, you need a ‘requirements.txt’ file. Paste the following content into it:

spacy
spacytextblob
https://huggingface.co/spacy/en_core_web_sm/resolve/main/en_core_web_sm-any-py3-none-any.whl

After pasting the text into the file, click commit changes and Spaces starts building your app. Once the build finishes, your app is ready to use.

I have already created a text analysis app as described in this article.

Please check it out here – Text Analysis With Spacy And Streamlit – a Hugging Face Space by rajesh1729

Conclusion

We have created a simple text analysis app and deployed it on Hugging Face Spaces. Apps like this are useful in e-commerce, customer service, and similar industries. If you have any doubts regarding the above code, please comment below so that I can clear them.


image-1 source: spaCyTextBlob · spaCy Universe

image-2 source: Streamlit • The fastest way to build and share data apps

image-3 source: Spaces – Hugging Face

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

UPPU RAJESH KUMAR

Data Science Enthusiast. Interested in NLP, computer vision.
