UPPU RAJESH KUMAR — Published On January 31, 2022 and Last Modified On March 15th, 2022

This article was published as a part of the Data Science Blogathon.

Introduction

Text analysis is the process of extracting meaningful, useful information from unstructured text. It is a rapidly growing area of Natural Language Processing (NLP) and is applied in many fields. Its main goal is to turn text into machine-readable information that supports data-driven decision-making, and it also helps us manage content. Automating the analysis reduces the human error that creeps into manual review and makes the results more consistent.

For example, suppose the product manager of an e-commerce website wants to know what customers think of the site's products. Done manually, the manager would have to read every review posted by customers before coming to a conclusion about the feedback. This is very time-consuming and error-prone: if the person reading the reviews misreads or misunderstands them, wrong decisions may follow. With text analysis, the same work can be done in far less time and with good accuracy. We can detect sentiment, extract keywords, names, or company information, and categorize surveys or product reviews by sentiment and topic. The text analysis techniques most commonly used are –

  • Text Classification
  • Text Extraction
  • Word Frequency
  • Collocation
  • Concordance
  • Word Sense Disambiguation
  • Clustering

Text Classification aims to assign a predefined tag or category to the unstructured textual data. Some of the most important text classification tasks are sentiment analysis, topic modeling, language detection, and intent detection.

Text Extraction aims to pull out pieces of information that are already present in the text. Important text extraction tasks include keyword extraction and named entity recognition. These are useful for identifying relevant information.

Word Frequency measures how often words occur in a given text, either as raw counts or with a weighting scheme such as TF-IDF. We can use it to find the words customers use most often when chatting with a customer support executive, or when analyzing product reviews.
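As a rough illustration of the idea, the sketch below counts raw word frequencies with the Python standard library and then computes a basic TF-IDF weight. The tokenizer, the sample reviews, and the helper names (`tokenize`, `tf_idf`) are assumptions made for this example, and the formula is the plain tf × log(N / df) variant, not necessarily what a particular library implements.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty text
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical customer reviews, used only for illustration.
reviews = [
    "The delivery was fast and the packaging was great",
    "The delivery was late",
    "Great product and great price",
]

# Raw frequency across all reviews.
word_counts = Counter(word for review in reviews for word in tokenize(review))
top_words = word_counts.most_common(3)

# TF-IDF weights, one dict per review.
review_weights = tf_idf(reviews)
```

In the second review, "late" gets a higher weight than "the" because "the" also appears in another review, which is the behaviour TF-IDF is designed to produce.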

Collocation identifies words that commonly co-occur with each other. Bi-grams and tri-grams (sequences of two and three adjacent words) are common types of collocation that help us find the hidden semantic structure of a text.
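The idea can be sketched by counting adjacent word pairs and triples. This is a minimal example under simplifying assumptions: the sample text and the `ngrams` helper are made up for illustration, and raw counts stand in for the statistical association scores that dedicated collocation tools typically use.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical support-chat text, used only for illustration.
text = (
    "customer support was helpful . customer support replied quickly . "
    "the support team was helpful"
)
tokens = text.split()

# Count how often each pair and triple of adjacent words occurs.
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))
```

Pairs that occur repeatedly, such as ("customer", "support") here, are candidates for collocations worth inspecting.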

Concordance helps us find every instance of a word together with its surrounding context. Word Sense Disambiguation determines which meaning is intended when a word has more than one. Clustering groups texts with common attributes into clusters. In this way, text analysis helps us uncover the qualitative aspects of a given text.
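A concordance can be sketched as a simple keyword-in-context listing. The `concordance` function, its `width` parameter, and the sample sentence below are assumptions for illustration; the tokenization is a plain whitespace split.

```python
def concordance(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context per side."""
    tokens = text.split()
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively and ignore surrounding punctuation.
        if token.lower().strip(".,!?;:") == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = "The bank approved the loan. We sat on the river bank and watched the boats."
hits = concordance(sample, "bank")
```

The two hits show "bank" in a financial context and in a riverside context, which is the kind of ambiguity Word Sense Disambiguation has to resolve.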

In this app, we use Text Classification and Text Extraction techniques to analyze a given sentence. More specifically, we use sentiment analysis, Named Entity Recognition, and subjectivity scoring. Subjectivity measures the extent to which a sentence is opinionated.

Overview

  1. spaCy
  2. spaCy TextBlob
  3. Streamlit
  4. Hugging Face Spaces
  5. Building the application
  6. Deployment
  7. Conclusion

spaCy

spaCy is an open-source Python library for Natural Language Processing (NLP) tasks and is widely used in industry. It is designed for production use and is robust and scalable. In the app we are going to build, we use spaCy's Named Entity Recognition (NER).

spaCy TextBlob

spaCy TextBlob is a pipeline component for spaCy that lets us do sentiment analysis. It gives us the sentiment (polarity) of a given sentence as well as its subjectivity. It uses the TextBlob library under the hood to compute the results.

Image-1: spaCy TextBlob

Streamlit

Streamlit is an open-source Python library for building web apps. It can be used to quickly build ML web apps and data visualization dashboards. The library is easy to learn, so anyone can quickly pick up the skills needed to build user interfaces for their ML apps. We shall use it to build our web app.

Image-2: Streamlit

Hugging Face Spaces

Hugging Face Spaces is a quick way to deploy machine learning web apps. At the time of writing, it offers free hosting for an unlimited number of apps on its servers. In this project, we will host our app on Hugging Face Spaces.

Image-3: Hugging Face Spaces

Building the Application

First, we install the necessary libraries, along with the small English spaCy model that the app loads –

pip install spacy
pip install spacytextblob
pip install streamlit
python -m spacy download en_core_web_sm

Next, we code our application as follows –

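import streamlit as st
import spacy
# Importing SpacyTextBlob registers the 'spacytextblob' pipeline component
from spacytextblob.spacytextblob import SpacyTextBlob

# Page configuration, title, and short description
st.set_page_config(layout='wide', initial_sidebar_state='expanded')
st.title('Text Analysis using Spacy Textblob')
st.markdown('Type a sentence in the below text box and choose the desired option in the adjacent menu.')

# Sidebar menu with the three analysis options, and the text input box
side = st.sidebar.selectbox("Select an option below", ("Sentiment", "Subjectivity", "NER"))
Text = st.text_input("Enter the sentence")


# Note: newer Streamlit releases deprecate st.cache in favour of
# st.cache_data and st.cache_resource.
@st.cache
def sentiment(text):
    """Label the text as Negative, Neutral, or Positive from its polarity."""
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    # Polarity ranges from -1 (negative) to 1 (positive)
    if doc._.polarity < 0:
        return "Negative"
    elif doc._.polarity == 0:
        return "Neutral"
    else:
        return "Positive"


@st.cache
def subjectivity(text):
    """Describe how opinionated the text is from its subjectivity score."""
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    # Subjectivity ranges from 0 (objective) to 1 (subjective)
    if doc._.subjectivity > 0.5:
        return "Highly Opinionated sentence"
    elif doc._.subjectivity < 0.5:
        return "Less Opinionated sentence"
    else:
        return "Neutral sentence"


@st.cache
def ner(sentence):
    """Return the named entities in the sentence as (text, label) pairs."""
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(sentence)
    ents = [(e.text, e.label_) for e in doc.ents]
    return ents


def run():
    # Run the analysis that matches the option selected in the sidebar
    if side == "Sentiment":
        st.write(sentiment(Text))
    elif side == "Subjectivity":
        st.write(subjectivity(Text))
    elif side == "NER":
        st.write(ner(Text))


if __name__ == '__main__':
    run()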

Explanation of the above code –

As a first step, we import necessary libraries.

Next, we set our application's page configuration using ‘st.set_page_config()’. After this, we set the title of the app page using ‘st.title()’ and write a short description of what the app does using ‘st.markdown()’. Then we create a sidebar that shows the user the available options using ‘st.sidebar.selectbox()’, with one option for each of our three text analysis operations: ‘Sentiment’, ‘Subjectivity’ and ‘NER’.

We need to take text input from the user, which we do using ‘st.text_input()’. Now we create three functions, one for each text analysis operation. The first is the sentiment function, which uses spaCy TextBlob to find the sentiment of the given text. spaCy TextBlob returns a polarity score ranging from -1 to 1, so we convert that score into a label: if the polarity score is negative, the sentiment is ‘Negative’; if it is zero, the sentiment is ‘Neutral’; and if it is positive, the sentiment is ‘Positive’. This is how the sentiment function in the above code block works. We cache the function using ‘@st.cache’ so that when it is called again with the same input, the stored result is reused instead of the function being re-run, which speeds up the app.
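The effect of caching can be illustrated with plain Python memoization. The sketch below uses `functools.lru_cache` as an analogy only; it is not how Streamlit implements its cache, and `polarity_label` and `call_log` are names made up for this example.

```python
from functools import lru_cache

call_log = []  # records each time the function body actually runs


@lru_cache(maxsize=None)
def polarity_label(score):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    call_log.append(score)
    if score < 0:
        return "Negative"
    if score == 0:
        return "Neutral"
    return "Positive"


first = polarity_label(0.4)
second = polarity_label(0.4)   # same argument: served from the cache
third = polarity_label(-0.2)   # new argument: the body runs again
```

After these three calls, `call_log` holds only two entries, because the repeated call with 0.4 never reached the function body.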

Similarly, we define the subjectivity function using spaCy TextBlob. Subjectivity scores range from 0 to 1, so we mark the sentence as highly opinionated if the score is above 0.5, as less opinionated if the score is below 0.5, and as neutral if the score is exactly 0.5. Next, we create the Named Entity Recognition (NER) function, which uses spaCy to return the named entities in the text along with their labels.

Finally, we create our run function to run the app using all the functions we created. If the user inputs a text and selects the Sentiment option in the sidebar then the sentiment function runs and displays the sentiment. If the user selects the subjectivity option then the subjectivity function runs and displays the result as programmed. Similarly, if the user selects the NER option then the ner function runs and displays the named entities of that text.
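The same option-to-function routing can also be written as a dictionary lookup. This is an alternative sketch, not the approach used in the app above: the three functions here are placeholders that only return a string, and `HANDLERS` and `analyze` are hypothetical names.

```python
# Placeholder analysis functions; in the app these would call spaCy.
def sentiment(text):
    return f"sentiment for {text!r}"


def subjectivity(text):
    return f"subjectivity for {text!r}"


def ner(text):
    return f"entities in {text!r}"


# Map each sidebar option to the function that handles it.
HANDLERS = {
    "Sentiment": sentiment,
    "Subjectivity": subjectivity,
    "NER": ner,
}


def analyze(option, text):
    """Run the analysis function registered for the selected option."""
    try:
        handler = HANDLERS[option]
    except KeyError:
        raise ValueError(f"Unknown option: {option}") from None
    return handler(text)


result = analyze("NER", "Paris")
```

With this layout, adding a fourth analysis only requires writing the function and adding one entry to the dictionary.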

Deployment

We have created our app, so it is time to deploy it using Hugging Face Spaces. Go to the Hugging Face website and create an account. After creating an account, click ‘Create Space’. The following pages ask for a name for your app and its tech stack. Enter the desired name, select an appropriate license, select Streamlit under the SDK option, and finally click create. After this, you will see a page with instructions for cloning the Space's Git repository and pushing your code to it. Alternatively, you can create the files directly within Spaces. Either way, you need to create a ‘requirements.txt’ file. Paste the content below into it.

spacy
spacytextblob
https://huggingface.co/spacy/en_core_web_sm/resolve/main/en_core_web_sm-any-py3-none-any.whl

After pasting the content into the file, click ‘Commit changes’, and Spaces starts building your app. Once the build finishes, your app is ready for use.

I have already created a text analysis app as described in this article.

Please check it out here – Text Analysis With Spacy And Streamlit – a Hugging Face Space by rajesh1729

Conclusion

We have created a simple text analysis app and deployed it on Hugging Face Spaces. Apps like this are useful in the e-commerce industry, the customer service industry, and similar fields. If you have any questions about the above code, please comment below so that I can answer them.


image-1 source: spaCyTextBlob · spaCy Universe

image-2 source: Streamlit • The fastest way to build and share data apps

image-3 source: Spaces – Hugging Face

 

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion. 
