Raghav Agrawal — December 19, 2021
Beginner NLP

This article was published as a part of the Data Science Blogathon

Welcome to this article on one of the hottest topics in technology today: NLP (Natural Language Processing). In this article, we will learn what exactly NLP is, what makes it complex to learn, and what challenges we face while solving any NLP problem statement. I hope that after reading it you will look at and understand NLP differently.

 NLP Landscape from 1960 to 2020

 

Table of Contents

  1. Introduction to NLP
  2. Evolution of NLP
  3. What is the Need for NLP
  4. Real-World Applications
  5. Common NLP Tasks
  6. Approaches Used to solve NLP use cases
  7. Challenges in NLP
  8. Conclusion

Introduction to NLP

NLP sits at the intersection of human language, computer science, and artificial intelligence. The main aim of NLP is to make machines capable of understanding and responding in natural language. In simple words, NLP is a subset of artificial intelligence that deals with language or textual data: it extracts relationships from text and solves real-world problems related to language, whether in speech, written text, or sign language form.

Evolution of NLP

The idea grew out of a simple question: can a machine communicate or talk? In the mid-1930s, two researchers independently put forward proposals for “Machine Translation”. The first approach was very simple: it mapped the words of one language to another using a bilingual dictionary, but it did not deal with grammar. The second approach, given by a Russian researcher, described in detail how to deal with grammar. He also used a bilingual dictionary, combined with a study of grammar. Both techniques were disruptive for their time and caught the attention of many great technical researchers.

Making computers deal with text became more popular during the Second World War, when Germany used cipher machines to convey messages to commanders on the battlefield, encrypting and decrypting them with secret keys. Then, in 1950, Alan Turing published the paper “Computing Machinery and Intelligence”, which asked whether machines can think. The field of AI that followed gave power to the evolution of NLP. In the next section, we will discuss what led NLP to become one of the most searched and in-demand technologies of today’s era and how it helps solve many real-world problems.

Need for NLP

First, understand what natural language is. Natural language, also known as ordinary language, is any language that has evolved naturally in humans through use and repetition, without any conscious planning, as a means of communicating with each other. It can take any form, such as speech or sign language. Now comes the question: why is NLP important? Let’s answer this in the next paragraph.

If you read about human evolution, you will find that early humans lived much like other animals, and that over time humans evolved to the level of intelligence we use and live with today. Two factors led this evolution.

1) Language and communication – The greatest factor is that we communicate through language, which makes it easy for us to express our ideas to other people. We also learn from past events.

2) Machines – Today machines and the internet are present in every domain and make tasks easy, but at first there were no such machines. Today we can talk to a machine and feel as if we are talking to a human, and this is what NLP makes happen in real life.

From 1960 to 2020, many NLP models and algorithms were built to help machines deal with text data and generate useful information out of it.

Real-world Applications

Now we will discuss some NLP applications that today’s corporate sector is using. After that, we will see which tasks you can build as a beginner.

1) Contextual Advertisement

Twenty to thirty years ago, the same advertisement was shown to everyone on television, mobiles, websites, etc. Now, with the help of NLP, advertisements on different platforms, including television, are shown according to user interest. This is achieved by keeping track of a user’s social media activity, chats, and search history, and applying NLP over this data to extract the topics the user is interested in. Google AdSense is a well-known example, which you can observe on webpages and on sites such as YouTube.

2) Search Autocorrect and Autocomplete

When you start typing in the Google search box, you get many suggestions for the search you intend to make, based on just a few words. This is one of the crucial applications of NLP. It is widely used by search engines and is now also implemented in many other applications.
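To make the idea concrete, here is a minimal sketch of prefix-based autocomplete. It assumes a small, made-up log of past searches (`PAST_QUERIES`) and ranks suggestions by how often each query was seen. Real search engines use far richer signals, so treat this only as an illustration of the principle.

```python
from collections import Counter

# Hypothetical log of past searches; a real engine would use far more data.
PAST_QUERIES = [
    "nlp tutorial",
    "nlp tutorial",
    "nlp tutorial",
    "nlp applications",
    "nlp applications",
    "nlp interview questions",
    "neural networks",
]

def autocomplete(prefix, queries=PAST_QUERIES, limit=3):
    """Return up to `limit` past queries that start with `prefix`, most frequent first."""
    prefix = prefix.lower()
    counts = Counter(q for q in queries if q.startswith(prefix))
    return [query for query, _ in counts.most_common(limit)]

print(autocomplete("nlp"))
print(autocomplete("ne"))
```

Typing "nlp" suggests the three NLP queries in order of popularity, while "ne" suggests only "neural networks".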

3) Machine Translation

Almost everyone has used Google Translate at least once. It converts text from one language to another, which is known as machine translation. Many tools available today can convert thousands of words from one language to another, and it is one of the features that many people like to include in an NLP project.

4) Text Summarization

Today the web contains lots of data, and we need to learn many things from it, but the problem arises when we start to read. Each document includes many unwanted sentences while we are trying to find the relevant, key points, and NLP helps solve this problem. Text summarization helps produce short summaries from a large chunk of data, whether it is an online article, research paper, or any other file. A related application is image captioning, where NLP helps describe an image and generate a suitable caption for it.

5) Market Intelligence

Businesses can benefit from NLP in many ways. NLP helps analyze regular market conditions such as customer likes, dislikes, and behavior. Apart from this, we can find the sentiments, topics, and keywords mainly used by customers in relation to markets and shops, which helps to create different strategies and make business decisions according to market conditions.

6) Topic Modeling or Text Extraction

Finding the set of topics on which a complete document is based is known as topic modeling. In simple words, it is about tagging a piece of content with the topics it covers, which NLP makes simple.

7) Email Clients (Spam Filtering)

Spam filtering is one of the great applications of NLP. When you receive an email, these filters first analyze its content and, if it is spam, move it to the spam folder.

8) Social Media (Removing Adult Content, Opinion Mining)

Every day, a huge amount of data is collected by different social media sites. Their major challenge is to keep relevant data and remove illegal posts, and NLP makes it easier to mine data at this scale. Opinion mining is currently one of the best NLP examples. Suppose that during an election there are many tweets about each leader. By collecting that data and applying NLP to it, we can observe how much support each leader has, what kind of opinions people hold, and who is likely to win.

9) Chatbots

Many organizations, from small scale to large scale, provide a chatbot service on their websites, apps, and even social media handles to give users 24x7 assistance.

There are many more applications, and the list is endless. You can now think of more products and applications in your daily life where NLP is used or can be applied.

Common NLP tasks

Now we will discuss the common tasks that you should try after learning a little about text preprocessing and NLP. The biggest mistake beginners make is to start the practical side by studying libraries like NLTK, spaCy, etc. Our aim is to understand the problem and how it is going to be solved, not the library functions and methods. So pick one small task (each of these tasks is available on the internet) and get your hands dirty. While solving it, you will automatically learn about the library.

1) Text / Document Classification – Here you have a large text dataset and need to classify it into certain categories. For example, you have an email dataset and classify each email as spam or not spam.

2) Sentiment Analysis – Sentiment analysis is all about analyzing people’s views on a particular product, organization, or anything else and classifying them as positive, negative, or other sentiments.

3) Information Extraction – You have a text and need to extract entities from it, such as country names, monetary amounts, and the names of politicians or celebrities. This is also known as Named Entity Recognition.

4) Language Detection and Machine Translation – This is one of the most used NLP applications: identifying the language of a particular paragraph and converting it into some other language.

5) Conversational Agents – These are chatbots, which can easily be developed using NLP. Two types are used: text-based and speech-based chatbots.

6) Text Generation – When you type something while chatting with someone, your system suggests the next word above your keyboard. This is text generation.

7) Spell Checking and Grammar Correction – It is human to make silly mistakes, so correcting spelling and checking common grammatical errors is one of the biggest NLP applications. A well-known example is Grammarly.

8) Text Parsing – Text parsing is all about breaking a sentence down grammatically in a hierarchical way so that machines can understand it. It is one of the important preprocessing steps of many NLP projects.
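As an illustration of the text generation task in the list above, the following is a minimal sketch of next-word suggestion using bigram counts, that is, counting which word most often follows another. The corpus is made up for the example, and real keyboards use much more sophisticated models, so this is only an assumption about how one could start.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for a user's typing history.
CORPUS = [
    "see you tomorrow",
    "see you soon",
    "see you tomorrow morning",
    "thank you so much",
]

def build_bigram_model(sentences):
    """Count how often each word is followed by each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest_next(model, word, limit=2):
    """Return the words that most frequently followed `word` in the corpus."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(limit)]

model = build_bigram_model(CORPUS)
print(suggest_next(model, "you"))
print(suggest_next(model, "see"))
```

After "you", the sketch suggests "tomorrow" first because it followed "you" twice in the corpus. A word that never appeared simply gets no suggestions.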

Approaches Used for NLP

So far we have read about different applications and everything we have achieved using NLP. The question that arises is: how did we achieve it? What approaches or techniques were used to achieve those applications? Three approaches were used, which are described below.

1) Heuristic Approaches

NLP started around 1950, and heuristic methods were used until about 1990, around 40 years. The heuristic approach simply means using a rule-based approach. For example, suppose you are working on sentiment classification: you count the number of positive and negative words and, with the help of an expert, create a rule-based system that works like a nested if-else control statement. Different types of heuristic approaches have been proposed over the years. Let us take the most widely used examples.

  • Regular Expression – Almost every programming language supports regular expressions, which help you find a particular pattern of text in a large collection of text. For example, finding all the salutations used in a paragraph.
  • WordNet – It is a lexical database that contains an organized collection of words and their related words, like running and jogging. It is widely used in different NLP applications.

The advantage of the heuristic approach is that the rules are created by experts, so the system rarely fails on the cases those rules cover.
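Both heuristic ideas from this section can be sketched in a few lines of Python. The regular expression below looks for a handful of English salutations followed by a capitalised name, and the sentiment function counts positive and negative words and applies an if-else rule. The pattern and the word lists are illustrative assumptions, not a complete rule set an expert would write.

```python
import re

# A few common English salutations followed by a capitalised name.
# This pattern is an illustrative assumption and is far from exhaustive.
SALUTATION = re.compile(r"\b(?:Mr|Mrs|Ms|Dr|Prof)\.\s+[A-Z][a-z]+")

def find_salutations(text):
    """Return every 'salutation + name' match found in the text."""
    return SALUTATION.findall(text)

# Hypothetical expert-written word lists for the sentiment rules.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def rule_based_sentiment(text):
    """Count positive and negative words, then decide with simple if-else rules."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    if positives > negatives:
        return "positive"
    elif negatives > positives:
        return "negative"
    return "neutral"

print(find_salutations("Dr. Rao met Mr. Sharma and Ms. Iyer at the conference."))
print(rule_based_sentiment("I love this phone, the camera is excellent"))
```

Every decision here traces back to a rule a human wrote, which makes the behaviour easy to explain. The flip side is that any word or pattern missing from the rules is silently ignored.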

2) Machine Learning Approach

The second is the ML-based approach. Let us go through the main questions about it, starting with its advantage.

What is an advantage of the ML-based approach over Heuristic methods?

In the heuristic method, human experts have to create the rules by deeply analyzing and observing the data, which becomes cumbersome when there are many rules and the dataset is huge. Machine learning algorithms prove beneficial here because they are capable of learning their own rules from the data, and they generally give better performance than the heuristic approach.

What is a machine learning-based approach workflow?

First, you convert the text data to numbers (vectorization). Then you choose a machine learning algorithm and feed it the vectorized data. The algorithm tries to identify the hidden rules in the data, and the resulting model is then used to make predictions on unseen data.
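The workflow above can be sketched end to end in plain Python. The example below is only one possible instance: it assumes bag-of-words counts as the vectorization step and multinomial naive Bayes with add-one smoothing as the algorithm, trained on four made-up emails. In practice you would normally reach for a library and a much larger dataset.

```python
import math
from collections import Counter

# Made-up labelled emails; a real dataset would be much larger.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("please review the project report", "not spam"),
]

def vectorize(text):
    """Step 1: turn text into numbers (bag-of-words counts)."""
    return Counter(text.lower().split())

def train(examples):
    """Step 2: learn per-class word counts from the vectorized data."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in examples:
        vec = vectorize(text)
        word_counts.setdefault(label, Counter()).update(vec)
        doc_counts[label] += 1
        vocab.update(vec)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Step 3: score unseen text with naive Bayes and return the best label."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word, count in vectorize(text).items():
            # Add-one smoothing keeps unseen words from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict(model, "claim your free prize"))
print(predict(model, "project meeting on monday"))
```

Nobody wrote a rule saying that "free" or "prize" signals spam. The model picked that up from the counts in the training data, which is the key difference from the heuristic approach.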

3) Deep Learning Approach

The third approach is deep learning, which is very popular nowadays across data science applications.

What is an advantage of the DL-based approach over Machine Learning?

In classical machine learning, we convert text to numbers, and in the process the sequential information in the text, which is important, is often lost, because these models do not take word order into account. Deep learning approaches, which became popular around 2010, do take sequence information into account and so proved to be better than classical machine learning models. One of the first deep learning architectures used for NLP tasks was the RNN (Recurrent Neural Network), followed by many great techniques such as the LSTM architecture and Transformers.
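A quick way to see how sequence information gets lost is to vectorize two sentences that use the same words in a different order. The sketch below assumes a simple bag-of-words representation, which is one common way of converting text to numbers, and shows that both sentences end up with identical counts even though they mean different things.

```python
from collections import Counter

def bag_of_words(sentence):
    """Convert a sentence to word counts, discarding word order."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# Same counts for both sentences, so a model fed only these counts
# cannot tell who bit whom.
print(bag_of_words(a) == bag_of_words(b))  # True

# The ordered token sequences, which sequence models consume, do differ.
print(a.split() == b.split())  # False
```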

Challenges in NLP

NLP is one of those technologies that is very challenging to apply across different applications and alongside other technologies. Why is it challenging? Because NLP is applied to natural language, which has evolved over many years and is still evolving. We will look at five factors that make NLP a challenging task.

1) Ambiguity

Human language has evolved so much that the same sentence can carry different meanings depending on context. For example, consider the two sentences below.

a) I saw the boy on the beach with my binoculars.
b) I have never tasted a cake quite like that one before.

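The first sentence has two possible meanings: either I used my binoculars to see the boy, or the boy had my binoculars with him. The second can be read as either praise or criticism. We can work out the intended meaning from the surrounding context, but making any NLP software understand this is a very difficult task.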

2) Contextual Words

The same word can have different meanings in different contexts. Consider the example sentence below, where the word “ran” has a different meaning in each part of the sentence. How can a machine understand this?

I ran to the store because we ran out of milk.

3) Colloquialisms and slang

In everyday communication, we often say one thing but mean something else. Humans understand what is meant, but a machine cannot easily interpret such expressions. Consider the sentences below as an example.

This task for me is like a piece of cake.
Playing football is not your cup of tea.

The first sentence means that the task is very simple to perform, but how can software understand this?
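One simple, if limited, way software can cope with such expressions is a phrase table that is checked before words are interpreted one by one. The sketch below assumes a small hand-made dictionary of idioms (`IDIOMS`) and looks for them in the two example sentences above. It is an illustration of the idea rather than a real solution.

```python
# Hypothetical phrase table mapping idioms to their intended meaning.
IDIOMS = {
    "a piece of cake": "very easy",
    "not your cup of tea": "not something you like or do well",
}

def explain_idioms(sentence, idioms=IDIOMS):
    """Return (idiom, meaning) pairs for every known idiom found in the sentence."""
    lowered = sentence.lower()
    return [(idiom, meaning) for idiom, meaning in idioms.items() if idiom in lowered]

print(explain_idioms("This task for me is like a piece of cake"))
print(explain_idioms("playing football is not your cup of tea"))
print(explain_idioms("this exam broke the internet"))  # not in the table
```

The last call returns nothing because the expression is missing from the table. New slang appears all the time, so any hand-made list goes out of date quickly, which is why colloquial language remains a challenge.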

4) Sarcasm

Sarcasm, as everyone knows, is saying one thing while meaning another. The same words can be meant literally or indirectly depending on the tone, and tone is hard for a machine to pick up from text.

5) Diversity

There are many languages around the globe, often several within a single country. Software has to understand each particular language in depth, so this is still an active research area in NLP.

Conclusion

NLP is a vast field that revolves around human language and solving real-world problems smartly so that systems can communicate easily with humans. In this article, we have learned what made NLP such a popular and widely researched topic. We have also seen the different challenges faced while dealing with an NLP problem statement and noted that deep learning architectures are evolving to address them. You will discover more challenges when you work on an NLP use case.

If you have any doubts or feedback, feel free to share them in the comments section below, or connect with me using the details below.

Connect with me on Linkedin

Check out my other articles here and on Blogspot

Thanks for giving your time!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

About the Author

Raghav Agrawal

I am a final-year undergraduate who loves to learn and write about technology. I am a passionate learner and a data science enthusiast. I have been learning and working in the data science field for the past 2 years, and I aspire to grow as a big data architect.
