Complete Guide to build your AI Chatbot with NLP in Python
In this comprehensive guide, we will cover the following topics, with the end goal of teaching you how to make your own personal intelligent AI chatbot:
- NLP and its uses in speech interpretation.
- AI and its uses in creating an intelligent, responsive chatbot to interact with users.
- The packages and pre-trained tools required to create a responsive, intelligent chatbot similar to virtual assistants such as Alexa or Siri.
- The basic prerequisites and the steps required to create a chatbot. You can follow along with the code snippets or modify them as per your requirements.
The contents of the guide are divided into the following sections.
1. Introduction to AI Chatbot
2. What is NLP?
2.1. NLP tasks
2.2. Types of Chatbots
2.3. Challenges For Your Chatbot
2.4. Installing Packages
2.5. What is Speech Recognition?
2.6. The Language Model
Introduction to AI Chatbot
Were you ever curious as to how to build a talking ChatBot with Python and also have a conversation with your own personal AI?
As the topic suggests we are here to help you have a conversation with your AI today. To have a conversation with your AI, you need a few pre-trained tools which can help you build an AI chatbot system. In this article, we will guide you to combine speech recognition processes with an artificial intelligence algorithm.
Natural Language Processing, or NLP, is a prerequisite for our project. NLP allows computers and algorithms to understand human interactions in various languages, and any AI that has to process a large amount of natural language data needs it. A lot of NLP research is currently under way to improve AI chatbots and help them understand the complicated nuances and undertones of human conversations.
Chatbots are applications that businesses or other entities use to conduct an automatic conversation between a human and an AI. These conversations may take place via text or speech. Chatbots are required to understand and mimic human conversation while interacting with humans from all over the world. From ELIZA, the first chatbot to be created, to Amazon’s Alexa today, chatbots have come a long way. In this tutorial, we are going to cover all the basics you need to follow along and create a basic chatbot that can understand human interaction and respond accordingly. We will be using speech recognition APIs and pre-trained Transformer models.
What is NLP?
NLP stands for Natural Language Processing. Using NLP technology, you can help a machine understand human speech and spoken words. NLP combines computational linguistics, that is, the rule-based modelling of human language, with intelligent algorithms such as statistical, machine learning, and deep learning algorithms. Together, these technologies create the smart voice assistants and chatbots that you may use in everyday life.
There are a number of human errors, differences, and special intonations that humans use every day in their speech. NLP technology allows the machine to understand, process, and respond to large volumes of text rapidly in real-time. In everyday life, you have encountered NLP tech in voice-guided GPS apps, virtual assistants, speech-to-text note creation apps, and other app support chatbots. This tech has found immense use cases in the business sphere where it’s used to streamline processes, monitor employee productivity, and increase sales and after-sales efficiency.
Tasks in NLP
The task of interpreting and responding to human speech is filled with challenges, which we discuss later in this article. In fact, it takes humans years to overcome these challenges when learning a new language from scratch. To overcome them, programmers have integrated many functions into NLP technology so that it can understand human speech, process it, and return a suitable response.
NLP tasks are responsible for breaking down human text and audio signals from voice data in ways that can be analyzed and converted into data that the computer understands. Some of the tasks included in NLP data ingestion are as follows:
- Speech recognition: Speech recognition, or speech-to-text conversion, turns voice data into text and is an incredibly important step in speech analysis. A related task, part-of-speech tagging or grammatical tagging, labels each word with its grammatical role (noun, verb, and so on) based on how it is used in context.
- Word sense disambiguation: In human speech, a word may have multiple meanings. Word sense disambiguation is a semantic analysis that selects the meaning of a given word that best suits the given context. For example, this process helps decide whether ‘bank’ refers to a financial institution or to the side of a river.
- Named Entity Recognition or NER: NER identifies words and phrases as useful entities. For example, ‘Dev’ is a person’s name and ‘America’ is the name of a country.
- Sentiment analysis: Human speech often contains sentiments and undertones. Extracting these undertones and hidden contexts, such as attitude, sarcasm, fear, or joy, is perhaps the most difficult task undertaken by NLP processes.
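To make one of these tasks concrete, here is a minimal sketch of named entity recognition using the ‘Dev’ and ‘America’ example from the list above. It assumes a tiny hand-written lookup table (GAZETTEER and tag_entities are hypothetical names chosen for this illustration); real NER systems learn entities from annotated data rather than from a fixed list.

```python
import re

# Hypothetical lookup table, for illustration only; real NER systems learn
# entities from annotated data instead of relying on a fixed list.
GAZETTEER = {
    "dev": "PERSON",
    "america": "COUNTRY",
}


def tag_entities(sentence):
    """Return (word, label) pairs for words found in the lookup table."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [(w, GAZETTEER[w.lower()]) for w in words if w.lower() in GAZETTEER]


print(tag_entities("Dev moved to America last year."))
```

A lookup like this cannot handle names it has never seen, which is one reason trained models are preferred in practice.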
Types of AI Chatbots
Chatbots are a relatively recent concept and despite having a huge number of programs and NLP tools, we basically have just two different categories of chatbots based on the NLP technology that they utilize. These two types of chatbots are as follows:
- Scripted chatbots: Scripted chatbots work on pre-determined scripts that are created and stored in their library. Whenever a user types or speaks a query (in the case of chatbots equipped with speech-to-text conversion modules), the chatbot responds according to the pre-determined script stored within its library. One of the cons of such a chatbot is that the user needs to provide the query in a very structured manner, with comma-separated commands or some other form that a regular expression can match, so that the bot can perform string analysis and understand the query. This makes this kind of chatbot difficult to integrate with NLP-aided speech-to-text conversion modules. Hence, these chatbots can hardly ever be converted into smart virtual assistants.
- Artificially Intelligent Chatbots: Artificially intelligent chatbots, as the name suggests, are created to mimic human-like traits and responses. NLP is hugely responsible for enabling such chatbots to understand the dialects and undertones of human conversation. NLP combined with artificial intelligence creates a truly intelligent chatbot that can respond to nuanced questions and learn from every interaction to create better-suited responses the next time. AI chatbots have been developed to assist human users on different platforms, such as automated chat support or virtual assistants helping with a song or restaurant selection.
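The rigidity of a scripted chatbot is easy to see in a small sketch. The rules below are made up for illustration (SCRIPT, FALLBACK, and scripted_reply are hypothetical names), and the code only assumes that each rule pairs a regular expression with a canned reply.

```python
import re

# Hypothetical script: each rule pairs a regular expression with a canned reply.
SCRIPT = [
    (re.compile(r"^order status,\s*(\d+)$", re.IGNORECASE), "Order {0} is on its way."),
    (re.compile(r"^opening hours$", re.IGNORECASE), "We are open from 9 to 5."),
]

FALLBACK = "Sorry, I did not understand that."


def scripted_reply(query):
    """Return the canned reply of the first rule that matches the query."""
    for pattern, template in SCRIPT:
        match = pattern.match(query.strip())
        if match:
            return template.format(*match.groups())
    return FALLBACK


print(scripted_reply("order status, 1042"))       # structured query: matches a rule
print(scripted_reply("where is my order 1042?"))  # free-form query: falls through
```

The second query asks for the same thing as the first, but because it does not follow the expected structure, the bot can only return its fallback reply.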
Challenges For Your AI Chatbot
In the current world, computers are not just machines celebrated for their calculation powers. Today, the need of the hour is interactive and intelligent machines that can be used by all human beings alike. For this, computers need to be able to understand human speech and its differences.
NLP technologies have made it possible for machines to intelligently decipher human text and actually respond to it as well. However, communication amongst humans is not a simple affair. There are a lot of undertones, dialects, and complicated wordings that make it difficult to create a perfect chatbot or virtual assistant that can understand and respond to every human.
To overcome the problem of chaotic speech, developers have had to face a few key hurdles, which need to be explored in order to keep improving these chatbots. To understand these hurdles, we need to understand how NLP works to convert human speech into something an algorithm or AI understands. Here’s a list of snags that a chatbot hits whenever users are trying to interact with it:
1. Synonyms, homonyms, and slang
2. Misspellings
3. Abbreviations
4. Complex punctuation rules
5. Accents, dialects, and speech differences related to age and other factors (e.g. lisps, drawls, etc.)
To a human brain, all of this seems really simple as we have grown and developed in the presence of all of these speech modulations and rules. However, the process of training an AI chatbot is similar to a human trying to learn an entirely new language from scratch. The different meanings tagged with intonation, context, voice modulation, etc are difficult for a machine or algorithm to process and then respond to. NLP technologies are constantly evolving to create the best tech to help machines understand these differences and nuances better.
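Two of the snags listed above, misspellings and abbreviations, can be partly softened with a normalization step before the text reaches the bot. The following is only a rough sketch built on Python's standard difflib module; the vocabulary, the abbreviation table, and the 0.8 similarity cutoff are assumptions made for this example, not part of the chatbot built in this guide.

```python
import difflib

# Hypothetical vocabulary and abbreviation table, for illustration only.
VOCABULARY = ["weather", "time", "music", "thanks", "hello"]
ABBREVIATIONS = {"thx": "thanks", "pls": "please", "u": "you"}


def normalize(text):
    """Lowercase the text, expand abbreviations, and fix near-miss spellings."""
    cleaned = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in ABBREVIATIONS:
            word = ABBREVIATIONS[word]
        else:
            # Replace the word only if it is very close to a known one.
            close = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
            if close:
                word = close[0]
        cleaned.append(word)
    return " ".join(cleaned)


print(normalize("Thx, what is the wether?"))
```

Accents, dialects, and slang are much harder to handle this way, which is where trained models come in.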
Installing Packages required to Build AI Chatbot
We will begin by installing a few libraries, which are as follows:
# To convert speech to text
!pip install SpeechRecognition  #(3.8.1)
# To convert text to speech and speak it out
!pip install gTTS  #(2.2.3)
# To install our language model
!pip install transformers  #(4.11.3)
!pip install tensorflow  #(2.6.0, or pytorch)
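Before moving on, it can help to confirm that the packages are visible to the interpreter you are running. This is a small sketch using the standard importlib module; missing_modules and REQUIRED_MODULES are names chosen for this example.

```python
import importlib.util

# Import names differ from some pip names: SpeechRecognition installs
# "speech_recognition" and gTTS installs "gtts".
REQUIRED_MODULES = ["speech_recognition", "gtts", "transformers", "numpy"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If anything is reported as missing, install it before running the rest of the code.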
We will start by importing some basic functions:
import numpy as np
We will begin by creating an empty class, which we will build up step by step. To run the chatbot, we need to execute the full script. The name of the bot will be “Dev”.
# Beginning of the AI
class ChatBot():
    def __init__(self, name):
        print("----- starting up", name, "-----")
        self.name = name

# Execute the AI
if __name__ == "__main__":
    ai = ChatBot(name="Dev")
What is Speech Recognition?
NLP has a number of subfields, as conversation and speech are tough for computers to interpret and respond to. One such subfield is speech recognition, which covers the methods and technologies that enable the recognition and translation of spoken human language into something that the computer or AI can understand and respond to. For computers, understanding numbers is easier than understanding words and speech. Among the earliest speech recognition systems, IBM Shoebox was one of the first to have decent success in understanding and responding to a select few English words. Today, we have a number of successful systems that understand myriad languages and respond in the same dialect and language as the human interacting with them. In Python, the SpeechRecognition library gives us convenient access to several of these engines. To use Google’s speech recognition API through it, we will use the following code:
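import speech_recognition as sr

def speech_to_text(self):
    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        print("listening...")
        audio = recognizer.listen(mic)
    # Default value so that self.text always exists, even after a failure
    self.text = "ERROR"
    try:
        # Send the recorded audio to Google's speech recognition API
        self.text = recognizer.recognize_google(audio)
        print("me --> ", self.text)
    except (sr.UnknownValueError, sr.RequestError):
        # The speech was unintelligible or the API could not be reached
        print("me --> ERROR")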
Note: The first task that our chatbot must handle is speech-to-text conversion, that is, converting voice or audio signals into text data. In summary, the chatbot ‘listens’ to your speech and stores everything it could decipher as text in self.text. You can test the code by running it and saying something aloud. Ideally, it should capture your audio signals and convert them into text.
# Execute the AI
if __name__ == "__main__":
    ai = ChatBot(name="Dev")
    while True:
        ai.speech_to_text()
Note: Here I am speaking and not typing
Next, our AI needs to be able to respond to the audio signals that you gave to it. In simpler words, our chatbot has received the input. Now, it must process it and come up with suitable responses and be able to give output or response to the human speech interaction. To follow along, please add the following function as shown below. This method ensures that the chatbot will be activated by speaking its name. When you say “Hey Dev” or “Hello Dev” the bot will become active.
def wake_up(self, text):
    # Compare in lower case so that the name "Dev" matches "hey dev"
    return self.name.lower() in text.lower()
As a cue, we give the chatbot the ability to recognize its name and use that as a marker to capture the following speech and respond to it accordingly. This is done to make sure that the chatbot doesn’t respond to everything that the humans are saying within its ‘hearing’ range. In simpler words, you wouldn’t want your chatbot to always listen in and partake in every single conversation. Hence, we create a function that allows the chatbot to recognize its name and respond to any speech that follows after its name is called.
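The idea of capturing only the speech that follows the name can be sketched as below. This is an illustrative variant rather than the method used in the rest of this guide; command_after_wake_word is a hypothetical helper, and it assumes the name is spoken as a separate word.

```python
import re


def command_after_wake_word(text, name):
    """Return the words spoken after the bot's name, or None if it is absent."""
    # \b keeps "dev" from matching inside words such as "developer".
    match = re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
    if match is None:
        return None
    return text[match.end():].strip(" ,.!?")


print(command_after_wake_word("Hey Dev, what time is it?", "Dev"))
print(command_after_wake_word("I met a developer today", "Dev"))
```

Matching on word boundaries avoids waking the bot when its name merely appears inside a longer word, which a plain substring check would not catch.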
After the chatbot hears its name, it will formulate a response accordingly and say something back. For this, the chatbot requires a text-to-speech module as well. Here, we will be using gTTS, the Google Text-to-Speech library, to save mp3 files on the file system, which can then be easily played back.
The following functionality needs to be added to our class so that the bot can respond:
from gtts import gTTS
import os

@staticmethod
def text_to_speech(text):
    print("AI --> ", text)
    speaker = gTTS(text=text, lang="en", slow=False)
    speaker.save("res.mp3")
    # "start" works on Windows; on a Mac, use "afplay res.mp3" instead
    os.system("start res.mp3")
    # Note: "start" returns immediately, so the file may be removed before
    # playback finishes; the full code at the end of this guide adds a delay
    os.remove("res.mp3")

# Those two functions can be used like this
# Execute the AI
if __name__ == "__main__":
    ai = ChatBot(name="Dev")
    while True:
        ai.speech_to_text()
        ## wake up
        if ai.wake_up(ai.text) is True:
            res = "Hello I am Dev the AI, what can I do for you?"
            # Speak only when a response has actually been set
            ai.text_to_speech(res)
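The playback line in text_to_speech depends on the operating system. As a sketch, the choice could be isolated in a small helper like the one below. The Windows and macOS commands come from the comment in the code above; the Linux branch is purely an assumption (it presumes a command-line player such as mpg123 is installed), and playback_command is a hypothetical name.

```python
import platform


def playback_command(path, system=None):
    """Build the command that plays an audio file on the given platform."""
    system = system or platform.system()
    if system == "Windows":
        # "start" is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":  # macOS
        return ["afplay", path]
    # Assumption: a command-line player such as mpg123 is installed on Linux.
    return ["mpg123", path]


print(playback_command("res.mp3", "Darwin"))
```

The returned list can be passed to subprocess.run. Keep in mind that afplay waits until playback ends, while start returns immediately, which matters if the file is deleted right afterwards.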
Next, we can consider upgrading our chatbot to carry out simple commands, like some of the virtual assistants do. An example of such a task would be to equip the chatbot to answer correctly whenever the user asks for the current time. To add this function to the chatbot class, follow along with the code given below:
import datetime

@staticmethod
def action_time():
    return datetime.datetime.now().time().strftime('%H:%M')

# and run the script after adding the above function to the AI class
# Run the AI
if __name__ == "__main__":
    ai = ChatBot(name="Dev")
    while True:
        ai.speech_to_text()
        ## waking up
        if ai.wake_up(ai.text) is True:
            res = "Hello I am Dev the AI, what can I do for you?"
        ## do any action
        elif "time" in ai.text:
            res = ai.action_time()
        ## respond politely
        elif any(i in ai.text for i in ["thank","thanks"]):
            res = np.random.choice(
                ["you're welcome!","anytime!", "no problem!","cool!",
                 "I'm here if you need me!","peace out!"])
        ## nothing matched, so keep listening instead of speaking
        else:
            continue
        ai.text_to_speech(res)
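The same if/elif rules can also be tried out without a microphone by moving them into a plain function that takes text and returns a reply. The sketch below is one possible arrangement, not the structure used in the rest of this guide; respond and its now parameter are hypothetical and exist only to make the rules easy to test.

```python
import datetime
import random

THANKS_REPLIES = ["you're welcome!", "anytime!", "no problem!"]


def respond(text, name="Dev", now=None):
    """Return a scripted reply for the text, or None if no rule applies."""
    lowered = text.lower()
    if name.lower() in lowered:
        return f"Hello I am {name} the AI, what can I do for you?"
    if "time" in lowered:
        now = now or datetime.datetime.now()
        return now.strftime("%H:%M")
    if "thank" in lowered:
        return random.choice(THANKS_REPLIES)
    return None  # nothing matched: a language model could take over here


print(respond("what time is it", now=datetime.datetime(2021, 10, 1, 9, 30)))
```

Returning None for unmatched input leaves a clear place to plug in the language model introduced in the next section.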
With all of the functions that we have added, our chatbot can now use speech recognition to respond to speech cues and reply with predetermined responses. However, it is still not very intelligent when it comes to anything that is not predetermined or preset. It is now time to incorporate artificial intelligence, in the form of a pre-trained language model, so that the chatbot can create its own responses to human speech.
The Language Model for AI Chatbot
Here, we will use a Transformer language model for our chatbot. The Transformer architecture was introduced by Google researchers in 2017, and it replaced the recurrent layers of earlier sequence-to-sequence models with attention mechanisms. Because attention lets the model weigh every word of the input against every other word, it captures context and undertones well and performs strongly on NLP tasks. Some of the most popular language models are Google’s BERT and OpenAI’s GPT. These models serve many purposes and have hundreds of millions to billions of parameters, which helps to improve the chatbot and make it truly intelligent.
This is where the chatbot becomes intelligent: instead of being just a scripted bot, it is better prepared to handle input it has no script for. The main package that we will be using in our code here is the Transformers package provided by HuggingFace. This tool is popular amongst developers as it provides models that are pre-trained and ready to work with a variety of NLP tasks. In the code below, we have specifically used DialoGPT, a model created by Microsoft and trained on millions of conversational exchanges taken from Reddit comment threads posted from 2005 through 2017.
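To give a feel for what an attention mechanism computes, here is a minimal pure-Python sketch of scaled dot-product attention for a single query vector. It is a toy illustration with made-up numbers, not the implementation inside the Transformers library, which works on batches of learned vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights, output)
```

The key that points in the same direction as the query receives the larger weight, so its value dominates the output. That weighting is what lets the model focus on the most relevant words.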
import transformers

# Matches the transformers version installed above (4.11.3); newer releases
# may no longer offer the "conversational" pipeline
nlp = transformers.pipeline("conversational", model="microsoft/DialoGPT-medium")

# Time to try it out
input_text = "hello!"
nlp(transformers.Conversation(input_text), pad_token_id=50256)
Reminder: Pass pad_token_id explicitly, because the version of the library used in our code raises a warning when it is not specified. For DialoGPT, 50256 is the ID of the end-of-text token, which doubles as the padding token.
You will get a whole conversation as the pipeline output and hence you need to extract only the response of the chatbot here.
chat = nlp(transformers.Conversation(ai.text), pad_token_id=50256)
res = str(chat)
# Keep only the text that follows the "bot >> " marker
res = res[res.find("bot >> ")+6:].strip()
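The slicing above works as long as the marker is present. A slightly more defensive version of the same idea is sketched below; last_bot_reply is a hypothetical helper, and the sample string only approximates what str(chat) is assumed to look like (the conversation id is a made-up placeholder).

```python
def last_bot_reply(conversation_text, marker="bot >> "):
    """Return the text after the last bot marker, or an empty string."""
    start = conversation_text.rfind(marker)
    if start == -1:
        return ""
    reply = conversation_text[start + len(marker):].strip()
    # Keep only the first line in case anything follows the reply.
    return reply.splitlines()[0] if reply else ""


# Assumed shape of str(chat); the id is a made-up placeholder.
sample = "Conversation id: 1234 \nuser >> hello! \nbot >> Hi there! \n"
print(last_bot_reply(sample))
```

Using rfind picks the most recent reply if a conversation ever holds several turns, and the explicit check avoids returning a stray slice when the marker is missing.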
Great! The bot can both perform some specific tasks like a virtual assistant (i.e. saying the time when asked) and have casual conversations. And if you think that Artificial Intelligence is here to stay, she agrees.
Finally, we’re ready to run the Chatbot and have a fun conversation with our AI. Here’s the full code:
# for speech-to-text
import speech_recognition as sr
# for text-to-speech
from gtts import gTTS
# for language model
import transformers
import os
import time
# for data
import datetime
import numpy as np

# Building the AI
class ChatBot():
    def __init__(self, name):
        print("----- Starting up", name, "-----")
        self.name = name

    def speech_to_text(self):
        recognizer = sr.Recognizer()
        with sr.Microphone() as mic:
            print("Listening...")
            audio = recognizer.listen(mic)
        self.text = "ERROR"
        try:
            self.text = recognizer.recognize_google(audio)
            print("Me --> ", self.text)
        except (sr.UnknownValueError, sr.RequestError):
            print("Me --> ERROR")

    @staticmethod
    def text_to_speech(text):
        print("Dev --> ", text)
        speaker = gTTS(text=text, lang="en", slow=False)
        speaker.save("res.mp3")
        # Rough estimate of the playback time based on the file size
        statbuf = os.stat("res.mp3")
        kbytes = statbuf.st_size / 1024
        duration = kbytes / 200
        os.system('start res.mp3')  # "start" is for Windows; on a Mac, use afplay
        # Wait for playback to finish before deleting the file
        time.sleep(int(50*duration))
        os.remove("res.mp3")

    def wake_up(self, text):
        return self.name.lower() in text.lower()

    @staticmethod
    def action_time():
        return datetime.datetime.now().time().strftime('%H:%M')

# Running the AI
if __name__ == "__main__":
    ai = ChatBot(name="dev")
    nlp = transformers.pipeline("conversational", model="microsoft/DialoGPT-medium")
    os.environ["TOKENIZERS_PARALLELISM"] = "true"
    ex = True
    while ex:
        ai.speech_to_text()
        ## wake up
        if ai.wake_up(ai.text) is True:
            res = "Hello I am Dev the AI, what can I do for you?"
        ## action time
        elif "time" in ai.text:
            res = ai.action_time()
        ## respond politely
        elif any(i in ai.text for i in ["thank","thanks"]):
            res = np.random.choice(["you're welcome!","anytime!","no problem!","cool!","I'm here if you need me!","mention not"])
        elif any(i in ai.text for i in ["exit","close"]):
            res = np.random.choice(["Tata","Have a good day","Bye","Goodbye","Hope to meet soon","peace out!"])
            ex = False
        ## conversation
        else:
            if ai.text == "ERROR":
                res = "Sorry, come again?"
            else:
                chat = nlp(transformers.Conversation(ai.text), pad_token_id=50256)
                res = str(chat)
                res = res[res.find("bot >> ")+6:].strip()
        ai.text_to_speech(res)
    print("----- Closing down Dev -----")
Note: I later switched from Google Colab to my local machine because of some module issues that I faced during implementation, so I am sharing my experience here in case any of you face the same issue. Searching online will also help, but the following lines explain the problem. I used Python 3.9, which had all the necessary modules; Python 3.6 and later versions up to 3.9 should also work. The very latest Python release might not yet have every module ported to it, so I suggest using Python 3.9 or an older version, but not one older than 3.6.
To run a file and install a module, use the commands “python3.9” and “pip3.9” respectively if you have more than one version of Python installed for development purposes. “PyAudio”, which SpeechRecognition needs for microphone input, is another troublesome module: you may need to search for the correct “.whl” file for your version of Python and install it using pip.
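When several Python versions are installed side by side, one way to avoid mixing them up is to call pip through the interpreter you are actually running. The snippet below is a small sketch of that idea; pip_install_command is a hypothetical helper that only builds the command and does not run it.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


print("Running Python", ".".join(str(n) for n in sys.version_info[:3]))
print(" ".join(pip_install_command("SpeechRecognition")))
```

Running the printed command in a terminal installs the package for exactly that interpreter, which has the same effect as using a versioned command such as “pip3.9”.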
The link to the full code can be found here :
Bonus tips: Feel free to drop a star if you liked this tutorial or bot and feel free to fork and create your own AI chatbot and call it whatever you want!
In this guide, we have walked through a step-by-step tutorial that you can use to create a conversational chatbot. This chatbot can be further enhanced to listen and reply as a human would. The code included here can be used to create similar chatbots and projects. To conclude, we have used speech recognition tools and NLP technology to cover the processes of speech-to-text and text-to-speech conversion. Pre-trained Transformer language models were also used to give this chatbot intelligence instead of creating a scripted bot. Now, you can follow along or make modifications to create your own chatbot or virtual assistant to integrate into your business, project, or app support functions. Thanks for reading, and I hope you have fun recreating this project.
Thank you for sticking to the end and happy exploring with your own personal AI.
Python Developer & Data Engineer | Freelance Tech Writer