Speech is the most common means of communication, and most of the world's population relies on it to communicate with one another. A speech recognition system translates spoken language into text. There are many real-life examples of speech recognition systems. For instance, Apple Siri recognizes speech and transcribes it into text. A Speech-To-Text (STT) system takes a human speech utterance as input and produces a string of words as output. The objective of such a system is to extract, characterize, and recognize the information in speech.
1. System Block Diagram
2. How does speech recognition work?
3. Converting an audio file into text
4. How about converting different audio languages?
5. Microphone speech into text
System Block Diagram
A speech recognition engine uses an acoustic model to recognize speech. To create an acoustic model, we take audio recordings of speech and their text transcriptions, and use software to create statistical representations of the sounds that make up each word.
A language model is a file that contains the probabilities of sequences of words. Language models are used for dictation applications, whereas grammars are used in desktop command and control or telephony interactive voice response (IVR) type applications.
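To make "statistical representations of the sounds" more concrete, here is a minimal sketch, not how any particular engine is built. It assumes each frame of audio has already been reduced to a single number and fits one Gaussian per sound label. The labels, the feature values, and the function names are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data: (sound label, one-dimensional acoustic feature).
# Real systems use multi-dimensional features such as MFCCs.
TRAINING_FRAMES = [
    ("ah", 1.0), ("ah", 1.2), ("ah", 0.8),
    ("ee", 3.0), ("ee", 3.3), ("ee", 2.7),
]

def fit_gaussians(frames):
    """Estimate the mean and variance of the feature for each sound label."""
    grouped = defaultdict(list)
    for label, value in frames:
        grouped[label].append(value)
    model = {}
    for label, values in grouped.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        model[label] = (mean, max(variance, 1e-6))  # floor avoids division by zero
    return model

def log_likelihood(value, mean, variance):
    """Log density of a Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * variance) + (value - mean) ** 2 / variance)

def most_likely_sound(model, value):
    """Return the label whose Gaussian gives the feature the highest likelihood."""
    return max(model, key=lambda label: log_likelihood(value, *model[label]))

acoustic_model = fit_gaussians(TRAINING_FRAMES)
print(most_likely_sound(acoustic_model, 1.1))
print(most_likely_sound(acoustic_model, 2.9))
```

The "model" here is just a mean and a variance per sound, which is the simplest statistical summary one could store for each label.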
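As a rough illustration of "probabilities of sequences of words", the sketch below trains a bigram model on a tiny, made-up corpus and scores two word sequences. The corpus and the function names are assumptions for this example, and production language models are much larger and use smoothing.

```python
from collections import Counter

# Hypothetical toy corpus; a real language model is trained on far more text.
CORPUS = [
    "how are you",
    "how are they",
    "who are you",
]

def train_bigram_model(sentences):
    """Count how often each word follows another, with sentence boundary markers."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for previous, current in zip(words, words[1:]):
            bigram_counts[(previous, current)] += 1
            context_counts[previous] += 1
    return bigram_counts, context_counts

def sequence_probability(sentence, bigram_counts, context_counts):
    """Multiply the maximum-likelihood estimates P(word | previous word)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for previous, current in zip(words, words[1:]):
        if context_counts[previous] == 0:
            return 0.0  # the previous word never appeared in training
        probability *= bigram_counts[(previous, current)] / context_counts[previous]
    return probability

bigrams, contexts = train_bigram_model(CORPUS)
print(sequence_probability("how are you", bigrams, contexts))
print(sequence_probability("you are how", bigrams, contexts))
```

A recognizer can use scores like these to prefer a plausible word order over an implausible one when the audio alone is ambiguous.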
A speech engine is the heart of the speech recognition system. It is the software that uses the acoustic and language models to convert spoken audio into text. (Software that plays text back in a spoken voice is a different component, commonly referred to as text-to-speech or TTS.)
How does Speech recognition work?
Speech Recognition process
Hidden Markov Models (HMMs) and deep neural network models are used to convert the audio into text.
An HMM (Hidden Markov Model) is a statistical model that produces a sequence of symbols or quantities as output. HMMs are used in speech recognition because a speech signal can be treated as a piecewise stationary signal or a short-time stationary signal. On a short time scale (e.g., 10 milliseconds), speech can be approximated as a stationary process.
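To show how an HMM links a sequence of short frames to a sequence of hidden sounds, here is a small sketch of Viterbi decoding on a toy model. The two states, the observation labels, and all probabilities are invented for illustration; each observation stands in for one short frame of audio.

```python
# Hypothetical two-state HMM: each hidden state is a sound, each observation
# is a coarse label for one short (roughly 10 ms) frame of audio.
STATES = ("s", "a")
START_P = {"s": 0.6, "a": 0.4}
TRANS_P = {
    "s": {"s": 0.7, "a": 0.3},
    "a": {"s": 0.2, "a": 0.8},
}
EMIT_P = {
    "s": {"noisy": 0.8, "voiced": 0.2},
    "a": {"noisy": 0.1, "voiced": 0.9},
}

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence and its probability."""
    # best[t][state] = (probability of the best path ending in state, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, probability

frames = ["noisy", "noisy", "voiced", "voiced"]
path, probability = viterbi(frames, STATES, START_P, TRANS_P, EMIT_P)
print(path, probability)
```

Because each frame is short enough to be treated as stationary, a single emission distribution per state is a reasonable simplification in this toy setting.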
In this blog, I demonstrate a way to convert speech to text using Python. This is done with the help of the “SpeechRecognition” library and the “PyAudio” library. The SpeechRecognition library supports several speech recognition APIs; in this blog I use the Google speech recognition API.
!pip install SpeechRecognition
Convert an audio file into text
Follow these steps to convert an audio file into text:
Import the speech recognition library.
Initialize the recognizer class to recognize the speech. We are using Google speech recognition.
The audio file formats supported by the speech recognition library include WAV, AIFF, AIFF-C, and FLAC. I used a WAV file in this example.
Here we use an audio clip from the movie ‘Taken’, which says “I don’t know who you are. I don’t know what you want. If you are looking for ransom, I can tell you I don’t have money.”
By default, the Google recognizer reads English.
# Import the library
import speech_recognition as sr

# Initialize the recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Read the audio file as the source and store
# what was heard in the audio_text variable
with sr.AudioFile('I-dont-know.wav') as source:
    audio_text = r.listen(source)

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech is unintelligible, hence the exception handling
try:
    # Use Google speech recognition
    text = r.recognize_google(audio_text)
    print('Converting audio transcripts into text ...')
    print(text)
except (sr.UnknownValueError, sr.RequestError):
    print('Sorry.. run again...')
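Before sending a file to the recognizer, it can help to confirm that it is in one of the formats listed above and to see how long it is. The following sketch uses only Python's standard library. The set of file extensions and the helper names are assumptions for this example, and the silent demo file simply stands in for a real recording.

```python
import os
import tempfile
import wave

# Extensions for the formats named above (WAV, AIFF, AIFF-C, FLAC).
# The exact set of extensions is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}

def is_supported_audio(path):
    """Check the file extension against the supported formats."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS

def wav_duration_seconds(path):
    """Return the duration of a WAV file, computed from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def write_silent_wav(path, seconds=1, sample_rate=16000):
    """Write a mono 16-bit WAV file containing silence, for demonstration."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)

# Demonstration on a throwaway file rather than a real recording.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.wav")
    write_silent_wav(demo_path, seconds=2)
    print(is_supported_audio(demo_path), wav_duration_seconds(demo_path))
```

The same two checks could be run on 'I-dont-know.wav' before opening it with sr.AudioFile.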
How about converting to different audio languages?
English is one of the most common languages. But what if we want to convert speech in other languages, such as German or French? With this Speech-To-Text (STT) system, you can convert speech in many languages to text. Let’s see how.
For example, if we want to read a French audio file, we need to add a language option in recognize_google(). The rest of the code remains the same.
# Adding the French language option
text = r.recognize_google(audio_text, language="fr-FR")
Again, the required language option is added in recognize_google() for language recognition. Here I am speaking in Tamil, an Indian language, and adding “ta-IN” as the language option.
# Adding the Tamil language option
print("Text: " + r.recognize_google(audio_text, language="ta-IN"))
I just said “how are you” in Tamil and it prints the text in Tamil accurately.
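If you switch languages often, one option is to keep the language tags in one place and pass the right one to recognize_google(). The sketch below is only one way to organize this. The LANGUAGE_TAGS table and the transcribe() helper are hypothetical names, and "de-DE" for German is an assumption you should check against the languages your recognizer supports.

```python
# Language tags used in this post, plus German, which the text mentions.
# "de-DE" is an assumption here; check the tags your recognizer supports.
LANGUAGE_TAGS = {
    "english": "en-US",
    "french": "fr-FR",
    "german": "de-DE",
    "tamil": "ta-IN",
}

def transcribe(recognizer, audio, language="english"):
    """Transcribe audio in the named language.

    `recognizer` is expected to be a speech_recognition.Recognizer and `audio`
    the audio it captured; any object with a compatible
    recognize_google(audio, language=...) method also works.
    """
    try:
        tag = LANGUAGE_TAGS[language.lower()]
    except KeyError:
        raise ValueError(f"No language tag configured for {language!r}") from None
    return recognizer.recognize_google(audio, language=tag)

# Example call, reusing r and audio_text from the code above:
# print("Text: " + transcribe(r, audio_text, "tamil"))
```

Passing the recognizer in as an argument keeps the helper independent of how the audio was captured, whether from a file or a microphone.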
Microphone speech into text
Microphones are used to take audio input from users. Many different libraries are available for converting microphone speech into text. Here we use PyAudio for this conversion.
We need to install the PyAudio library, which is used to receive audio input and output through the microphone and speaker. It helps to capture our voice through the microphone.
!pip install PyAudio
We have to use the Microphone class instead of an audio file source. The remaining steps are the same.
# Import the library
import speech_recognition as sr

# Initialize the recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Use the microphone as the source and store
# the captured speech in the audio_text variable
with sr.Microphone() as source:
    print("Talk")
    audio_text = r.listen(source)
    print("Time over, thanks")

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech is unintelligible, hence the exception handling
try:
    # Use Google speech recognition
    print("Text: " + r.recognize_google(audio_text))
except (sr.UnknownValueError, sr.RequestError):
    print("Sorry, I did not get that")
I just said “How are you?”
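Microphone input is often affected by background noise and by long pauses. As a hedged sketch, the helper below calibrates for ambient noise and puts time limits on the recording, using the recognizer's adjust_for_ambient_noise() and listen() methods. The capture_phrase() name and the default durations are assumptions for this example, not recommended values.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1, timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record a single phrase.

    `recognizer` is expected to be a speech_recognition.Recognizer and `source`
    an opened speech_recognition.Microphone (used inside its `with` block).
    """
    # Sample the background noise so the energy threshold suits the room.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # Wait up to `timeout` seconds for speech to start, and stop recording
    # after `phrase_time_limit` seconds even if the speaker keeps talking.
    return recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)

# Example call, reusing r from the code above:
# with sr.Microphone() as source:
#     audio_text = capture_phrase(r, source)
```

With a real Recognizer, listen() raises sr.WaitTimeoutError when no speech starts within the timeout, so callers may want to handle that case alongside the recognition errors.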
Speech recognition is also applied in areas such as training air traffic controllers, telephony and other domains, and education and daily life.
The Google speech recognition API is a straightforward way to convert speech into text, but it requires an internet connection to work. In this blog, we’ve seen how to convert speech into text using the Google speech recognition API. This can be very helpful for NLP projects, especially those handling audio transcript data. If you have anything to add, please feel free to leave a comment! Thanks for reading. Keep learning and stay tuned for more!
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.