Pranav Dar — May 9, 2018
AVbytes

Overview

  • Google Duplex is an AI system that can hold conversations with human-like tone and perform real-world tasks
  • At the heart of Duplex is a recurrent neural network built using TensorFlow Extended
  • To make the AI sound more like a human, speech disfluencies (“um”, “hmm”, etc.) have been added

 

Introduction

You can usually tell right away when a call comes from a machine. A lot of marketing and service calls are now routed through automated systems. I’m sure you have had plenty of experience with these calls (think calling up your bank and taking ages to get through!).

But what if you couldn’t tell the difference between a human’s voice on the phone and a robot’s? We have seen a lot of improvements in recent years in natural language processing thanks to advancements in deep learning. But it can still be a frustrating experience when the voice on the other end of the line is unable to decipher what you’re trying to tell it. We have to adjust for the machine, instead of the machine adjusting for us.

Source: AppleInsider

Google Duplex is an AI system that bridges this gap. Announced at the Google I/O conference yesterday in a stunning demo, it can conduct natural conversations and carry out practical real-world tasks over the phone!

The team behind Google Duplex unveiled the technology with 2 pre-recorded examples, each around a minute long. In the first, Duplex calls a hair salon and talks with the woman who answers to set up an appointment. It is a truly mind-blowing back-and-forth conversation, and it is hard to tell which speaker is the human and which is the machine. In the second example, Google Duplex calls up a restaurant to reserve a table. It’s incredible technology, it really is.

How does this technology work?

At the heart of Google Duplex is an RNN, or recurrent neural network, built using TensorFlow Extended. To make the voice behind Duplex sound human-like, the developers combined a concatenative text-to-speech (TTS) engine with a synthesis TTS engine to vary the intonation of the machine.
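To make the "recurrent" part concrete, here is a minimal sketch of a single recurrent layer in plain Python. It is not Duplex's network, whose architecture and weights are not described in this article; the function names (`rnn_step`, `run_rnn`), the toy weights and the two-dimensional inputs are all assumptions made for illustration. What it shows is that the hidden state carries information from earlier inputs forward, so the same inputs in a different order produce a different final state.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One Elman-style update: h_new = tanh(W_xh x + W_hh h + b)."""
    h_new = []
    for i in range(len(h)):
        total = b[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][k] * h[k] for k in range(len(h)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, w_xh, w_hh, b):
    """Feed a sequence through the layer, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h


# Toy weights (hypothetical values, chosen only so the example runs).
W_XH = [[0.5, -0.3], [0.1, 0.8]]
W_HH = [[0.2, 0.0], [-0.4, 0.6]]
B = [0.0, 0.1]

# The same two inputs in opposite order.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [1.0, 0.0]]

state_a = run_rnn(seq_a, W_XH, W_HH, B)
state_b = run_rnn(seq_b, W_XH, W_HH, B)
print(state_a)
print(state_b)
```

A production system would use a framework such as TensorFlow and learned weights rather than hand-written loops; the plain-Python version is only meant to expose the update rule.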

Speech disfluencies (“um”, “hmm”, etc.) have been added to make the AI sound even more human-like. The machine can even tell when a slower response is acceptable and when it needs to respond quickly; in the latter case it falls back on faster, low-confidence models or faster approximations.
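The two ideas in this paragraph can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google's implementation: the filler list, the function names `add_disfluency` and `choose_model`, the labels `"fast_low_confidence"` and `"full_model"`, and the rule of treating short greetings as needing an instant reply are all hypothetical.

```python
import random

# Hypothetical filler words; the article only mentions "um" and "hmm".
FILLERS = ("um", "hmm", "uh")


def add_disfluency(text, rng, probability=0.5):
    """With the given probability, prepend a filler word to the response."""
    if rng.random() < probability:
        filler = rng.choice(FILLERS)
        return f"{filler.capitalize()}... {text}"
    return text


def choose_model(utterance, simple_phrases=("hello", "hello?", "hi")):
    """Pick a fast path for short prompts where the caller expects an instant reply.

    The phrase list and the returned labels are assumptions for illustration.
    """
    if utterance.strip().lower() in simple_phrases:
        return "fast_low_confidence"
    return "full_model"


rng = random.Random(0)
always = add_disfluency("I'd like to book a table for four.", rng, probability=1.0)
never = add_disfluency("I'd like to book a table for four.", rng, probability=0.0)
print(always)
print(never)
print(choose_model("Hello?"))
print(choose_model("Do you have anything at 7 pm?"))
```

A real system would presumably decide where fillers go and which path to take from learned models rather than fixed rules; the fixed rules here only make the control flow easy to see.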

The developers used real-time supervised training to train the system in new domains. This is akin to a teacher instructing a student on a subject with various examples.

Google Duplex will be integrated into Google Assistant, with testing set to begin in July. Check out the video below to see the two examples I mentioned above:

 

Our take on this

We have come such a long way in the field of NLP. The days of just analysing sentiment in Tweets feel like ages ago. Audio processing combined with NLP is a truly powerful thing, and Google has tapped into that potential with all its might. The demo at the I/O conference floored the audience and it has inspired us as well.

It’s both scary and inspiring how awesome deep learning married with real-life applications can be. What do you think of this mind-blowing AI by Google? Let us know in the comments section below!

 


 

About the Author

Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.


12 thoughts on "Google Duplex is a Jaw Dropping Application of Natural Language and Audio Processing"

CHAKRADHAR says: May 09, 2018 at 10:56 pm
Hey Pranav, "a synthesis TTS engine to vary the tone of the machine." I didn't get this exactly! Any good links I can dig into on this? And please attach reference arXiv papers or any related papers where this idea has been implemented. Thanks in advance :)
Harshel says: May 10, 2018 at 12:39 am
Google Duplex is truly awesome in terms of technological achievement. Can someone please list out what possible ML models and technology might have been used to make this possible?
Ramesh says: May 10, 2018 at 10:17 am
Is this a breach of the privacy of individuals? Imagine the phone is lost and someone gets access to the smartphone owner's personal info, e.g. where they dine or what they shop for. How can these issues be addressed?
Pranav Dar says: May 10, 2018 at 10:33 am
Hi Ramesh, There are definitely some scary aspects to this technology. There are tons of ways it can be misused and hopefully Google, or some third party, will find ways to make this secure.
Pranav Dar says: May 10, 2018 at 10:51 am
Hi Harshel, You can see the underlying model behind this technology in the article itself. They have used several techniques, but at the core of it is a recurrent neural network (RNN).
Pranov Shobhan Mishra says: May 10, 2018 at 11:35 am
Amazing innovation. Great to see an article on it as well. I had seen the video yesterday and with this article, reinforcement happens. Good to get a glimpse of the techniques used.
Pranav Dar says: May 10, 2018 at 12:29 pm
Hi Chakradhar, TTS is text-to-speech - a technique slowly gaining traction in deep learning. You can read about the technique on Google's AI blog: https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Pranav Dar says: May 10, 2018 at 12:32 pm
Glad you enjoyed the article, Pranov! Yes, it is a truly awesome product. Can't wait to try it out.
Hitesh Patel says: May 10, 2018 at 6:15 pm
Hi Pranav, Can Google Assistant make calls in other languages too?
Hitesh Patel says: May 10, 2018 at 6:20 pm
Hi Pranav, If Google Assistant can make calls in other languages then it will be the biggest talking point. Regards, Hitesh Patel
Pranav Dar says: May 10, 2018 at 6:20 pm
Hi Hitesh, According to Google, Google Assistant "will support 30 languages and be available in 80 countries this year". As for which languages Duplex will support, that is still unknown (I assume it is limited to English for now). It will roll out for testing in July and once successful I'm sure we'll see it expand into local languages.
Suresh Kumar says: May 10, 2018 at 9:19 pm
Great to hear the new traction in NLP.
