16 Jan Alexa, Siri, Cortana, and Google walk into a bar..
In the spring of this year, Apple went on a hiring spree to boost Siri’s performance, focusing in particular on people who could build smarter virtual assistants. In the virtual assistant world, a serious fight for dominance has emerged between Apple, Amazon and Google. Not investing means falling behind the competition in this quickly emerging field. That’s why in this blog we will talk about virtual assistants!
Newspapers. Google. Virtual Assistants
How would you search for information about the weather or about previously impeached American presidents? Many years ago, you would go to the library to browse an encyclopaedia, or you would buy a newspaper for more recent news. In modern times, you would open your web browser and search the internet. But now, a new generation is starting to wonder why you would go through the hassle of unlocking your device, finding your search engine and typing a question when you can simply speak the words and ask Alexa, Siri or Google Assistant. All you need is your smartphone or a speaker with a microphone and a working internet connection, and you are ready to make a new virtual friend!
Amazon, Apple and Google have recently made major improvements to their virtual assistants: Alexa, Siri, and Google Assistant respectively. Siri used to misunderstand you more often than not, and communicating with a virtual assistant was more pain than gain, but now they are here to stay! But how does it all work?
How they do it
Making first contact
Firstly, choose a speaker that’s compatible with your preferred assistant, like Amazon Echo for Alexa, Google Home for Google Assistant or Apple’s HomePod for Siri. Or just use the virtual assistant app on your smartphone. Other than that, only an internet connection is required. Now simply say the “wake word” and what you would like your assistant to do (“Hey Siri, finish this blog for me!”). Then, the virtual assistant executes four steps to provide the answer to your request.
- In the Automatic Speech Recognition (ASR) module, your audio command is translated into a text command.
- The Natural Language Processing (NLP) module extracts and understands the semantic meaning of the text command.
- After the virtual assistant has understood the command, it’ll find the correct action to take in one of its skills. This could be looking up an answer on the internet, but also a signal to dim your smart lights or play a song.
- If your command requires an actual answer from the virtual assistant, the assistant’s text answer will pass through a text-to-speech synthesis (TTS) module, which converts the words into natural-sounding, intelligible audio. And that is the sound that echoes through your living room.
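The four steps above can be sketched as a small pipeline. This is only an illustration under heavy assumptions: the "audio" is already a transcript string, the NLP step is a pair of keyword rules rather than a trained model, and every name here (`handle`, `SKILLS`, `synthesize`, and so on) is hypothetical and not taken from any real assistant. A real assistant also detects the wake word from the audio itself, before any transcription happens.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical wake words; real assistants detect these from raw audio.
WAKE_WORDS = ("hey siri", "hey google", "alexa")


@dataclass
class Result:
    intent: str            # which action was chosen
    audio: Optional[str]   # spoken reply, or None if no answer is needed


def recognize_speech(audio: str) -> str:
    """Step 1 (ASR) stand-in: the 'audio' here is already a transcript."""
    return audio.lower().strip()


def strip_wake_word(text: str) -> Optional[str]:
    """Return the command after the wake word, or None if there is none."""
    for wake_word in WAKE_WORDS:
        if text.startswith(wake_word):
            return text[len(wake_word):].lstrip(" ,")
    return None


def understand(command: str) -> str:
    """Step 2 (NLP) stand-in: keyword rules instead of a trained model."""
    if "light" in command and "off" in command:
        return "lights_off"
    if command.startswith("play "):
        return "play_music"
    return "unknown"


# Step 3: each skill performs an action and optionally returns a text reply.
SKILLS: Dict[str, Callable[[], Optional[str]]] = {
    "lights_off": lambda: None,   # would send a signal to the smart lights
    "play_music": lambda: None,   # would start playback
    "unknown": lambda: "I'm sorry, I didn't understand that.",
}


def synthesize(text: str) -> str:
    """Step 4 (TTS) stand-in: tags the text instead of producing sound."""
    return f"<audio>{text}</audio>"


def handle(audio: str) -> Optional[Result]:
    text = recognize_speech(audio)
    command = strip_wake_word(text)
    if command is None:
        return None  # no wake word, so the assistant stays silent
    intent = understand(command)
    reply = SKILLS[intent]()
    # Only commands that produce a text answer go through TTS.
    return Result(intent, synthesize(reply) if reply is not None else None)
```

Calling `handle("Hey Siri, can you switch off the living room lights")` picks the `lights_off` skill and returns no audio, while a command the rules do not cover falls back to a spoken apology.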
Under the hood
These steps might introduce some new questions, like how these ASR and NLP modules work. How do these modules help decipher what a user has said? The answer is Machine Learning. For the first step – ASR – we feed samples of speech and the corresponding written text to the Machine Learning model, which in turn learns how these samples relate to the text. The more data that gets analysed, the better the model will get at analysing speech. So, when you tell Siri to do something, the ASR will analyse your voice command and output it in text format.
All roads lead to Rome
However, recognizing what was said is not the same as understanding what was said. That is what the NLP module is designed for. As with ASR, the more data is fed into the model, the better it will get at understanding what was said. Only now the input is written text and the output is the action belonging to the command. Imagine, for example, you want to ask about the status of your order. While the ways of expressing something can be infinite, the things you want your assistant to do are finite. More precisely, questions like “Where is my order”, “Track my package” and “Shipment status” would all result in looking up the status of the order.
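To make the "many phrasings, few actions" idea concrete, here is a minimal sketch that maps a new sentence to the intent of the most similar labelled example. It swaps the trained NLP model for a much simpler technique, nearest-neighbour matching on word overlap (Jaccard similarity), so it only illustrates the input and output of the step. The intent names, the `classify` function and the `min_score` threshold are assumptions made for this example.

```python
import re
from typing import List, Set, Tuple

# Labelled examples: many phrasings, a small and finite set of intents.
EXAMPLES: List[Tuple[str, str]] = [
    ("Where is my order", "order_status"),
    ("Track my package", "order_status"),
    ("Shipment status", "order_status"),
    ("Add toilet paper to my grocery list", "add_to_list"),
    ("Play Rick Astley's Greatest Hits", "play_music"),
]


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(utterance: str, min_score: float = 0.1) -> str:
    """Return the intent of the closest example, or 'unknown'."""
    words = tokenize(utterance)
    best_intent, best_score = "unknown", 0.0
    for example, intent in EXAMPLES:
        score = similarity(words, tokenize(example))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= min_score else "unknown"
```

With these examples, `classify("where is my package")` and `classify("status of my shipment")` both land on `order_status`, even though neither sentence appears in the list. Adding more labelled examples widens the range of phrasings that are recognised, which is a crude analogue of feeding a model more data.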
Make it a daily thing
Like finding the status of your order, virtual assistants can help you perform daily tasks easily. Imagine, for example, you’ve run out of toilet paper. Just say “Hey Google, add toilet paper to my grocery list” and the virtual assistant will do just that. Would you like to hear some music? “Alexa, play Rick Astley’s Greatest Hits” will trigger Alexa to play you that album. Or maybe you’re already tucked into bed and forgot to turn off the lights; one sentence, “Hey Siri, can you switch off the living room lights”, will result in Siri sending a command to your living room smart light bulbs to switch off.
At the same time, you could also use a virtual assistant at the office. Simply say “Google, schedule a meeting with Mr Putin in the West Wing Roosevelt Room at 8 a.m. tomorrow” and Google Assistant will do the trick for you. Similarly, you could find out which conference rooms are free for the next hour for your unplanned brainstorming session. Virtual assistants could even be integrated with ERP or CRM applications so that your employees could ask “Alexa, ask our ERP app how many laptops we have in stock?”
So, throughout this blog, three virtual assistants have been consistently mentioned: Amazon’s Alexa, Apple’s Siri and Google’s Google Assistant (how creative). Despite being produced by different manufacturers, all assistants follow the same pipeline from voice command to (voice) answer: wake word and command, ASR, NLP, finding the action that matches the command, and (possibly) TTS to return the correct audio. Nevertheless, subtle differences are discernible between these assistants.
It all comes down to Pizza
For example, both Alexa and Google Assistant offer large-scale third-party skills compatibility, meaning that your virtual assistant can learn much more than it was shipped with. Through this feature, it is possible to ask: “Alexa ask Dominos to deliver a large Pizza Chicken Supreme”, and get a pizza delivered to your doorstep. By contrast, Siri supports only a few third-party skills.
Siri’s limited third-party compatibility seems to fit a broader Apple strategy of limiting Siri support to Apple products and software as much as possible. As a matter of fact, the only smart speaker compatible with Siri is Apple’s own HomePod, on which you can only play music with Apple Music and not Spotify or Deezer. Both Alexa and Google Assistant are much more flexible in that regard. However, Alexa can be similarly self-serving: she would rather recommend an Amazon product than the cheapest one available, even when that is what you asked for.
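Third-party commands such as “Alexa ask Dominos to deliver a large Pizza Chicken Supreme” follow a recognisable shape: wake word, the word "ask", the name of the skill, and then the request that is handed over to that skill. The sketch below shows one way such routing could look. It is a hypothetical illustration and not how Amazon or Google implement it: the `SKILLS` registry, the `route` function and the regular expression are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry: skill name -> handler that receives the request text.
SKILLS: Dict[str, Callable[[str], str]] = {
    "dominos": lambda request: f"[dominos skill] {request}",
    "our erp app": lambda request: f"[erp skill] {request}",
}

# Wake word, optional punctuation, then the word "ask" and the rest.
INVOCATION = re.compile(
    r"^\s*(?:hey\s+)?(?:alexa|google|siri)\W+ask\s+(?P<rest>.+)$",
    re.IGNORECASE,
)


def route(utterance: str) -> Optional[Tuple[str, str]]:
    """Split an 'ask <skill> ...' command into (skill name, request)."""
    match = INVOCATION.match(utterance)
    if match is None:
        return None  # not a third-party invocation
    rest = match.group("rest").strip()
    lowered = rest.lower()
    # Try longer names first so multi-word skill names win over shorter ones.
    for name in sorted(SKILLS, key=len, reverse=True):
        if lowered.startswith(name + " "):
            return name, rest[len(name):].strip()
    return None  # no installed skill with that name


def dispatch(utterance: str) -> Optional[str]:
    """Hand the request to the matching skill, if there is one."""
    routed = route(utterance)
    if routed is None:
        return None
    name, request = routed
    return SKILLS[name](request)
```

In this sketch, supporting a new third-party skill only means adding an entry to the registry, which is the sense in which an assistant can learn more than it was shipped with.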
One Assistant to Rule Them All
Lastly, a performance-based distinction can be made simply by comparing which virtual assistant attempts to answer the most questions and how many of its answers are correct. In a large study by Stone Temple, which issued nearly 5000 commands to all three virtual assistants, it was Google Assistant that performed best. Google Assistant attempted to answer the most questions (instead of saying “I’m sorry I didn’t understand that.”) and answered the most questions fully and correctly. Alexa and Siri performed significantly worse on both measures.
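The two measures mentioned above are simple ratios, and a small sketch shows how they could be computed from a log of outcomes. The `Outcome` record and `score` function are hypothetical, and the choice to measure correctness as a share of attempted answers (rather than of all questions) is an assumption, since the exact definition used in the study is not given here. Any numbers you feed in are your own; none of the study's results are reproduced.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Outcome:
    attempted: bool  # False if the assistant said it did not understand
    correct: bool    # True if the answer was full and correct


def score(outcomes: Iterable[Outcome]) -> Dict[str, float]:
    """Compute the attempt rate and the share of attempts that were correct."""
    outcomes = list(outcomes)
    total = len(outcomes)
    attempted = sum(o.attempted for o in outcomes)
    # An unattempted question can never count as correct.
    correct = sum(o.attempted and o.correct for o in outcomes)
    return {
        "attempt_rate": attempted / total if total else 0.0,
        "correct_of_attempted": correct / attempted if attempted else 0.0,
    }


# Made-up toy log of four questions, purely to show the arithmetic.
toy_log = [
    Outcome(attempted=True, correct=True),
    Outcome(attempted=True, correct=False),
    Outcome(attempted=False, correct=False),
    Outcome(attempted=True, correct=True),
]
toy_scores = score(toy_log)
```

For the toy log, three of four questions are attempted and two of those three are answered correctly, so the function reports an attempt rate of 0.75 and a correctness share of two thirds.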
Man’s new best friend?
It might be hard to fathom that the assistant that wouldn’t understand you before has suddenly become your best mate. Or that Googling a problem is already considered ancient among the newest generation. Virtual assistants are the new deal and they are here to stay. Because just like in real life, they’ll understand you better and better the more you talk to them and the more you build up a relationship. Maybe next time, before you wish your partner good night, you will also say “Alexa, good night!”.