First it was Seen, Now it is Heard

The Future of Voice

By Malte Kosub June 19th, 2018


Can you imagine a device you never have to touch? One with no clicks but, instead, conversation? By 2020, you will no longer have to imagine it. According to a study carried out by Gartner, in just two years more than 30% of all web-browsing sessions will be done without a screen. The new interface? Voice.

Since the invention of the computer, human-machine interaction has always been physical. In the beginning, the message was transmitted through a piece of hardware (a mouse, keyboard, or joystick). Then, when the smartphone revolution arrived around 2007, touchscreen technology shifted interaction further in favour of touch. Both are unnatural forms of interaction: artificial, univocal languages designed for a single purpose.


However, this physical barrier, which translates human language into input that machines understand, has a sell-by date, and a rapid change is already underway. Voice assistance, a technology that makes sci-fi conversations between humans and robots real, is the interface of the future.


Artificial intelligence has made great leaps in recent years, opening up possibilities that were inconceivable just a decade ago. Voice assistance is among the applications that have benefited most from this exponential growth. A voice assistant is a software agent that performs a range of tasks when activated by voice, and it works by combining machine learning with speech-recognition technology. Without machine learning, building a usable speech-recognition engine is almost impossible.


Thanks to these developments, however, it is now possible, and it is changing the way we interact with machines. Before, the burden of communication was on us: we had to learn how to interact with machines. Now it is the other way around, and machines are the ones having to learn human language.


Apple’s Siri, released in 2011, was the first voice assistant to enter the market. The market then grew large enough to catch the attention of the other major tech companies, which invested in building their own voice software (Alexa by Amazon, Cortana by Microsoft and Google Assistant by Google). All of these assistants have since been built into smart speakers as well as smartphones.


Onstage at I/O 2018, Google showed off a jaw-dropping new capability of Google Assistant: it’s going to make phone calls on your behalf.


Smart speakers, for example, are becoming very common in American and European households. Research suggests that 20% of the US population has access to one. Amazon’s Echo and Echo Dot, which run Alexa, are stand-alone, mains-powered, Wi-Fi-enabled speakers, mainly intended for domestic use. The naturalness of voice, along with the wish to control our devices hands-free, may explain why voice assistants have been adopted so quickly. In any case, we have already witnessed a 'hyper-adoption' of voice assistants. This is what my partners and I anticipated when we founded Future Of Voice, a company that develops voice interfaces and works with tech giants such as Amazon, Google and Microsoft.


There are murmurs in the news about concerns that these devices are always switched on. However, this does not mean that they are “listening” at all times. They are activated by voice and only begin sending recorded audio to the cloud once they hear a preset wake word such as “Alexa” or “Hey, Google”. Otherwise, the audio from the last few seconds is automatically deleted. At a time when privacy is a growing public concern, this is an image problem that needs to be dealt with. Tech giants will do all they can to ensure user privacy; if they do not, they will see their voice assistants silenced for good.
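The wake-word gating described above can be sketched as a small loop. This is a minimal illustration under loud assumptions, not Amazon’s or Google’s implementation: audio arrives as discrete frames (modelled here as strings), `detect_wake_word` is a hypothetical stand-in for the small acoustic model real devices run locally, and the buffer and command lengths are arbitrary. Only the frames that follow a wake word are collected for upload; everything heard before it sits briefly in a fixed-size local buffer and falls off the end.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


# HYPOTHETICAL wake-word detector: real assistants run an on-device acoustic
# model over raw audio; a simple string match stands in for it here.
def detect_wake_word(frame: str) -> bool:
    return frame.lower() in {"alexa", "hey google"}


def gate_audio(
    frames: Iterable[str],
    buffer_frames: int = 3,   # assumed size of the local rolling buffer
    command_frames: int = 2,  # assumed length of the command after the wake word
    detector: Callable[[str], bool] = detect_wake_word,
) -> List[List[str]]:
    """Return the chunks of audio that would be sent to the cloud.

    Frames heard before a wake word only ever live in a local ring buffer;
    deque(maxlen=...) silently drops the oldest frame once the buffer is full.
    """
    local_buffer: deque = deque(maxlen=buffer_frames)
    uploads: List[List[str]] = []
    command: Optional[List[str]] = None  # None means "not recording a command"

    for frame in frames:
        if command is not None:
            # Wake word already heard: capture this frame as part of the command.
            command.append(frame)
            if len(command) == command_frames:
                uploads.append(command)
                command = None
            continue
        local_buffer.append(frame)
        if detector(frame):
            local_buffer.clear()  # pre-wake-word audio is discarded, never uploaded
            command = []

    if command:  # the stream ended part-way through a command
        uploads.append(command)
    return uploads


stream = ["chatter", "more chatter", "Alexa", "play", "music", "private talk"]
print(gate_audio(stream))  # [['play', 'music']]
```

Using `deque(maxlen=...)` mirrors the “last few seconds” in the text: the device never holds more than a fixed window of audio before the wake word, and that window is cleared rather than uploaded.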


Will voice assistants change the way we interact with each other? This is currently too difficult a question to answer. One thing we have noticed is that this software gives better feedback when addressed “in a rude way” (that is, more directly). Of course, nobody wants their children to be rude, but voice assistants currently do not require any “please” or “thank you”: these words carry no information that affects their understanding. This is just food for thought, though. The technology is still in its infancy, and we do not yet fully know what users might want from it in the future.


Even though there is still room for improvement, especially when it comes to contextual understanding, voice assistance technology is undergoing a “silent revolution”. This does not mean screens will disappear completely, at least not in the next decade. Graphical interfaces will certainly remain, as sight is just as important as hearing when it comes to how humans interact with the world. But I am positive that as the case for this new interface gains momentum, voice will eventually become the only voice in the room.

The end.
