
Going beyond voice recognition

The graphic shows the main causes of dysarthria, the difficulty in articulating speech intelligibly. By far the most common cause is stroke, followed by traumatic brain injury, autism, cerebral palsy, Parkinson's disease and brain cancer. Image credit: Voiceitt

Sometimes technology is wonderful in a deeply human sense: it can free our emotions and connect us with others (watch the clip).

This is true of the technology developed by Voiceitt, an Israeli company that started five years ago to bring to market (first as prototypes, now increasingly as products) a system that can recognise "unrecognisable" voices.

Voice recognition technology has made amazing progress in this decade, reaching the point where it is used every day in many applications and has become a commodity. I have started talking to my car and my television; in fact, voice has become my main way of interacting with them.

Gone are the long training sessions needed to let the software understand my voice. This is the result of increased processing capacity and storage, which can be used to recognise a broad spectrum of languages and voices. A voice recognition application now has access to so much data that it can match my voice to the correct sentences.

However, a significant portion of people, up to 7% according to Voiceitt, have problems communicating by voice because they have very little control over their vocal apparatus (mouth, tongue, larynx), as a result of either acquired conditions (such as brain injuries) or congenital ones. That translates into hundreds of millions of people who have difficulty making themselves understood.

Voiceitt went back to the early days of voice recognition by developing software that can be trained to understand a voice, in this case a voice that is hard for our ears and brain to understand. Their system is speaker dependent but language independent, meaning it can be applied to any language. Speaker dependency is the crucial limitation, but there is no way around it: the software cannot leverage a common base of voices, so it has to learn each specific voice.

This is done through a correlation process in which the person with a voice disability and the people who, over the years, have learnt to understand them work together to teach the software the meaning of every single utterance. The voice recognition algorithms had to be modified because, in these situations, a person will often use the same utterance to mean different things. Hence, each utterance has to be placed in a specific context to acquire a unique meaning before it can be translated into speech.
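Voiceitt does not publish its algorithms, so the following is only a minimal sketch of the two ideas described above, under stated assumptions: a speaker-dependent recogniser built solely from one user's recorded examples, and a lookup in which the same utterance maps to different phrases depending on context. It assumes template matching with dynamic time warping (DTW) over toy 1-D feature sequences; the names `SpeakerModel`, `teach`, `translate`, the utterance labels and the context strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences.

    DTW tolerates the same utterance being spoken faster or slower,
    which is why it is a common choice for speaker-dependent templates.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


@dataclass
class SpeakerModel:
    """Hypothetical per-speaker model: no shared voice base, only this user's examples."""

    # utterance label -> recorded feature sequences for that utterance
    templates: Dict[str, List[List[float]]] = field(default_factory=dict)
    # (utterance label, context) -> phrase to be spoken on the user's behalf
    meanings: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def teach(self, label: str, features: Sequence[float], context: str, phrase: str) -> None:
        """The user speaks; a carer supplies the label, context and intended phrase."""
        self.templates.setdefault(label, []).append(list(features))
        self.meanings[(label, context)] = phrase

    def recognise(self, features: Sequence[float]) -> Optional[str]:
        """Return the label of the nearest stored template (1-nearest-neighbour under DTW)."""
        best_label, best_dist = None, float("inf")
        for label, examples in self.templates.items():
            for example in examples:
                dist = dtw_distance(features, example)
                if dist < best_dist:
                    best_label, best_dist = label, dist
        return best_label

    def translate(self, features: Sequence[float], context: str) -> Optional[str]:
        """Recognise the utterance, then use the context to pick a unique meaning."""
        label = self.recognise(features)
        if label is None:
            return None
        return self.meanings.get((label, context))


# Usage: the same utterance "u1" means different things in different contexts.
model = SpeakerModel()
model.teach("u1", [0.1, 0.9, 0.4], "kitchen", "I'm thirsty")
model.teach("u1", [0.1, 0.8, 0.5], "bedroom", "Turn off the light")
model.teach("u2", [0.7, 0.2, 0.2, 0.1], "kitchen", "I'm hungry")

print(model.translate([0.1, 0.85, 0.45], "kitchen"))
print(model.translate([0.1, 0.85, 0.45], "bedroom"))
```

Keying meanings on the pair (utterance, context) rather than on the utterance alone is what lets one sound carry several meanings; a real system would use richer acoustic features and a learnt model rather than raw nearest-neighbour matching.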

I can imagine home systems like Alexa hosting this software and getting smarter over time, first by letting the person interact with the system itself and then by using the system as a translator. Being able to train the software and embed it in a smartphone, so that the person can communicate over a telecom infrastructure, would be wonderful. In the past there have been a number of research projects on real-time translation applied to telephone calls, where you speak in your language and your correspondent hears you in hers. This is now becoming part of the standard offering of several communications services, such as Skype and other apps. They have in common a large data set and use machine learning applied to your voice to get better and better as you use them (by the way, this also leads to lock-in: the more you use a service, the better it becomes and the less likely you are to switch to a competing one).

The situation is different for Voiceitt since, in a sense, it has to start from zero for every new user. Nevertheless, these are the users who need this kind of support the most.

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the Industry Advisory Board within the Future Directions Committee and co-chairs the Digital Reality Initiative. He teaches a Master's course on Technology Forecasting and Market Impact at the University of Trento. He has published over 100 papers in journals and magazines, as well as 14 books.