The future will surely be based on natural language interaction for most “daily” activities, including plenty of business-related ones. Youngsters are already used to talking to their apps. Take a messaging app like WhatsApp: as of March 2022 this text messaging app was handling over 2 billion (!) minutes of voice and video messages a day.
Talking to a voice responder has become an everyday experience. As shown in the graphic, what is actually going on is that we talk to a voicebot that analyses our voice message, converts it into text and sends it to software that works out what we want. Once the meaning of our sentence is understood, the software (a chatbot) searches for an answer and comes back to us through a voice synthesiser, again in what is perceived as natural language.
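To make the chain of steps concrete, here is a minimal sketch in Python. It only shows how the four stages (speech to text, understanding, answering, speech synthesis) are wired together. All the names (`VoicePipeline`, `fake_stt`, `fake_nlu` and so on) are hypothetical, and the stand-in functions just shuffle text around; a real voicebot would plug actual speech recognition and synthesis engines into the same slots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the four stages described in the text (hypothetical names)."""
    speech_to_text: Callable[[bytes], str]   # voice message -> text
    understand: Callable[[str], str]         # text -> intent label
    answer: Callable[[str], str]             # intent -> reply text
    text_to_speech: Callable[[str], bytes]   # reply text -> synthesised voice

    def handle(self, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)
        intent = self.understand(text)
        reply = self.answer(intent)
        return self.text_to_speech(reply)


# Toy stand-ins: the "audio" is just UTF-8 text, so the example runs
# without any speech library. These are assumptions, not real engines.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlu(text: str) -> str:
    return "opening_hours" if "open" in text.lower() else "unknown"


def fake_answer(intent: str) -> str:
    replies = {"opening_hours": "We are open from 9 to 5."}
    return replies.get(intent, "Sorry, I did not understand that.")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_nlu, fake_answer, fake_tts)
reply_audio = pipeline.handle(b"When are you open?")
```

Keeping each stage behind a plain function makes it easy to swap one engine for another without touching the rest of the chain.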
With the most advanced systems in use today, it feels as if we are talking to a real human.
Don’t think that is the end of it. We already have software that can identify our voice (each voice has unique characteristics that distinguish it from anyone else’s). By adding this “voice identifier” to the system it becomes possible to customise the conversation. Basically, the system keeps a record of previous interactions and bases its answers on them. It can go like this:
I am feeling a bit sick today
Hey Roberto! Sorry to hear that. Last week you seemed to be fine. Did you get the prescription I suggested? Remember, the tablets to keep your GERD at bay?
Yes, I was fine, and I took the tablets. But today I feel a bit dizzy …
This is the sort of conversation you might have with your doctor when you see her. It can be replicated, automatically, online thanks to voice identification: as soon as I voice the first sentence, the system can recognise me and connect to my health data and to the record of previous interactions.
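One common way to implement this kind of recognition, sketched here under loud assumptions, is to turn each voice sample into a vector of numbers (an "embedding") and compare it with the voiceprints of enrolled users. The three-number vectors, the 0.85 threshold and the names `ENROLLED`, `HISTORY`, `identify_speaker` and `context_for` are all made up for illustration; real systems use much longer vectors produced by a trained model, and I am not claiming this is how any specific product works.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical enrolled voiceprints and interaction records.
ENROLLED = {
    "roberto": [0.9, 0.1, 0.3],
    "anna": [0.1, 0.8, 0.5],
}
HISTORY = {
    "roberto": ["Last week: felt fine, prescription suggested for GERD"],
}


def identify_speaker(embedding, threshold=0.85):
    """Return the best-matching enrolled name, or None if nobody is close enough."""
    best_name, best_score = None, 0.0
    for name, voiceprint in ENROLLED.items():
        score = cosine_similarity(embedding, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def context_for(embedding):
    """Look up who is speaking and fetch the record of previous interactions."""
    speaker = identify_speaker(embedding)
    if speaker is None:
        return None, []
    return speaker, HISTORY.get(speaker, [])


speaker, past = context_for([0.88, 0.12, 0.31])  # a sample close to Roberto's voiceprint
```

An unrecognised voice gets no name and no history, so the conversation falls back to a generic one.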
There is no question that such a system would increase the quality of the interaction, and a few companies are already working on it, particularly in the customer care area.
At the same time we are seeing the development of software (also AI-based) that can mimic our voice to the point that someone who knows you can be fooled into believing they are actually talking to you (or that you were the person talking). This is the dark world of deep fakes.
The use of voice deep fakes can be (sort of) fun, as you can see in the clip, but it can raise serious issues. Using a voice deep fake, another person can pretend to be you in an interaction with a chatbot, provide false information and obtain your private information by steering the conversation in certain ways.
I feel this is an issue we will need to face in the coming years as we are moving towards natural language voice interactions. Some safeguards will be required, in the same way that we go through (usually boring and annoying) identification procedures when engaging in some sensitive interaction from our devices.
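As a rough illustration of what such a safeguard could look like, the sketch below treats a voice match as necessary but not sufficient for sensitive requests: since a voice can be cloned, the system also asks for a second check (for example a code sent to a trusted device) before releasing private data. The intent names and the `decide` function are hypothetical, and this is only one possible policy, not a description of an existing product.

```python
# Hypothetical list of requests that touch private data.
SENSITIVE_INTENTS = {"read_health_record", "change_address", "make_payment"}


def decide(intent, voice_matched, second_factor_ok=False):
    """Return 'allow', 'challenge' or 'deny' for a spoken request."""
    if intent not in SENSITIVE_INTENTS:
        # Generic questions need no identification at all.
        return "allow"
    if not voice_matched:
        # Unknown voice asking for private data: refuse.
        return "deny"
    # The voice matches, but it could be a deep fake: ask for a second proof.
    return "allow" if second_factor_ok else "challenge"
```

With this policy a cloned voice alone gets no further than the challenge step, at the price of one more of those annoying identification procedures for the genuine user.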