Last week Amazon announced several products. If you missed it, there are (of course) plenty of reports on the web, like the one here. I read a few of them and decided to add my two cents, not on the announced products but on the trends that might emerge.
One thing that is becoming clear is that the voice interface is now a reality. It works, and it works pretty well. The progress over the last five years has been impressive. I don’t know how accurate voice recognition is today, but I know that it is usable (a few people still doubt it, watch the video).
Interestingly, the voice interface is becoming customisable. Amazon lets you experience the voice of your favourite actor or actress for just $1. You pay that tiny fee and the devices will use that celebrity's voice to interact with you, so that you can experience the thrill of hearing Jennifer Aniston talking to you (I am not sure if that particular voice is available, but you get the gist). I don’t know whether a voice can be copyrighted, in which case Amazon would have to pay the voice owners a royalty, nor whether it has to ask for permission. Reading on the web I discovered that a voice as such is not copyrightable in the US; however, since a voice can identify a person, you have to clearly state that the voice is NOT that of the specific person, i.e. you cannot use it to mislead a third party. For sure this opens up a can of worms, and legislators will have to look into it. The problem with the internet is the application of legislation. Your national legislator may rule in a given way, but will that apply to a voice that is generated in a cloud located no one knows exactly where?
I can also imagine that some people might want to use the voice of a loved one who has passed away, to keep that person with them, even if only through an artificial interface…
Amazon has expanded its line of Echos but, most intriguingly, it is offering a pair of glasses with an embedded mike and a ring with an embedded mike. These establish a connection with your smartphone, or a home hub, which directly, or by leveraging the web, will make sense of the sounds captured.
I have read in some commentaries that the glasses do not have a camera (as Google Glass did) and because of that they might be more socially acceptable. I am not completely sure of that. It might be the case today, since many people feel uneasy about their image being captured (although apparently they have no reservations about publishing tons of their photos on the web…) but seem to disregard ambient eavesdropping. I think this is going to change as awareness mounts that capturing your voice is as bad as capturing your image. In the same way that we have technology for face identification, we have technology for voice identification. Images can be decontextualised, and so can voice. There is really not much difference between having our voice hijacked by an (almost) hidden microphone and having our image hijacked by a camera.
Another emerging trend is that our surroundings are becoming responsive. You have mikes potentially embedded everywhere, beginning with the stuff you wear (like the Amazon ring). These mikes can seamlessly capture your voice, and software can respond accordingly. This is likely to reshape our perception of the space we live in, of our home, car and office. When you get responses you are likely to start perceiving a living being, particularly if AI can support a smart conversation, an ever-changing one in which the same question will likely be answered differently every single time.
I remember that many years ago I read an article describing the future of interfaces. It started with a little boy rummaging through an old trunk and discovering a strange object, a flat plate with letters on it that you could push. It was a keyboard, but it was the first time that boy had seen one. Keyboards had long since disappeared, replaced by voice interaction. Well, that article was describing a future, a future that is now almost the present.