To close this series of posts on human-to-whatever interfaces, I chose to look at multimodal interfaces for three reasons:
- we have been using multimodal interfaces, as humans, throughout our evolutionary history;
- multimodal interfaces will be the future of interfaces, in my opinion;
- we are already using multimodal interfaces in a number of situations.
Indeed, when we interact with one another, as well as with our environment, we use multimodal interaction. We look (see), we talk (even when interacting with a dog), we touch and sometimes we smell. When interacting with a machine, technology has forced us to use a single channel, mostly our hands (with sight used to monitor what is going on, like watching the letters we type on the keyboard show up on the screen), but more recently also our voice. We have become so “adapted” to the way we interact with a machine that when an alternative comes up (like speaking to the car navigator rather than typing letters) we may feel uneasy. In some cases we have even been told that a single interaction channel is more efficient because it promotes focus; whether that is true or just marketing I do not know. I read, however, that pilots used to fly planes by the seat of their pants: feeling the acceleration and vibration on their bodies gave them important information on what was going on and how the plane was responding to their commands. No longer so. Today pilots get their information through the glass cockpit; they have lost the direct connection with the plane.
I have no doubt that multimodal interfaces, both as “input and output”, will become the norm. As devices get smarter and are equipped with a variety of sensors, it will become natural to have a more articulated interaction space: voice, gesture, touch, sight. Interacting with a robot will be (frighteningly, perhaps) indistinguishable from interacting with another human. We will look at “its” expressions as we talk; we might take its arm to show it how to move (you can already, almost, interact with Sawyer in this way). Multimodal interaction will become a “must” for interacting with “whatever”, i.e. with a smart environment. Our environment, be it the home, the office, a department store or a hospital room, will consist of several smart objects that will have to coordinate the way they interact with us (and be flexible in the way we interact with them). Digital twins, flanking each object, and possibly you and me, will play a significant role in managing this interaction. Whilst today our interaction with a machine happens basically at a syntactic level (whereas our interaction with a person we are face to face with happens mostly at a semantic level), in the future our interactions with machines will happen mostly at a semantic level. That is what we would call smart, and these kinds of interactions exploit multimodal, multichannel interfaces.
Now for the last point: we are already using multimodal interaction today when AR is adopted as an interface. AR can be a powerful means of interacting with the real world assisted by cyberspace, as in the example shown in the clip of an operator on an assembly line. AR provides an additional channel, a virtual one if you will, supplementing the physical channels of our senses.