
The strange twist of AI-based video communications

A frame picked up by the video camera is processed by AI to create a model. This model is sent to the receiver, where AI reconstructs a “credible” image that mimics the original. This allows a dramatic reduction in bandwidth. Image credit: NVIDIA

All communication is based on coding. You talk into your smartphone and your voice is sampled and converted into bits, which the network transports to your correspondent’s smartphone, which in turn converts them back into the sound of your voice. Well, it is not identical to your voice, but it is so close to it that the listener immediately recognises who is talking.

Likewise for images, when you send a picture or are engaged in a video communication.
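The voice case can be made concrete with a minimal sketch. It assumes an 8 kHz sampling rate, 8 bits per sample and a plain uniform quantizer (real phone codecs are more elaborate), and it uses a 440 Hz tone as a stand-in for a voice. All names and parameters here are illustrative assumptions, not taken from any particular codec. It shows that the rebuilt signal is very close to the original without being identical.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumption: a common telephony sampling rate
BITS_PER_SAMPLE = 8     # assumption: 8-bit uniform quantization (a simplification)
LEVELS = 2 ** BITS_PER_SAMPLE


def encode(samples):
    """Map each sample in [-1.0, 1.0] to an integer code in [0, LEVELS - 1]."""
    codes = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        codes.append(round((clipped + 1.0) / 2.0 * (LEVELS - 1)))
    return codes


def decode(codes):
    """Map integer codes back to approximate sample values in [-1.0, 1.0]."""
    return [c / (LEVELS - 1) * 2.0 - 1.0 for c in codes]


# 10 ms of a 440 Hz tone stands in for the speaker's voice (illustrative only).
voice = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(80)]

codes = encode(voice)      # what travels over the network
rebuilt = decode(codes)    # what the receiving phone plays back

max_error = max(abs(a - b) for a, b in zip(voice, rebuilt))
bitrate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

print(f"bitrate: {bitrate_kbps} kbps, largest reconstruction error: {max_error:.4f}")
```

The reconstruction error is bounded by half a quantization step, which is why the copy sounds like the original even though the two are not bit-for-bit the same.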

NVIDIA has just announced and demonstrated (watch the video) a new way of coding that relies on artificial intelligence. It is still coding, but now the meaning of “original” and “copy” takes a strange twist.

What happens is that the AI software looks at the images captured by the video camera and understands what they show. At that point it creates a semantic model, such as: there is a girl talking and she has this type of face. This information is sent to the receiver, which uses it to reconstruct the face. Using the associated sound, it animates the face, including lip movements, so that it is in sync with the sound. This leads to a dramatic reduction in the needed bandwidth: a factor of 10, according to NVIDIA. If normal coding requires 100 kbps, AI-based coding decreases the requirement to 10 kbps. This is very good for communications that have to cope with low bandwidth.
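To get a feel for what the factor of 10 means in practice, here is a small back-of-the-envelope sketch. The two bitrates are the figures quoted above; the 10-minute call length and the function name are assumptions chosen purely for illustration.

```python
def call_data_megabytes(bitrate_kbps: float, minutes: float) -> float:
    """Total data sent during a call, in megabytes (1 MB = 10**6 bytes)."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


conventional_kbps = 100.0   # figure quoted in the text
ai_kbps = 10.0              # figure quoted in the text
call_minutes = 10.0         # assumption, for illustration only

conventional_mb = call_data_megabytes(conventional_kbps, call_minutes)
ai_mb = call_data_megabytes(ai_kbps, call_minutes)
reduction_factor = conventional_kbps / ai_kbps

print(f"conventional coding: {conventional_mb} MB")
print(f"AI-based coding:     {ai_mb} MB")
print(f"reduction factor:    {reduction_factor}")
```

Under these assumptions the same call needs one tenth of the data, which is the difference that matters on a weak connection.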

However, AI coding opens the door to a completely new scenario. You might, as an example, use an image of yourself as you were 10 years ago, still with black hair and no crow’s feet! That would be cheating, but just a bit… We started to cheat long ago with makeup and more recently with Photoshop!

The next step could be using another person’s face and one of those programs that let you morph your voice into another person’s voice, for instance if you are male and want to sound female. You may try this, as an example, using FakeVoice. You can also “pretend” to be another person by using that person’s voice pattern. This is done using AI, and it is fun that quickly morphs into a “crime”.

NVIDIA is also enabling you to create and use an avatar, a cartoon character that will be animated by AI in sync with your voice (again, watch the clip). This is fun and it does not have criminal implications.

All in all, technology is neither good nor bad; it all depends on the way it is used. The problem is that technology has become a powerful tool that can easily fool us. In addition, it moves our perception into a sort of limbo where reality and cyberspace morph into a single reality in which the two can no longer be separated.

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE, where he leads the Industry Advisory Board within the Future Directions Committee and co-chairs the Digital Reality Initiative. He teaches a Master’s course on Technology Forecasting and Market impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.