This article is based on an article from the Japanese edition of Engadget and was created using the translation tool Deepl.
NVIDIA announced its AI-based data compression technology for video conferencing at GPU Technology Conference 2020.
It cuts traffic to less than one-tenth of what standard H.264 compression requires, reduces block noise and other image quality degradation even on slow connections, and can even correct an off-camera gaze so that the speaker appears to make eye contact with the other party.
NVIDIA's new AI Video Compression technology is optimized for video calling and conferencing and uses AI that has learned human faces.
While traditional video conferencing software uses general-purpose video compression and sends the video more or less as is, NVIDIA's AI compression first detects the face in the camera footage locally and sends that image to the other party as the initial keyframe.
At the same time, multiple facial key points, such as the eyes, nose, and facial contour, are extracted for each person on the call, and after the keyframe only the movements of these key points are transmitted. The receiver applies those movements to the keyframe to "reconstruct" the face's movements and expressions, so a talking face can be shown with significantly less data than if the video were sent as is.
In short, it is a real-time application of the kind of technology found in apps that make photos speak realistically or swap faces. Keyframes are also sent again as needed, not just once at the start, to keep the reconstructed expressions from becoming unnatural.
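The keyframe-plus-key-points scheme described above can be sketched roughly as follows. This is a minimal illustration, not NVIDIA's implementation: the `Packet` type, the fixed `keyframe_interval`, and the 32-bit float encoding are all assumptions made for the example, and the neural network that would actually redraw the face on the receiving side is left as a placeholder.

```python
import struct
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple

Keypoint = Tuple[float, float]  # (x, y) position of one facial key point


@dataclass
class Packet:
    kind: str       # "keyframe" or "keypoints" (hypothetical packet types)
    payload: bytes


def pack_keypoints(points: List[Keypoint]) -> bytes:
    """Serialize (x, y) pairs as little-endian 32-bit floats."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<{len(flat)}f", *flat)


def unpack_keypoints(payload: bytes) -> List[Keypoint]:
    """Inverse of pack_keypoints."""
    count = len(payload) // 4
    flat = struct.unpack(f"<{count}f", payload)
    return list(zip(flat[0::2], flat[1::2]))


def encode_stream(
    frames: Iterable[Tuple[bytes, List[Keypoint]]],
    keyframe_interval: int = 30,  # assumed refresh policy, not from the article
) -> Iterator[Packet]:
    """Send a full image only occasionally; send key points for every frame."""
    for index, (image, points) in enumerate(frames):
        if index % keyframe_interval == 0:
            yield Packet("keyframe", image)
        yield Packet("keypoints", pack_keypoints(points))


@dataclass
class Receiver:
    keyframe: Optional[bytes] = None
    latest_points: List[Keypoint] = field(default_factory=list)

    def handle(self, packet: Packet) -> None:
        if packet.kind == "keyframe":
            self.keyframe = packet.payload
        else:
            self.latest_points = unpack_keypoints(packet.payload)

    def reconstruct(self) -> Tuple[Optional[bytes], List[Keypoint]]:
        # Placeholder: a real system would feed both values to a generative
        # model that redraws the face; here they are simply returned.
        return self.keyframe, self.latest_points
```

A fixed interval is only one possible refresh policy; the article says keyframes are sent "as needed", which could just as well mean refreshing when the reconstruction drifts too far from the camera image.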
While it only "compresses" faces and general video conferencing footage, it works even when the speaker is wearing a mask, glasses, and so on. According to NVIDIA, the technology can achieve dramatic savings, cutting the data to one-tenth or even one-hundredth of what the H.264 codec requires.
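To see why sending key points instead of video can save so much, a rough back-of-the-envelope calculation helps. Every number in the sketch below (key point count, frame rate, keyframe size, refresh period, and the H.264 bitrate used for comparison) is a hypothetical placeholder chosen for illustration, not a figure published by NVIDIA.

```python
def stream_bitrate_bps(
    num_keypoints: int,
    fps: float,
    keyframe_bytes: int,
    keyframe_period_s: float,
    coords_per_point: int = 2,   # assumed: x and y only
    bytes_per_coord: int = 4,    # assumed: 32-bit floats
) -> float:
    """Estimate bits per second for key points plus amortized keyframes."""
    keypoint_bps = num_keypoints * coords_per_point * bytes_per_coord * 8 * fps
    keyframe_bps = keyframe_bytes * 8 / keyframe_period_s
    return keypoint_bps + keyframe_bps


def savings_ratio(baseline_bps: float, stream_bps: float) -> float:
    """How many times smaller the key point stream is than the baseline."""
    return baseline_bps / stream_bps


if __name__ == "__main__":
    # Hypothetical inputs: 10 key points at 30 fps, a 30 kB keyframe every 10 s.
    ai_bps = stream_bitrate_bps(10, 30, 30_000, 10)
    # Hypothetical H.264 video call bitrate used only for comparison.
    h264_bps = 432_000
    print(f"key point stream: {ai_bps:.0f} bit/s")
    print(f"ratio vs. H.264:  {savings_ratio(h264_bps, ai_bps):.1f}x")
```

The estimate ignores packet headers and any compression of the key point data itself, so it only shows the order of magnitude involved.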
Because the video itself doesn't need to be transmitted constantly, the image also doesn't break up into noise the way it does with typical general-purpose video compression, even on extremely slow connections.
In the demo, the technology showed natural movement with little noise while using less data than an H.264 stream that was visibly breaking up.
In essence, it's similar to performance capture: because the "model of the moving face" and the "facial expression capture data" are acquired at the same time, the eyes can be moved and the direction of the face changed during reconstruction.
In the demo, a user who appeared to be looking away from the camera because they were gazing at the screen was corrected to look straight at the camera, that is, at the person on the other end of the line.
Similarly, it can be used to make an avatar or character speak by using another image or a 3D model as the keyframe instead of your own face.
NVIDIA's AI Video Compression technology will be available as part of NVIDIA Maxine, a cloud-based AI video streaming platform that also combines other technologies such as super-resolution for low-resolution video, audio denoising, real-time translation, and virtual backgrounds.
The Japanese edition of Engadget does not guarantee the accuracy or reliability of this article.