How do you cut a birthday cake with your friends if the coronavirus pandemic does not allow you to get close to each other?
That was the challenge that the national research institute for mathematics and computer science in the Netherlands, Centrum Wiskunde & Informatica (CWI), faced with professional cake designer Cake Researcher when CWI celebrated its 75th anniversary earlier this year.
Fortunately, CWI has an in-house specialist who solved that problem using virtual reality (VR): Pablo Cesar, a researcher in human-centered multimedia systems and leader of the Distributed and Interactive Systems group at CWI, who also is a professor and chair of Human-Centered Multimedia Systems at the Netherlands' Delft University of Technology (TU Delft). Cesar, named an ACM Distinguished Member in 2020, investigates how to improve the ways people use interactive systems to communicate with each other.
While we currently use interactive systems to communicate person to person via flat screens, it would be much more convenient for many applications to communicate via three-dimensional (3D) video, also called volumetric video. Ultimately, we might want to transfer high-quality 3D models of people anywhere in the world in real time, something that Microsoft calls holoportation.
Working on the path to holoportation, Cesar develops state-of-the-art technology for capturing and distributing volumetric video. He showed Bennie Mols around in CWI's two VR rooms. Surrounded by Kinect cameras standing on tripods and hanging from the ceiling, Cesar spoke about where the technology stands right now, and what the future holds.
How would you describe the point where we are at present with regard to distributing volumetric video?
We are now with volumetric video where we were with 2D video in the 1990s: we understand that it is possible, and we know more or less the path to explore. There is a lot of potential, but there is still a lot of work to be done.
Can you describe a concrete example from your research?
Cutting the virtual birthday cake earlier this year showed the state of the art in volumetric video. Two people wearing VR glasses were immersed in a virtual world, where they had to cut a virtual birthday cake together. Each person and the birthday cake were recorded in separate rooms, each using three Kinect cameras. The volumetric videos were combined in real time so that the two people got the feeling that they were really cutting the cake together. You can see the demonstration online.
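As an illustration of what combining several camera views can involve, here is a minimal sketch that is not CWI's actual pipeline. It assumes each depth camera delivers a point cloud in its own coordinate frame, and that a calibration step (not shown) has produced a rotation matrix and a translation vector per camera. The names `transform` and `merge_clouds` are hypothetical.

```python
def transform(point, rotation, translation):
    """Map one (x, y, z) point from a camera frame into the shared room frame."""
    x, y, z = point
    return tuple(
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
        for i in range(3)
    )


def merge_clouds(clouds, extrinsics):
    """Fuse per-camera point clouds into a single cloud in the shared frame.

    clouds:     one list of (x, y, z) points per camera
    extrinsics: one (rotation, translation) pair per camera (assumed calibration)
    """
    merged = []
    for cloud, (rotation, translation) in zip(clouds, extrinsics):
        merged.extend(transform(p, rotation, translation) for p in cloud)
    return merged


# Two hypothetical cameras: one aligned with the room frame, one rotated
# 90 degrees about the vertical axis and mounted 2 units higher.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]

frame = merge_clouds(
    clouds=[[(1, 0, 0)], [(1, 0, 0)]],
    extrinsics=[(IDENTITY, (0, 0, 0)), (ROT_Z_90, (0, 0, 2))],
)
```

A real system would also have to synchronize the cameras in time and remove overlapping points. The sketch covers only the geometric step.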
How does this compare to what other labs are capable of?
Other labs might have a better video capturing system or a better rendering system; each lab has its own strong points. But in terms of the full pipeline to capture, transmit, and render volumetric video, my team is state of the art. Our goal is to make our technology available as open source later this year or next year.
Do you have another example from your work that might have practical applications?
In another recent demonstration, we used a normal mobile phone camera instead of three Kinect cameras. Though not perfect, the mobile phone camera can also capture depth information.
In a nearby parking lot, we simulated a cyclist who had fallen off his bicycle and seriously injured his knee. A chance passer-by started a video conversation with a doctor using KPN's public 5G mobile network. This was a premiere: the first time that the public 5G network was used for volumetric videoconferencing.
We demonstrated that, based on the volumetric video she sees, the doctor can give advice to the passer-by on how to help the injured cyclist. This is a good example of the future of remote consultation, where professionals really need 3D video and data to make decisions, and not just 2D video.
What are the biggest challenges in your research?
One of the challenges is the real-time reconstruction of the 3D video information in the wild. How do we support all kinds of configurations and cameras? How do we create an extensible and modular system?
Another challenge is to decide which information is absolutely necessary to send and which information might be left out for practical purposes. For example, in the case of cutting the virtual birthday cake, it was very important to send the faces and hands of the two people, but showing other parts of them was less important. This is done automatically, so you need to cleverly design your algorithms in order to transfer the minimum amount of data while keeping the maximum quality of experience. We take inspiration from the way humans perceive the world.
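The idea of sending important regions in full and thinning out the rest can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm Cesar's team uses: it assumes every point already carries a region label (for example from a body-tracking step that is not shown), and the keep ratios in `KEEP_RATIO` are made-up values.

```python
from collections import defaultdict

# Hypothetical share of points to transmit per body region (assumed values).
KEEP_RATIO = {"face": 1.0, "hands": 1.0, "body": 0.25}


def reduce_points(points, keep_ratio=KEEP_RATIO, default_ratio=0.1):
    """Return the subset of points to send.

    points: iterable of (x, y, z, region) tuples.
    Each region is thinned evenly so that roughly `ratio` of its points survive.
    """
    seen = defaultdict(int)  # points encountered so far, per region
    kept = defaultdict(int)  # points selected so far, per region
    selected = []
    for point in points:
        region = point[3]
        ratio = keep_ratio.get(region, default_ratio)
        seen[region] += 1
        # Keep the point while the region is still below its target share.
        if kept[region] < ratio * seen[region]:
            kept[region] += 1
            selected.append(point)
    return selected


# Example: 100 face points are all kept, 100 body points are reduced to 25.
cloud = [(i, 0, 0, "face") for i in range(100)]
cloud += [(i, 1, 0, "body") for i in range(100)]
to_send = reduce_points(cloud)
```

Fixed ratios keep the example short. A perceptually driven system would more likely adjust them on the fly, for instance based on where the viewer is looking or on the available bandwidth.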
A third challenge is to build a quality monitoring system. For 2D video streaming, there are a plethora of validated quality metrics that are used to monitor the quality of experience and to optimize the distribution of the video to adapt to different network conditions. However, for 3D video, such metrics do not exist yet. Following user-centric methodologies, my team is developing a new set of metrics for volumetric video and videoconferencing. At the end of the day, the ultimate goal of our research is to provide a rich user experience that can adapt to the context of the communication.
What will be possible over the next 5-10 years?
The coronavirus pandemic has brought to the fore the importance of remote communication. I think that in the next decade, we can get quite far by exploring how volumetric videoconferencing can help in, for example, education, health care, work meetings, and cultural heritage.
In the domain of cultural heritage, we are presently developing a demonstration in which a group of friends or family members can virtually visit a museum together.
For all future applications, a lot will depend on the rendering side: glasses and head-mounted displays. If we see serious advances there, we can make volumetric videoconferencing available to everyone within the next 10 years.
How realistic is holoportation, in which you can be virtually teleported to a place together with other people?
Many companies are seriously investing in this area. For example, Microsoft earlier this year released a video about Microsoft Mesh, and in May Google presented Google Starline. Facebook launched the open beta of Horizon in August, and has published really interesting articles on realistic avatars.
I think there is a consensus from industry and academia that holoportation and volumetric videoconferencing will play a pivotal role in the future of interpersonal communication.
Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.