Keynote Speakers

Matthias Niessner
Keynote Speech 1 - Tuesday 13th September 2022 09:00 (CEST) Virtual
The Revolution of Neural Rendering
In this talk, I will present our research vision for how to create a photo-realistic digital replica of the real world, and how to make holograms a reality. Eventually, I would like to see photos and videos evolve to become interactive, holographic content indistinguishable from the real world. Imagine taking such 3D photos to share with friends, family, or social media; the ability to fully record historical moments for future generations; or to provide content for upcoming augmented and virtual reality applications. AI-based approaches, such as generative neural networks, are becoming more and more popular in this context since they have the potential to transform existing image synthesis pipelines. I will specifically talk about an avenue towards neural rendering where we can retain the full control of a traditional graphics pipeline but at the same time exploit modern capabilities of deep learning, such as handling the imperfections of content from commodity 3D scans. While the capture and photo-realistic synthesis of imagery open up unbelievable possibilities for applications ranging from entertainment to communication industries, there are also important ethical considerations that must be kept in mind. Specifically, in the context of fabricated news (e.g., fake news), it is critical to highlight and understand digitally manipulated content. I believe that media forensics plays an important role in this area, both from an academic standpoint, to better understand image and video manipulation, and even more importantly from a societal standpoint, to raise awareness of the possibilities and to highlight potential avenues and solutions regarding trust in digital content.
Biography
Dr. Matthias Nießner is a Professor at the Technical University of Munich, where he leads the Visual Computing Lab. Previously, he was a Visiting Assistant Professor at Stanford University. Prof. Nießner’s research lies at the intersection of computer vision, graphics, and machine learning, where he is particularly interested in cutting-edge techniques for 3D reconstruction, semantic 3D scene understanding, video editing, and AI-driven video synthesis. In total, he has over 70 academic publications, including 22 papers in the prestigious ACM Transactions on Graphics (SIGGRAPH / SIGGRAPH Asia) journal and 43 works at the leading vision conferences (CVPR, ECCV, ICCV); several of these works won best paper awards, including at SIGCHI’14, HPG’15, SPG’18, and the SIGGRAPH’16 Emerging Technologies Award for the best Live Demo. Prof. Nießner’s work enjoys wide media coverage, with many articles featured in mainstream media including the New York Times, Wall Street Journal, Spiegel, MIT Technology Review, and many more, and his work has led to several TV appearances, such as on Jimmy Kimmel Live, where Prof. Nießner demonstrated the popular Face2Face technique; Prof. Nießner’s academic YouTube channel currently has over 5 million views. For his work, Prof. Nießner received several awards: he is a TUM-IAS Rudolph Moessbauer Fellow (2017 – ongoing), he won the Google Faculty Award for Machine Perception (2017), the Nvidia Professor Partnership Award (2018), as well as the prestigious ERC Starting Grant 2018, which comes with 1.5 million euros in research funding; in 2019, he received the Eurographics Young Researcher Award honoring the best upcoming graphics researcher in Europe. In addition to his academic impact, Prof. Nießner is a co-founder and director of Synthesia Inc., a brand-new startup backed by Mark Cuban, whose aim is to empower storytellers with cutting-edge AI-driven video synthesis.

Vincent Lepetit
Keynote Speech 2 - Tuesday 13th September 2022 15:30 (CEST) In Person
New Problems in 3D Object Pose Estimation
3D (or 6D, or 9D) pose estimation of objects from images has made tremendous progress in recent years, but is it really enough to answer all needs, especially for industry? Requirements for training time and for labeled data are still often a deal breaker for transferring these developments to production. In the first part of this talk, I will present our recent work on dealing with new objects without specific training on them: how to detect them, how to estimate their 6D pose, and how to track them. I will then present our work on creating accurate annotations of real images automatically for training and evaluating 3D algorithms. In particular, we recently developed a method based on the Monte Carlo Tree Search (MCTS) algorithm to retrieve CAD models for indoor scenes from noisy RGB-D scans without human input.
Biography
Vincent Lepetit is a Director of Research at ENPC ParisTech, France. Prior to this position, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology (TU Graz), Austria, and before that, a senior researcher at CVLab, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. He also still leads a research group as an associate professor. His current research focuses on 3D scene understanding, especially on reducing the supervision needed by a system to learn new 3D objects and new 3D environments. In the past, he has worked on vision-based Augmented Reality, Machine Learning and Deep Learning, in particular their application to 3D registration, 3D object pose estimation, feature point detection and description, and geo-localization from images. He received the Koenderink “test-of-time” award at the European Conference on Computer Vision 2020 for “BRIEF: Binary Robust Independent Elementary Features”. He often serves as an area chair of major computer vision conferences (CVPR, ICCV, ECCV, ACCV, BMVC) and as an editor for the International Journal of Computer Vision (IJCV) and the Computer Vision and Image Understanding (CVIU) journal.

Otmar Hilliges
Keynote Speech 3 - Wednesday 14th September 2022 09:00 (CEST) In Person
Human-Centric 3D Computer Vision for Future AI Systems
Future AI systems such as personalized healthcare robots, self-driving cars, and AR/VR-based telepresence systems, will only be safe, useful and widely adopted if they are able to interpret human pose, shape and appearance at levels rivaling our own; and if they can interact with us and the world in a human-like and natural fashion. This requires perceiving and analyzing human behavior from images. It also requires generation, control and synthesis of virtual humans. To this end we propose a novel representation of human pose, shape and appearance that combines the advantages of neural implicit surfaces with those of parametric body models: i) a continuous and resolution-independent surface representation that can capture highly detailed geometry and can naturally model topology changes, ii) coupled with the ease of use and generalization capabilities to unseen shapes and poses of polygonal mesh-based models. We also introduce algorithms to learn such representations without requiring manually specified skinning weights or other forms of direct supervision. We then discuss how to leverage this representation to reconstruct controllable avatars (full body, faces and more) directly from images, videos or short RGB-D sequences via differentiable rendering. Finally, to make 3D human avatars widely available, we will discuss work towards generative modeling of 3D virtual humans with diverse identities and shapes in arbitrary poses and of interactions with 3D objects in a physically plausible manner.
Biography
Otmar Hilliges is a Professor of Computer Science at ETH Zurich, where he leads the AIT lab (https://ait.ethz.ch) and serves as head of the Institute for Intelligent Interactive Systems (https://iis.ethz.ch). Otmar’s research is in spatio-temporal understanding of how humans move within and interact with the physical world. He researches algorithms, methods and representations for human- and interaction-centric understanding of our world from videos, images and other sensor data. He is interested in many different application domains such as Augmented and Virtual Reality, Human Robot Interaction and more. Prior to joining ETH, he was a Researcher at Microsoft Research Cambridge (2012-2013). His Diplom (equiv. MSc) in Computer Science is from Technische Universität München, Germany (2004) and his PhD in Computer Science from LMU München, Germany (2009). He spent two years as a postdoc at Microsoft Research Cambridge (2010-2012). He has published more than 100 peer-reviewed papers in the major venues on computer vision, computer graphics and HCI. 20+ patents have been filed in his name on a variety of subjects from surface reconstruction to AR/VR. Amongst other sources of funding, Otmar Hilliges is a recipient of the prestigious ERC starting grant and ERC consolidator grant.

Angjoo Kanazawa
Keynote Speech 4 - Thursday 15th September 2022 12:30 (CEST) Virtual
Towards 4D Reality Capture
We live in a dynamic three-dimensional world that is full of life, with agents like people who interact with each other and their environment in their daily life. But motion is not restricted to people; it is everywhere we look: in the leaves rustled by the breeze, a bird chirping in the tree, a passing cloud over the sun. How can we perceive and capture this 4D world? In this talk, I will paint the ambitious goal of capturing the photorealistic 4D world in a casual manner, such as from everyday smartphone videos. While this is quite a challenge, with recent advances in analysis-by-synthesis and 3D neural field representations like Neural Radiance Fields (NeRFs), we have come far in the ability to capture the static 3D world in a photorealistic manner. I will discuss the recent advancements that make static 3D capture practical, and then discuss the challenges in capturing the dynamic, non-rigid world in the general case from a monocular capture setup. I will then discuss the progress that can be made on deformable objects with known kinematic structure and 3D poses, and recent advancements in 3D perception of people from videos that can come into play.
Biography
Angjoo Kanazawa is an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of California at Berkeley. Her research is at the intersection of Computer Vision, Computer Graphics, and Machine Learning, focusing on the visual perception of the dynamic 3D world behind everyday photographs and video. Previously, she was a research scientist at Google NYC with Noah Snavely, and prior to that she was a BAIR postdoc at UC Berkeley advised by Jitendra Malik, Alyosha Efros, and Trevor Darrell. She completed her PhD in Computer Science at the University of Maryland, College Park with her advisor David Jacobs. She also spent time at the Max Planck Institute for Intelligent Systems with Michael Black. She has been named a Rising Star in EECS and is a recipient of the Anita Borg Memorial Scholarship, the Best Paper Award in Eurographics 2016, the Google Research Scholar Award 2021, and a Spark Fellow 2022. She also serves on the advisory board of Wonder Dynamics, whose goal is to utilize AI technologies to make VFX more accessible for indie filmmakers.

Lourdes Agapito
Keynote Speech 5 - Thursday 15th September 2022 09:30 (CEST) In Person
Learning 3D Representations of Shape and Deformations
As humans we take the ability to perceive the dynamic world around us in three dimensions for granted. From an early age we can grasp an object by adapting our fingers to its 3D shape, or effortlessly navigate through a busy street. These tasks require some internal 3D representation of shape, deformations, and motion. Building algorithms that can emulate human 3D perception, using as input single images or video sequences taken with a consumer camera, has proved to be an extremely hard task. Machine learning solutions have faced the challenge of the scarcity of 3D annotations, encouraging important advances in weak and self-supervision. In this talk I will describe progress from early optimization-based solutions that captured sequence-specific 3D models with primitive representations of deformation, towards recent and more powerful 3D-aware neural representations that can learn the variation of shapes and textures across a category and be trained from 2D image supervision only. There has been very successful recent commercial uptake of this technology and I will show exciting applications to AI-driven video synthesis.
Biography
Lourdes Agapito holds the position of Professor of 3D Vision at the Department of Computer Science, University College London (UCL). Her research in computer vision has consistently focused on the inference of 3D information from single images or videos acquired from a moving camera. She received her BSc, MSc and PhD degrees from the Universidad Complutense de Madrid (Spain). In 1997 she joined the Robotics Research Group at the University of Oxford as an EU Marie Curie Postdoctoral Fellow. In 2001 she was appointed as Lecturer at the Department of Computer Science at Queen Mary University of London. From 2008 to 2014 she held an ERC Starting Grant funded by the European Research Council to focus on theoretical and practical aspects of deformable 3D reconstruction from monocular sequences. In 2013 she joined the Department of Computer Science at University College London and was promoted to full professor in 2015. Lourdes serves regularly as Area Chair for the top Computer Vision conferences (CVPR, ICCV, ECCV), was Program Chair for CVPR 2016, and will serve again for ICCV 2023. She was keynote speaker at ICRA 2017 and ICLR 2021. In 2017 she co-founded Synthesia, the London-based synthetic media startup responsible for the AI technology behind the Malaria No More video campaign that saw David Beckham speak 9 different languages to call on world leaders to take action to defeat malaria.

Yasutaka Furukawa
Keynote Speech 6 - Thursday 15th September 2022 15:30 (CEST) Virtual
Teaching a Computer to be an Architect
I will present our recent work on structured geometry reconstruction and generation, which helps architects with their workflows. For reconstruction, I will talk about vector floorplan reconstruction from scanned floorplan images or RGBD images acquired on-site: what the key insights were and how we changed the landscape of floorplan reconstruction in the last 5 years. For generation, I will talk about the graph-constrained floorplan generation work (House-GAN): how we fused a reconstruction technique with a GAN to build the system. Lastly, I will share my views on how the relationship between structured reconstruction and generation (two once very distant fields) has been changing recently.
Biography
Dr. Yasutaka Furukawa is an associate professor in the School of Computing Science at Simon Fraser University (SFU). Dr. Furukawa's group has made fundamental and practical contributions to 3D reconstruction algorithms, improved localization techniques, and computational architectural modeling. Their open-source software has been widely adopted by tech companies and used in surprising applications such as 3D printing of turtle shells and archaeological reconstruction. Dr. Furukawa received the best student paper award at ECCV 2012, the NSF CAREER Award in 2015, the CS-CAN Outstanding Young CS Researcher Award 2018, Google Faculty Research Awards in 2016, 2017, and 2018, and the PAMI Longuet-Higgins prize in 2020.