SEMINAR

Learning Compositional Representations for Understanding and Generating Controllable 3D Environments

Speaker
Despoina Paschalidou
Affiliation
Stanford University
Timeline
Fri, Aug 19 2022 - 11:00 am (GMT + 7)
About Speaker

Despoina Paschalidou is a Postdoctoral Researcher at Stanford University working with Prof. Leonidas Guibas in the Geometric Computation Group. She did her Ph.D. at the Max Planck ETH Center for Learning Systems, where she worked with Prof. Andreas Geiger and Prof. Luc van Gool. Prior to this, she was an undergraduate in the School of Electrical and Computer Engineering at the Aristotle University of Thessaloniki in Greece, where she worked with Prof. Anastasios Delopoulos and Christos Diou. Her research interests lie in computer vision, particularly in interpretable shape representations, scene understanding, generative models, and unsupervised deep learning.

Abstract

Within the first year of our lives, we develop a common-sense understanding of the physical behavior of the world, which relies heavily on our ability to properly reason about the arrangement of objects in a scene. While this seems to be a fairly easy task for the human brain, computer vision algorithms still struggle to form such high-level reasoning. The research community has therefore shifted its attention to primitive-based methods that seek to represent objects as semantically consistent part arrangements. However, due to the simplicity of existing primitive representations, these methods fail to accurately reconstruct 3D shapes using a small number of primitives/parts.

In the first part of my talk, I will address this trade-off between reconstruction quality and number of parts and present Neural Parts, a novel 3D primitive representation that defines primitives using an Invertible Neural Network (INN) implementing homeomorphic mappings between a sphere and the target object. Since a homeomorphism does not impose any constraints on the primitive shape, our model effectively decouples geometric accuracy from parsimony and, as a result, captures complex geometries with an order of magnitude fewer primitives.

In the second part of my talk, we will look into the problem of inferring, and subsequently also generating, semantically meaningful object arrangements to populate 3D scenes conditioned on the room shape. In particular, I will present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments as unordered sets of objects. This unordered-set formulation allows the same trained model to be used for a variety of interactive applications, such as general scene completion, partial room rearrangement with any objects specified by the user, and object suggestions for any partial room. This is an important step towards fully automatic content creation.

Finally, we will look into 3D shape editing and manipulation. In particular, I will present two methods capable of generating plausible 3D shape variations with local control, which can be combined with ATISS to enable control at both the scene level and the object level.
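
To make the homeomorphism idea behind Neural Parts more concrete, below is a minimal, hypothetical PyTorch sketch of a stack of affine coupling layers that bijectively deforms points sampled on a unit sphere into one primitive's surface. This is an illustration under assumptions rather than the authors' implementation: the names CouplingLayer, SpherePrimitive, and feature_dim are invented here, and in the actual Neural Parts model the conditioning feature comes from a learned per-primitive shape encoder.

# Illustrative sketch (not the Neural Parts code): invertible coupling layers
# that deform points on a unit sphere into one primitive's surface. Because
# every layer is bijective with a closed-form inverse, the sphere-to-primitive
# mapping is a homeomorphism, the property highlighted in the abstract.
import torch
import torch.nn as nn

class CouplingLayer(nn.Module):
    """Affine coupling: rescales and shifts one coordinate, conditioned on the
    other two coordinates and a per-primitive feature vector (assumed names)."""
    def __init__(self, feature_dim, hidden=128, axis=0):
        super().__init__()
        self.axis = axis                      # coordinate updated by this layer
        self.net = nn.Sequential(
            nn.Linear(2 + feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),             # predicts (log-scale, shift)
        )

    def forward(self, x, feature):
        keep = [i for i in range(3) if i != self.axis]
        log_s, t = self.net(torch.cat([x[:, keep], feature], dim=-1)).chunk(2, dim=-1)
        y = x.clone()
        y[:, self.axis:self.axis + 1] = x[:, self.axis:self.axis + 1] * torch.exp(log_s) + t
        return y

    def inverse(self, y, feature):
        keep = [i for i in range(3) if i != self.axis]
        log_s, t = self.net(torch.cat([y[:, keep], feature], dim=-1)).chunk(2, dim=-1)
        x = y.clone()
        x[:, self.axis:self.axis + 1] = (y[:, self.axis:self.axis + 1] - t) * torch.exp(-log_s)
        return x

class SpherePrimitive(nn.Module):
    """Stack of coupling layers: a homeomorphism from the unit sphere to one primitive."""
    def __init__(self, feature_dim=64, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [CouplingLayer(feature_dim, axis=i % 3) for i in range(n_layers)]
        )

    def forward(self, sphere_pts, feature):   # sphere -> primitive surface
        for layer in self.layers:
            sphere_pts = layer(sphere_pts, feature)
        return sphere_pts

    def inverse(self, surface_pts, feature):  # primitive surface -> sphere
        for layer in reversed(self.layers):
            surface_pts = layer.inverse(surface_pts, feature)
        return surface_pts

# Usage: deform 1024 sphere samples into one primitive and map them back.
pts = torch.randn(1024, 3)
pts = pts / pts.norm(dim=-1, keepdim=True)        # points on the unit sphere
feat = torch.randn(1, 64).expand(1024, -1)        # one per-primitive feature, shared by all points
prim = SpherePrimitive()
surface = prim(pts, feat)
recovered = prim.inverse(surface, feat)           # ≈ pts, up to numerical error

Because each coupling layer can be inverted in closed form, points on the deformed primitive can always be mapped back to the sphere, so the representation remains a homeomorphism no matter how complex the deformed surface becomes; this is what lets the model trade very few, expressive primitives for the same geometric accuracy.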

Related seminars

Coming soon
Niranjan Balasubramanian

Stony Brook University

Towards Reliable Multi-step Reasoning in Question Answering
Fri, Nov 03 2023 - 10:00 am (GMT + 7)
Nghia Hoang

Washington State University

Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms
Fri, Oct 27 2023 - 10:00 am (GMT + 7)
Jey Han Lau

University of Melbourne

Rumour and Disinformation Detection in Online Conversations
Thu, Sep 14 2023 - 10:00 am (GMT + 7)
Tan Nguyen

National University of Singapore

Principled Frameworks for Designing Deep Learning Models: Efficiency, Robustness, and Expressivity
Mon, Aug 28 2023 - 10:00 am (GMT + 7)