SEMINAR

Ubiquitous 3D Vision in the Wild

Monday, Jun 26 2023 - 6:03 pm (GMT + 7)
Speaker
Minh Vo
Affiliation
Spree3D
Timeline
Mon, Jul 03 2023 - 10:00 am (GMT + 7)
About Speaker

Minh Vo is the Head of Machine Learning at Spree3D, a high-tech virtual try-on startup. Minh leads a team of passionate researchers and engineers to develop and commercialize the company's photorealistic avatar technology. Previously, Minh was a Senior Research Scientist at Facebook Reality Labs Research, where he was the tech lead for a group of researchers developing 3D perception and human sensing algorithms for Meta Aria glasses. Minh received his Ph.D. from The Robotics Institute, Carnegie Mellon University, where he worked with Prof. Srinivasa Narasimhan and Prof. Yaser Sheikh on novel methods to capture dense and accurate 3D shapes of human bodies. His Ph.D. work was awarded the prestigious 2018 Qualcomm Innovation Fellowship.

Abstract

As cameras become ubiquitous, there is a growing opportunity to reliably detect, reconstruct, and track in 3D the content of the footage they capture for downstream applications such as surveillance, user intent understanding, and creative work. In this talk, we discuss our recent progress in 3D object and scene understanding on three different platforms: a single infrastructure camera, a single pair of wearable smart glasses, and collections of data captured by smartphone cameras. First, we present Snipper, a novel framework to jointly detect, track, and forecast future human motion from an RGB video snippet. Despite its simplicity, Snipper outperforms many existing methods in stationary infrastructure camera tracking settings. Second, we present AriaHuman, the first large-scale 3D multi-human tracking benchmark acquired across different environments and activities in a smart glasses setting, along with a baseline method that takes advantage of the multiple camera streams commonly available on smart glasses. Finally, we present BANMo, the first method to create textured 3D shapes and estimate the motion of humans and animals in a unified manner from many casual videos that were not captured simultaneously.

Related seminars

Coming soon
Niranjan Balasubramanian

Stony Brook University

Towards Reliable Multi-step Reasoning in Question Answering
Fri, Nov 03 2023 - 10:00 am (GMT + 7)
Nghia Hoang

Washington State University

Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms
Fri, Oct 27 2023 - 10:00 am (GMT + 7)
Jey Han Lau

University of Melbourne

Rumour and Disinformation Detection in Online Conversations
Thu, Sep 14 2023 - 10:00 am (GMT + 7)
Tan Nguyen

National University of Singapore

Principled Frameworks for Designing Deep Learning Models: Efficiency, Robustness, and Expressivity
Mon, Aug 28 2023 - 10:00 am (GMT + 7)