Video Understanding: from Representation Learning to Open-World, Long-term Reasoning
Du Tran is a staff research scientist at Meta AI Research. He received a Ph.D. in computer science from Dartmouth College and an M.S. in computer science from the University of Illinois at Urbana-Champaign, and was awarded the Dartmouth Presidential Fellowship and the Vietnam Education Fellowship. His research interests are in computer vision, machine learning, and computer graphics, with a focus on video understanding, representation learning, and multimodal modeling.
Video understanding is one of the fundamental problems in computer vision, with applications including autonomous vehicles, robot learning, and visual perception. Although a considerable body of work on video understanding has appeared in the last few years, many challenging problems remain unsolved. In this talk, I will present some of our recent work in video understanding, including cross-modal self-supervised learning of video and audio representations and open-world instance segmentation. Finally, I will speculate on several potential future research directions in this area.