Principled Frameworks for Designing Deep Learning Models: Efficiency, Robustness, and Expressivity
Dr. Tan Nguyen is currently an Assistant Professor of Mathematics at the National University of Singapore (NUS). Before joining NUS, he was a postdoctoral scholar in the Department of Mathematics at the University of California, Los Angeles, working with Dr. Stanley J. Osher. He obtained his Ph.D. in Machine Learning from Rice University, where he was advised by Dr. Richard G. Baraniuk. Dr. Nguyen was an organizer of the 1st Workshop on Integration of Deep Neural Models and Differential Equations at ICLR 2020. He has also completed long internships with Amazon AI and NVIDIA Research. He is the recipient of the prestigious Computing Innovation Postdoctoral Fellowship (CIFellows) from the Computing Research Association (CRA), the NSF Graduate Research Fellowship, and the IGERT Neuroengineering Traineeship. He received his M.S. and B.S. in Electrical and Computer Engineering from Rice University in May 2018 and May 2014, respectively.
Designing deep learning models for practical applications, including those in computer vision, natural language processing, and mathematical modeling, is an art that often involves an expensive search over candidate architectures. In this talk, I present novel frameworks that facilitate the design of efficient and robust deep learning models with better expressivity via three principled approaches: optimization, differential equations, and statistical modeling. From an optimization viewpoint, I leverage the continuous limit of classical momentum-accelerated gradient descent to improve Neural ODE training and inference. The resulting Momentum Neural ODEs accelerate both the forward and backward ODE solvers and alleviate the vanishing gradient problem (Efficiency). From a differential-equation viewpoint, I present a random walk interpretation of graph neural networks (GNNs), revealing a potentially inevitable over-smoothing phenomenon. Building on this random walk view of GNNs, I then propose graph neural diffusion with a source term (GRAND++), which overcomes the over-smoothing issue and achieves better accuracy in low-labeling-rate regimes (Robustness). Using statistical modeling as a tool, I show that the attention mechanism in transformer models can be derived from solving a nonparametric kernel regression problem. I then propose the FourierFormer, a new class of transformers in which the softmax kernels are replaced by novel generalized Fourier integral kernels. These generalized Fourier integral kernels automatically capture the dependencies among data features and remove the need to tune the covariance matrix (Expressivity).
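For readers who want a concrete picture of the three formulations mentioned above, the LaTeX sketch below outlines the standard forms they build on. The notation (hidden state $h(t)$, damping constant $\gamma$, node-feature matrix $X(t)$, source term $S$, kernel $\kappa$) is my own shorthand, and the exact parameterizations used in the talk may differ.

% Momentum (heavy-ball) Neural ODE, written as a first-order system;
% this is the continuous limit of heavy-ball momentum with the learned
% vector field f in place of the negative gradient:
\[
  \frac{dh}{dt} = m, \qquad \frac{dm}{dt} = -\gamma\, m + f\big(h(t), t, \theta\big).
\]

% Graph neural diffusion with a source term (GRAND++-style), where A(X) is a
% learned, attention-based diffusion matrix and S is a source term supported
% on labeled nodes; without S the dynamics is pure diffusion and node
% features tend toward a constant state (over-smoothing):
\[
  \frac{dX(t)}{dt} = \big(A(X(t)) - I\big)\, X(t) + S.
\]

% Softmax attention as Nadaraya-Watson kernel regression over queries q_i,
% keys k_j, and values v_j; the standard softmax attention is recovered when
% \kappa(q, k) \propto \exp(q^\top k / \sqrt{D}), and FourierFormer swaps
% this kernel for a generalized Fourier integral kernel:
\[
  \hat{v}(q_i) = \sum_{j} \frac{\kappa(q_i, k_j)}{\sum_{j'} \kappa(q_i, k_{j'})}\, v_j.
\]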