SEMINAR

Variable Selection with Theoretical Guarantees on High-dimensional Data

Monday, Sep 26 2022 - 9:46 am (GMT + 7)
Speaker
Binh Nguyen
Affiliation
Telecom Paris
Timeline
Fri, Sep 30 2022 - 10:00 am (GMT + 7)
About Speaker

Binh Nguyen is a postdoctoral researcher at Telecom Paris, France. He obtained his doctoral degree in statistics from the Département de Mathématiques d’Orsay and INRIA, and a master's degree in Data Science from Paris-Saclay University. His research interests are in high-dimensional statistics, optimization, and, more recently, the application of optimal transport to structured prediction problems in machine learning.

Abstract

In many scientific applications, increasingly large datasets are acquired to describe biological or physical phenomena more accurately. While the dimensionality of the resulting measurements has increased, the number of available samples is often limited by physical or financial constraints. Performing statistical inference in such a high-dimensional setting remains a hard problem that suffers from the curse of dimensionality. In this talk, we first introduce the knockoff filter, a recent advance in multivariate analysis that controls the False Discovery Rate (FDR) under limited distributional assumptions. We then present a method that aggregates several knockoff samplings to address the randomness of the knockoff filter, one of its major limitations. We provide non-asymptotic theoretical results for the aggregated knockoff, specifically guaranteed FDR control, which rely on concentration inequalities. Furthermore, we extend the method to a version that scales to extremely high-dimensional regimes. One of the key steps is to use randomized clustering to reduce the dimension and avoid the curse of dimensionality, and then to ensemble several runs to reduce the bias introduced by selecting a single fixed clustering. We show that our algorithms perform well in practical applications from the life sciences, such as neuroscience, medical imaging and genomics.
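The knockoff selection step and the idea of aggregating several knockoff draws can be sketched in Python. This is a minimal illustration under stated assumptions, not the speaker's implementation. `knockoff_plus_threshold` follows the standard knockoff+ rule of Barber and Candès. The aggregation functions (`intermediate_pvalues`, `quantile_aggregate`, `aggregated_knockoff_select`) are hypothetical helpers. They turn each draw into per-feature p-values, combine the B draws with a γ-quantile rule, and then apply Benjamini-Hochberg. Generating the knockoff variables and computing the statistics W (for example from lasso coefficients) is out of scope here, and the usage lines feed synthetic statistics rather than real knockoffs.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Knockoff+ threshold for feature statistics W (one entry per feature).

    W_j > 0 means the original feature j looked more important than its
    knockoff copy; for null features the sign of W_j is a fair coin flip.
    Returns the smallest t among the nonzero |W_j| whose estimated false
    discovery proportion is <= q, or np.inf if none qualifies.
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        # The count of W_j <= -t estimates how many nulls have W_j >= t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q=0.1):
    """Indices of features selected by a single knockoff+ run."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


def intermediate_pvalues(W):
    """Hypothetical helper: turn one draw of W into per-feature p-values.

    pi_j = (1 + #{k : W_k <= -W_j}) / p if W_j > 0, else 1 (capped at 1).
    """
    W = np.asarray(W, dtype=float)
    p = W.size
    pvals = np.ones(p)
    for j in np.flatnonzero(W > 0):
        pvals[j] = min(1.0, (1 + np.sum(W <= -W[j])) / p)
    return pvals


def quantile_aggregate(pvals, gamma=0.5):
    """Combine B rows of p-values (shape (B, p)) by the gamma-quantile rule."""
    return np.minimum(1.0, np.quantile(pvals, gamma, axis=0) / gamma)


def benjamini_hochberg(pvals, q=0.1):
    """Benjamini-Hochberg step-up procedure; returns selected indices."""
    pvals = np.asarray(pvals, dtype=float)
    p = pvals.size
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, p + 1) / p
    if not passed.any():
        return np.array([], dtype=int)
    k = np.flatnonzero(passed).max()  # largest rank meeting its threshold
    return np.sort(order[: k + 1])


def aggregated_knockoff_select(W_draws, q=0.1, gamma=0.5):
    """Hypothetical helper: aggregate B knockoff draws (rows of W_draws)."""
    pvals = np.vstack([intermediate_pvalues(W) for W in W_draws])
    return benjamini_hochberg(quantile_aggregate(pvals, gamma), q)


# Usage on synthetic statistics only (no real knockoffs are generated):
# 20 "signal" features get a positive shift, nulls stay sign-symmetric.
rng = np.random.default_rng(0)
p, B = 200, 25
W_draws = rng.normal(size=(B, p))
W_draws[:, :20] += 4.0
print(knockoff_select(W_draws[0], q=0.1))
print(aggregated_knockoff_select(W_draws, q=0.1))
```

Running `knockoff_select` on different rows of `W_draws` typically returns different sets; that run-to-run variability is the randomness the aggregation step is meant to tame. The γ-quantile rule divides by γ so that the combined values remain valid p-values.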
