SEMINAR

Faster saddle-point optimization for solving large-scale Markov decision processes

Friday, Feb 21 2020 - 6:21 pm (GMT + 7)
Speaker
Joan Bas Serrano
Working
Universitat Pompeu Fabra
Timeline
Fri, Feb 21 2020 - 10:00 am (GMT + 7)
About Speaker

Joan Bas Serrano is a PhD student and teaching assistant in the Artificial Intelligence and Machine Learning group at Universitat Pompeu Fabra (UPF). Under the supervision of Gergely Neu, he’s currently working on theoretical reinforcement learning. He is mainly interested in the development of reinforcement learning algorithms with a solid mathematical basis and strong theoretical guarantees. His current research is based on the linear programming formulation of MDPs and the saddle-point optimization theory.

He obtained his bachelor’s in Physics at Universitat Autònoma de Barcelona in 2016 and his M.Sc. at UPF in 2017, cursing the program “Master in Intelligent and Interactive Systems”. He did his master thesis titled “ A generative model of user activity in the Integrated Learning Design Environment“ under the advice of Vicenç Gomez and Davinia Hernandez-Leo.

Abstract

We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization problem by characterizing the conditions necessary for convergence to the optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are independent of the size of the state space. Notably, our characterization points out some potential issues with previous work.

Related seminars

Coming soon
Niranjan Balasubramanian

Stony Brook University

Towards Reliable Multi-step Reasoning in Question Answering
Fri, Nov 03 2023 - 10:00 am (GMT + 7)
Nghia Hoang

Washington State University

Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms
Fri, Oct 27 2023 - 10:00 am (GMT + 7)
Jey Han Lau

University of Melbourne

Rumour and Disinformation Detection in Online Conversations
Thu, Sep 14 2023 - 10:00 am (GMT + 7)
Tan Nguyen

National University of Singapore

Principled Frameworks for Designing Deep Learning Models: Efficiency, Robustness, and Expressivity
Mon, Aug 28 2023 - 10:00 am (GMT + 7)