Please note: This PhD seminar will take place in DC 2585.
Yudong Luo, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
Distributional Reinforcement Learning (RL) differs from traditional RL by estimating the distribution over returns to capture the intrinsic uncertainty of MDPs. One key challenge in distributional RL lies in how to parameterize the quantile function when minimizing the Wasserstein metric of temporal differences. Existing algorithms use step functions or piecewise linear functions.
In this work, we propose to learn smooth, continuous quantile functions represented by monotonic rational-quadratic splines, which also naturally solve the quantile crossing problem. Experiments in stochastic environments show that dense estimation of quantile functions enhances distributional RL, yielding faster empirical convergence and higher rewards in most cases.
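To give a sense of the kind of parameterization the abstract describes, below is a minimal NumPy sketch of a monotonic rational-quadratic spline (the standard Gregory-Delbourgo construction) used as a quantile function. This is not the speaker's implementation: the function name, knot locations, values, and derivatives are illustrative placeholders; in practice such quantities would be produced by a learned model, but the monotonicity argument is the same, which is why estimated quantiles cannot cross.

```python
import numpy as np

def rq_spline_quantile(tau, knot_x, knot_y, knot_deriv):
    """Evaluate a monotonic rational-quadratic spline at quantile levels tau.

    With increasing knot_y and non-negative knot_deriv, each bin's
    rational-quadratic segment is monotone, so the resulting quantile
    estimates cannot cross.
    knot_x: increasing quantile levels in [0, 1], shape (K,)
    knot_y: increasing return values at the knots, shape (K,)
    knot_deriv: non-negative derivatives at the knots, shape (K,)
    """
    tau = np.asarray(tau)
    # Locate the bin containing each tau.
    k = np.clip(np.searchsorted(knot_x, tau, side="right") - 1, 0, len(knot_x) - 2)
    x0, x1 = knot_x[k], knot_x[k + 1]
    y0, y1 = knot_y[k], knot_y[k + 1]
    d0, d1 = knot_deriv[k], knot_deriv[k + 1]
    s = (y1 - y0) / (x1 - x0)        # bin slope
    xi = (tau - x0) / (x1 - x0)      # relative position within the bin
    num = (y1 - y0) * (s * xi**2 + d0 * xi * (1 - xi))
    den = s + (d1 + d0 - 2 * s) * xi * (1 - xi)
    return y0 + num / den

# Hypothetical knots for a single state-action pair:
knot_x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
knot_y = np.array([-2.0, -0.5, 0.3, 1.1, 3.0])   # returns, increasing
knot_d = np.array([1.0, 2.0, 1.5, 3.0, 1.0])     # positive derivatives
print(rq_spline_quantile(np.linspace(0.01, 0.99, 9), knot_x, knot_y, knot_d))
```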