Tim Tse, Master’s candidate
David R. Cheriton School of Computer Science
In this work, we propose a novel Bayesian-inspired, model-based policy search algorithm for data-efficient control. In contrast to other model-based approaches, our algorithm uses approximate Gaussian processes, in the form of random Fourier features, for fast online system identification, with computationally efficient posterior updates via rank-one Cholesky updates. Because these posterior updates are fast and tractable, policy optimization can track how the posterior evolves and use that knowledge for a directed Bayesian approach to the exploration-exploitation dilemma.
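As a rough illustration of the two ingredients named above, the following sketch performs Bayesian linear regression on random Fourier features and folds in each new observation with a rank-one Cholesky update. It is a minimal example under stated assumptions, not the implementation used in this work: the RBF kernel, the choice to factor the posterior precision matrix, and all names (`make_rff`, `chol_update`, `RFFRegressor`, `noise_var`, `prior_var`) are hypothetical.

```python
import numpy as np

def make_rff(input_dim, num_features, lengthscale, rng):
    """Random Fourier feature map approximating an RBF kernel (assumed kernel)."""
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, input_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return lambda x: np.sqrt(2.0 / num_features) * np.cos(W @ x + b)

def chol_update(L, v):
    """Return lower-triangular L_new with L_new L_new^T = L L^T + v v^T."""
    L, v = L.copy(), v.copy()
    for k in range(v.size):
        r = np.hypot(L[k, k], v[k])
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
        v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
    return L

class RFFRegressor:
    """Bayesian linear regression on features; keeps the Cholesky factor
    of the posterior precision A = I / prior_var + sum(phi phi^T) / noise_var."""

    def __init__(self, phi, num_features, noise_var=0.01, prior_var=1.0):
        self.phi = phi
        self.noise_var = noise_var
        self.L = np.eye(num_features) / np.sqrt(prior_var)
        self.b = np.zeros(num_features)

    def observe(self, x, y):
        f = self.phi(x)
        # One new data point adds a rank-one term to the precision matrix.
        self.L = chol_update(self.L, f / np.sqrt(self.noise_var))
        self.b += f * y / self.noise_var

    def mean_weights(self):
        # Solve A w = b using the factor. np.linalg.solve is a general solver;
        # a triangular solver would exploit the structure of L.
        z = np.linalg.solve(self.L, self.b)
        return np.linalg.solve(self.L.T, z)

    def predict(self, x):
        return self.phi(x) @ self.mean_weights()

# Toy one-dimensional regression problem (made-up data).
rng = np.random.default_rng(0)
phi = make_rff(input_dim=1, num_features=50, lengthscale=0.5, rng=rng)
model = RFFRegressor(phi, num_features=50, noise_var=0.01, prior_var=1.0)
xs = rng.uniform(-2.0, 2.0, size=(40, 1))
ys = np.sin(3.0 * xs[:, 0]) + 0.1 * rng.standard_normal(40)
for x, y in zip(xs, ys):
    model.observe(x, y)
```

Each call to `observe` costs O(m²) for m features, compared with O(m³) for refactoring the matrix from scratch, which is what makes repeated online posterior updates inexpensive.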
The resulting optimization problem involves belief monitoring and may have a loss surface with zero gradients everywhere, so we use a black-box optimizer, the covariance matrix adaptation evolution strategy (CMA-ES). We test our algorithm on four challenging control tasks and report on its superior data efficiency and its exploration capabilities.
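To illustrate why a black-box optimizer is a natural fit for a loss surface with zero gradients, the sketch below minimizes a piecewise-constant loss with a simplified evolution strategy. It is deliberately not full CMA-ES: it keeps only weighted recombination and the rank-mu covariance update, and omits the evolution paths and cumulative step-size adaptation that CMA-ES adds. The loss function, hyperparameters, and names (`staircase_loss`, `simple_es`, `c_mu`) are assumptions chosen for illustration.

```python
import numpy as np

def staircase_loss(x):
    """Piecewise-constant loss: the gradient is zero almost everywhere."""
    return float(np.sum(np.floor(np.abs(4.0 * x))))

def simple_es(loss, x0, sigma=1.0, popsize=16, iters=60, c_mu=0.3, seed=0):
    """Simplified evolution strategy with a rank-mu covariance update.

    Not full CMA-ES: no evolution paths, and sigma stays fixed.
    """
    rng = np.random.default_rng(seed)
    dim = x0.size
    mu = popsize // 2
    # Log-decreasing recombination weights for the mu best samples.
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()
    mean = x0.astype(float).copy()
    C = np.eye(dim)
    best_x, best_f = mean.copy(), loss(mean)
    for _ in range(iters):
        A = np.linalg.cholesky(C)
        Y = rng.standard_normal((popsize, dim)) @ A.T  # rows ~ N(0, C)
        X = mean + sigma * Y
        f = np.array([loss(x) for x in X])
        order = np.argsort(f)[:mu]  # selection uses rankings only
        if f[order[0]] < best_f:
            best_f, best_x = float(f[order[0]]), X[order[0]].copy()
        Y_sel = Y[order]
        mean = mean + sigma * (weights @ Y_sel)
        # Rank-mu update: blend in the weighted scatter of the selected steps.
        C = (1.0 - c_mu) * C + c_mu * (Y_sel.T * weights) @ Y_sel
        C += 1e-10 * np.eye(dim)  # small jitter keeps C numerically positive definite
    return best_x, best_f

x0 = np.array([3.0, -2.5])
best_x, best_f = simple_es(staircase_loss, x0)
```

Because selection depends only on how candidates rank against each other, the search can make progress wherever sampled candidates land on different plateaus, even though no gradient information is available.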