MASc Seminar: From Reinforcement Learning to Approximate Optimal Learning

Candidate: Haobei Song

Title: From Reinforcement Learning to Approximate Optimal Learning

Date: August 9, 2019

Time: 2:00PM

Place: EIT 3145

Supervisor(s): Tripunitara, Mahesh

Abstract:

The reinforcement learning framework offers no closed-form analysis of the exploration/exploitation dilemma. As a consequence, there is no general theory to explain data efficiency, which often limits practical applications of reinforcement learning algorithms.

The exploration/exploitation dilemma is mostly handled with ad hoc approaches, and the resulting heuristics are hardly transferable across problems. This thesis instead steps outside the conventional reinforcement learning framework and considers a larger, more general set of problems, known as optimal learning problems, in the hope that the exploration/exploitation dilemma can be addressed in theory, either explicitly or implicitly.

Optimal learning frameworks can be constructed on top of existing reinforcement learning frameworks. Three optimal learning formulations are proposed, each addressing the issues in one of three reinforcement learning frameworks.

Following these formulations, three classes of approximate optimal learning algorithms are proposed, drawing on the following principles, respectively:

(1) Sample from a pool of prediction neural networks as the dynamics model;

(2) Approximate the Bayesian inference rule using an entangled prediction feed-forward network and belief recurrent neural network;

(3) Use a memory-based recurrent neural network to extract features from observations.

Empirical evidence is provided to show the improvement achieved by the proposed algorithms.
