Candidate: Haobei Song
Title: From Reinforcement Learning to Approximate Optimal Learning
Date: August 9, 2019
Place: EIT 3145
Supervisor(s): Tripunitara, Mahesh
The reinforcement learning framework offers no closed-form analysis of the exploration/exploitation dilemma. As a consequence, there is no general theory to explain data efficiency, which often limits practical applications of reinforcement learning algorithms.
The exploration/exploitation dilemma is mostly dealt with in an ad hoc manner, and the resulting heuristics are hardly transferable across problems. This thesis instead steps outside the conventional reinforcement learning framework and considers a larger, more general class of problems, namely optimal learning problems, in the hope that the exploration/exploitation dilemma can be addressed in theory, either explicitly or implicitly.
Optimal learning frameworks can be constructed on top of existing reinforcement learning frameworks. Three optimal learning formulations are proposed, each addressing the issues in a corresponding reinforcement learning framework.
Following these formulations, three classes of approximate optimal learning algorithms are proposed, drawing on the following principles respectively:
(1) Sample from a pool of prediction neural networks serving as the dynamics model;
(2) Approximate the Bayesian inference rule using an entangled prediction feed-forward network and a belief recurrent neural network;
(3) Use a memory-based recurrent neural network to extract features from observations.
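To make principle (1) concrete, the following is a minimal sketch of sampling from a pool of dynamics models. The class and parameter names are illustrative assumptions, not taken from the thesis: each pool member here is a tiny linear model standing in for a trained prediction network, and disagreement among members serves as a crude proxy for epistemic uncertainty.

```python
import random

class DynamicsPool:
    """Hypothetical pool of tiny linear dynamics models s' = s + W [s; a]."""

    def __init__(self, n_models, state_dim, action_dim, seed=0):
        self.rng = random.Random(seed)
        in_dim = state_dim + action_dim
        # Each member gets independently initialized weights, standing in
        # for networks trained on, e.g., bootstrapped transition data.
        self.models = [
            [[self.rng.gauss(0.0, 0.1) for _ in range(in_dim)]
             for _ in range(state_dim)]
            for _ in range(n_models)
        ]

    def sample_model(self):
        # Draw one member uniformly; a whole imagined rollout would
        # then use this single sampled model.
        return self.rng.choice(self.models)

    @staticmethod
    def predict(model, state, action):
        # Linear next-state prediction: s' = s + W [s; a].
        x = list(state) + list(action)
        return [s + sum(w * xi for w, xi in zip(row, x))
                for s, row in zip(state, model)]

    def disagreement(self, state, action):
        # Spread of member predictions, usable as an exploration signal.
        preds = [self.predict(m, state, action) for m in self.models]
        dims = list(zip(*preds))
        def var(vals):
            mu = sum(vals) / len(vals)
            return sum((v - mu) ** 2 for v in vals) / len(vals)
        return sum(var(d) for d in dims) / len(dims)

pool = DynamicsPool(n_models=5, state_dim=2, action_dim=1)
model = pool.sample_model()
next_state = DynamicsPool.predict(model, [0.0, 0.0], [1.0])
```

Sampling a single member per rollout, rather than averaging all members, is what lets model uncertainty influence behavior, which is the mechanism this principle relies on.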
Empirical evidence is provided to demonstrate the improvements achieved by the proposed algorithms.
200 University Avenue West
Kitchener, ON N2L 3G1