Master’s Thesis Presentation: Policy Extraction via Online Q-Value Distillation
Aman Jhunjhunwala, Master’s candidate
David R. Cheriton School of Computer Science
Deep neural networks have recently solved complex control tasks in certain challenging environments. However, the resulting deep learning policies remain hard to interpret, explain, and verify, which limits their practical applicability. Decision trees lend themselves well to explanation and verification tools, but they are not easy to train, especially in an online fashion. This thesis explores online tree construction algorithms and demonstrates how reinforcement learning policies can be distilled into a Bayesian tree structure, and how effective that distillation is.