Please note: This PhD defence will take place in DC 3317.
Joshua Jung, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Jesse Hoey
General game playing (GGP) is a field of reinforcement learning (RL) in which the rules of a game (i.e., the state and dynamics of an RL domain) are not specified until runtime. A GGP agent must therefore be able to play any possible game at an acceptable level given an initialization time on the order of seconds. This time restriction promotes generality, precludes the use of the deep learning methods that are popular in the RL literature, and has led to the widespread use of Monte Carlo Tree Search (MCTS) as a planning strategy. A typical MCTS planner builds a search tree from scratch for every new game, but this leaves usable information on the table. Over its full history of play, an agent may have previously encountered a similar game from which it could draw insights into its current challenge. However, recognizing similarity between games and effectively transferring knowledge from past experience is a non-trivial task.
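For readers less familiar with MCTS, the sketch below shows the UCB1 selection rule that UCT-style planners use to balance trying unvisited moves against revisiting promising ones. The node structure and game interface are simplified placeholders for illustration, not the agent developed in the thesis.

```python
import math

# Minimal UCT-style selection sketch (hypothetical interface).
# UCB1 trades off exploitation (mean reward) and exploration (visit counts).

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # opaque game state
        self.parent = parent
        self.children = {}          # action -> Node
        self.visits = 0
        self.total_reward = 0.0

def ucb1(child, parent_visits, c=math.sqrt(2)):
    if child.visits == 0:
        return float("inf")         # force each child to be tried at least once
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def select_child(node):
    # Descend to the child maximizing the UCB1 score.
    return max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
```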
In this thesis, we develop methods for automatically identifying similar features in two related games by finding an approximated edit distance between the graphs generated from their rules. We use that information to guide MCTS in one game with general heuristics initialized via transfer from a previously played game. Despite the computational cost of doing so, we show that the more efficient search granted by this approach can lead to better performance than either UCT (a standard MCTS variant) or a non-transfer MCTS agent with access to the same general heuristics. We examine the circumstances under which transfer is most effective, and we identify the cases where it is not and develop solutions for them.
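As an illustration of the kind of similarity measure involved, the snippet below estimates an edit distance between two small toy graphs using NetworkX's anytime approximation. The graphs and the approximation routine are stand-ins only; the rule-graph construction and the specific approximation method are those developed in the thesis, not shown here.

```python
import networkx as nx

# Toy stand-ins for graphs generated from two games' rules.
g1 = nx.Graph([("cell_1_1", "row_1"), ("cell_1_2", "row_1"), ("row_1", "win")])
g2 = nx.Graph([("square_a", "line_a"), ("square_b", "line_a"), ("line_a", "goal")])

# optimize_graph_edit_distance yields successively tighter upper bounds,
# so we can stop early and treat the best value seen so far as the estimate.
approx = None
for cost in nx.optimize_graph_edit_distance(g1, g2):
    approx = cost
    break  # take the first (cheapest-to-compute) approximation

print("approximate edit distance:", approx)
```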