Please note: This master’s thesis presentation will be given online.
Alexander Van de Kleut, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Jeff Orchard
We are interested in training goal-conditioned reinforcement learning agents to reach arbitrary goals specified as images. To keep the agent fully general, we provide it with only images of the environment and of the goal. Prior methods in goal-conditioned reinforcement learning from images rely on a learned lower-dimensional representation of images. We show that these learned latent representations are not necessary to solve a variety of goal-conditioned tasks from images: a goal-conditioned reinforcement learning policy can be trained end-to-end from pixels using simple reward functions. In contrast to prior work, we demonstrate that negative raw pixel distance is a strong baseline reward function. We also show that the negative Euclidean distance between feature vectors produced by a random convolutional neural network outperforms learned latent representations such as those from convolutional variational autoencoders.
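The two reward signals mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the function names, image shapes, and the single random convolutional layer are assumptions made for the example.

```python
import numpy as np

def pixel_reward(obs, goal):
    # Negative Euclidean distance in raw pixel space.
    return -np.linalg.norm(obs.astype(float) - goal.astype(float))

def random_conv_features(img, filters, stride=2):
    # One random convolutional layer (valid padding, ReLU), flattened.
    # `filters` has shape (num_filters, k, k) and is drawn once at random
    # and then frozen -- it is never trained.
    H, W = img.shape
    k = filters.shape[1]
    maps = []
    for f in filters:
        fmap = [
            [np.sum(img[i:i + k, j:j + k] * f)
             for j in range(0, W - k + 1, stride)]
            for i in range(0, H - k + 1, stride)
        ]
        maps.append(np.maximum(fmap, 0.0))
    return np.concatenate([np.ravel(m) for m in maps])

def feature_reward(obs, goal, filters):
    # Negative Euclidean distance between random-CNN feature vectors.
    return -np.linalg.norm(random_conv_features(obs, filters)
                           - random_conv_features(goal, filters))
```

Both rewards are zero exactly when the observation matches the goal and grow more negative as the two diverge, so the agent is driven toward the goal image without any learned representation.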
To join this master’s thesis presentation on MS Teams, please go to https://teams.microsoft.com/l/meetup-join/19%3ameeting_YWI3ZDRjYjQtNGQxMy00NDlhLWJkYjAtYWM5MzJmYWMwZTA0%40thread.v2/0?context=%7b%22Tid%22%3a%22723a5a87-f39a-4a22-9247-3fc240c01396%22%2c%22Oid%22%3a%22d442c9e6-f947-49bb-bb0e-d3ba1359ded5%22%7d.
200 University Avenue West
Waterloo, ON N2L 3G1