Speaker: Milen Pavlov
When applying reinforcement learning to real-world problems, it is desirable to make use of any prior information to speed up the learning process. Unfortunately, no general framework exists for encoding prior knowledge in RL, nor for developing general algorithms that make use of it. The state of the art consists of designing algorithms that employ ad hoc shortcuts, which often rely on problem-specific assumptions. As a result, these approaches are hard to generalize across problems. In addition, prior information is often used only implicitly through those assumptions, which makes it difficult to validate its correctness.

We propose a new framework, called Global Reinforcement Learning, to facilitate the encoding and exploitation of prior knowledge in a general and principled way. In many RL problems, the learner has some prior belief about which actions are good in which states, as well as about which states are likely to be reached when certain actions are executed. We employ a Bayesian approach, whereby a wide range of beliefs can be encoded as distributions over policies and environments (e.g., transition dynamics and reward functions). More precisely, we explain how to construct a joint distribution over policies and environments by specifying constraints that reflect prior knowledge and by taking into account the intuition that a policy-environment pair should have high probability when the policy achieves high reward in the associated environment. We also explain how to resolve inconsistencies in the prior knowledge used to specify this joint distribution. Once a prior distribution is obtained, it can be updated based on the observed trajectories of states and actions, using Bayesian RL techniques. The overall approach allows us to specify prior information about the environment and learn about the policy, to specify prior information about the policy and learn about the environment, or to specify partial information about both and learn about both. Hence the name Global Reinforcement Learning.
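As a rough illustration of the construction described in the abstract, the following Python sketch builds a joint prior over a handful of deterministic policies and candidate transition models for a toy two-state MDP, tilts it so that policy-environment pairs in which the policy earns high value receive higher probability, and then updates it on observed (state, action, next-state) transitions via Bayes' rule. All specifics here are assumptions made for the sake of the example: the exponential tilting with temperature BETA, the known reward function R, the helper make_env, and the toy numbers are illustrative and not taken from the talk.

    import itertools
    import numpy as np

    N_STATES, N_ACTIONS, GAMMA, BETA = 2, 2, 0.9, 5.0

    # Reward function assumed known for simplicity: R[s, a].
    R = np.array([[0.0, 1.0],
                  [1.0, 0.0]])

    # Candidate environments: each is a transition tensor T[s, a, s'].
    # Here every (s, a) pair moves to state 0 with probability p (illustrative).
    def make_env(p):
        T = np.zeros((N_STATES, N_ACTIONS, N_STATES))
        T[:, :, 0] = p
        T[:, :, 1] = 1.0 - p
        return T

    envs = [make_env(p) for p in (0.2, 0.5, 0.8)]

    # Deterministic stationary policies: a tuple mapping each state to an action.
    policies = list(itertools.product(range(N_ACTIONS), repeat=N_STATES))

    def policy_value(pi, T):
        """Exact discounted value of policy pi in environment T,
        averaged over a uniform initial state distribution."""
        P = np.array([T[s, pi[s]] for s in range(N_STATES)])  # |S| x |S|
        r = np.array([R[s, pi[s]] for s in range(N_STATES)])
        v = np.linalg.solve(np.eye(N_STATES) - GAMMA * P, r)
        return v.mean()

    # Joint prior over (policy, environment) pairs: uniform base measures,
    # tilted so that pairs where the policy earns high reward in the
    # environment get higher probability (the intuition from the abstract).
    prior = np.array([[np.exp(BETA * policy_value(pi, T)) for T in envs]
                      for pi in policies])
    prior /= prior.sum()

    # Bayesian update on an observed trajectory of (s, a, s') transitions:
    # the policy term contributes the probability of the observed action
    # (0 or 1 for deterministic policies), and the environment term
    # contributes the transition likelihood.
    trajectory = [(0, 1, 1), (1, 0, 0), (0, 1, 1)]

    posterior = prior.copy()
    for i, pi in enumerate(policies):
        for j, T in enumerate(envs):
            for s, a, s_next in trajectory:
                posterior[i, j] *= (pi[s] == a) * T[s, a, s_next]
    posterior /= posterior.sum()

    print("posterior over environments:", posterior.sum(axis=0))
    print("posterior over policies:    ", posterior.sum(axis=1))

In this discrete setting the marginals recover the flexibility highlighted at the end of the abstract: summing the joint posterior over policies yields updated beliefs about the environment, summing over environments yields updated beliefs about the policy, and partial prior information about either side simply reshapes the base measures before tilting.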
Food: Saaed Hassanpour