Gaurav Gupta, Master’s candidate
David R. Cheriton School of Computer Science
We propose a mechanism for achieving cooperation and communication in Multi-Agent Reinforcement Learning (MARL) settings by intrinsically rewarding agents for obeying the commands of other agents. At every timestep, agents exchange commands through a cheap-talk channel. During the following timestep, agents are rewarded both for taking actions that conform to the commands they received and for giving successful commands. We refer to this approach as obedience-based learning.
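
As an illustration only, the reward rule described above might be sketched as follows. The sketch makes several assumptions that the abstract does not state: actions and commands are discrete integers, a command is "successful" exactly when its recipient's next action matches it, and the bonuses are fixed constants. The function name `obedience_rewards` and the parameters `obey_bonus` and `command_bonus` are hypothetical.

```python
from typing import List, Optional


def obedience_rewards(
    actions: List[int],
    prev_commands: List[List[Optional[int]]],
    obey_bonus: float = 1.0,
    command_bonus: float = 1.0,
) -> List[float]:
    """Intrinsic rewards for one timestep (illustrative sketch, not the authors' code).

    actions[j] is the action agent j takes at this timestep.
    prev_commands[i][j] is the action agent i commanded agent j to take,
    sent over the cheap-talk channel at the previous timestep (None = no command).
    """
    n = len(actions)
    rewards = [0.0] * n
    for sender in range(n):
        for receiver in range(n):
            if sender == receiver:
                continue  # agents do not command themselves in this sketch
            commanded = prev_commands[sender][receiver]
            # Assumption: a command succeeds when the receiver's action matches it.
            if commanded is not None and actions[receiver] == commanded:
                rewards[receiver] += obey_bonus    # reward for obeying
                rewards[sender] += command_bonus   # reward for a successful command
    return rewards


# Two agents: agent 0 told agent 1 to take action 2; agent 1 told agent 0 to take action 0.
commands = [[None, 2], [0, None]]
# Agent 1 obeys (action 2); agent 0 does not (action 1 instead of 0).
print(obedience_rewards([1, 2], commands, obey_bonus=1.0, command_bonus=0.5))
```

In a full training loop, these intrinsic terms would presumably be added to each agent's environment reward; how the two are weighted is left open here.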