MASc Seminar: Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments

Monday, October 7, 2019, 2:00 pm EDT (GMT -04:00)

Candidate: Sahil Pereira

Title: Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments

Date: October 7, 2019

Time: 2:00pm

Place: EIT 3142

Supervisor(s): Crowley, Mark

Abstract:

We explore multi-agent domains with asymmetric information between agents, resulting in a hierarchy of leaders and followers. Leaders are agents that have access to follower agent policies and the ability to commit to an action before the followers. Followers can observe the actions taken by leaders and respond to maximize their own payoffs. Since leaders know the follower policies, they can manipulate the followers to elicit a better payoff for themselves. In this work, we focus on the problem of training agents at different levels of a hierarchy, in a multi-agent setting with continuous actions, so that each obtains the best payoff at its given position. To address this problem we propose a new algorithm, Stackelberg Multi-Agent Reinforcement Learning (SMARL), which incorporates the Stackelberg equilibrium concept into the multi-agent deep deterministic policy gradient (MADDPG) algorithm to train agents at different levels. Since maximization over a continuous action space is usually intractable, we propose a method to solve the Stackelberg formulation for continuous actions using conditional actions and gradient descent. We evaluate our algorithm on multiple mixed cooperative and competitive multi-agent environments, and agents trained using our method show promising results in hierarchical domains.
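A minimal sketch of this conditional-action idea, assuming PyTorch and two hypothetical critic networks q_leader and q_follower (the names, network sizes, and step counts are illustrative assumptions, not the authors' SMARL implementation): the follower's continuous best response is approximated by gradient ascent on its critic, and the leader differentiates through that unrolled response before committing to its own action.

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # illustrative dimensions

# Hypothetical critics Q(observation, leader action, follower action).
q_leader = nn.Sequential(
    nn.Linear(obs_dim + 2 * act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q_follower = nn.Sequential(
    nn.Linear(obs_dim + 2 * act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def follower_response(obs, a_leader, steps=25, lr=0.1):
    # Approximate the follower's best response to a committed leader
    # action by gradient ascent on the follower's critic, since exact
    # maximization over a continuous action space is intractable.
    # create_graph=True keeps the unrolled steps differentiable, so the
    # leader can later backpropagate through the follower's response.
    a_f = torch.zeros(act_dim, requires_grad=True)
    for _ in range(steps):
        q = q_follower(torch.cat([obs, a_leader, a_f])).squeeze()
        (grad,) = torch.autograd.grad(q, a_f, create_graph=True)
        a_f = a_f + lr * grad  # ascent step, kept on the graph
    return a_f

def stackelberg_actions(obs, steps=50, lr=0.05):
    # The leader commits first: it optimizes its own critic while
    # conditioning on the follower's (approximate) best response to
    # each candidate action, i.e. max over a_l of Q_L(s, a_l, BR(a_l)).
    a_l = torch.zeros(act_dim, requires_grad=True)
    opt = torch.optim.Adam([a_l], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        a_f = follower_response(obs, a_l)  # differentiable in a_l
        loss = -q_leader(torch.cat([obs, a_l, a_f])).squeeze()
        loss.backward()
        opt.step()
    a_l = a_l.detach()
    return a_l, follower_response(obs, a_l).detach()

obs = torch.randn(obs_dim)
a_l, a_f = stackelberg_actions(obs)

Because the leader's gradient flows through the follower's unrolled best-response computation, the leader accounts for how the follower will react, which is the commit-first structure of a Stackelberg equilibrium; how SMARL embeds this within MADDPG-style training is the subject of the talk.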