MASc Seminar Notice - Ali Hossein Abbasi Abyaneh

Tuesday, January 11, 2022 12:00 pm EST (GMT -05:00)

Candidate: Ali Hossein Abbasi Abyaneh
Title: Multi-agent Learning for Cooperative Scheduling of Microsecond-scale Services at Rack Scale

Date: January 11, 2022
Time: 12:00 pm EST
Place: online
Supervisor(s): Zahedi, Seyed Majid

Abstract:


We consider the load-balancing problem in dense racks running microsecond-scale services. In such a system, the scheduler needs
to make millions of scheduling decisions per second. Achieving this throughput while providing microsecond-scale tail latency and
high availability is extremely challenging. To address this challenge, we design a fully distributed load-balancing framework in
which servers cooperatively balance the load in the system. We model the interactions among servers as a cooperative
stochastic game, in which servers make scheduling decisions upon receiving and completing tasks. We propose a distributed
multi-agent learning algorithm to find the game’s parametric Nash equilibrium. Our proposed algorithm enables servers to make
scheduling decisions in tens of nanoseconds based on (possibly outdated) estimates of the load on other servers. We implement and
deploy our distributed load-balancing algorithm on a rack-scale computer with 264 physical cores. Compared with widely used
load-balancing policies, our proposed solution provides up to 20% higher throughput at low tail latency.
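The abstract does not spell out the learned policy itself. As a rough, hypothetical illustration of the kind of decision it describes (a server acting when it receives or completes a task, using possibly outdated estimates of other servers' load), the Python sketch below uses a simple threshold rule. The class `Server`, its fields, and the `threshold` parameter are assumptions made for illustration only; this is not the algorithm proposed in the thesis, which learns its policy parameters through multi-agent learning.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Server:
    """One server in a hypothetical rack; names and fields are illustrative only."""
    sid: int
    queue_len: int = 0
    # Possibly outdated view of peers' queue lengths, keyed by peer server id.
    estimates: Dict[int, int] = field(default_factory=dict)

    def on_arrival(self, threshold: int) -> int:
        """Decide where a newly received task runs; returns the chosen server id.

        Hypothetical parametric rule: keep the task locally unless some peer's
        *estimated* load is lower than ours by more than `threshold`.
        """
        if self.estimates:
            # Least-loaded peer according to (possibly stale) estimates; ties -> lower id.
            peer, est = min(self.estimates.items(), key=lambda kv: (kv[1], kv[0]))
            if self.queue_len - est > threshold:
                return peer
        return self.sid

    def on_completion(self, threshold: int) -> Optional[int]:
        """After finishing a task, optionally pick a peer to pull work from.

        Returns the peer id to pull from, or None to stay idle/continue locally.
        """
        self.queue_len = max(0, self.queue_len - 1)
        if self.estimates:
            # Most-loaded peer according to estimates; ties -> lower id.
            peer, est = max(self.estimates.items(), key=lambda kv: (kv[1], -kv[0]))
            if est - self.queue_len > threshold:
                return peer
        return None


if __name__ == "__main__":
    s = Server(sid=0, queue_len=5, estimates={1: 1, 2: 4})
    print(s.on_arrival(threshold=2))  # 5 - 1 > 2, so the task goes to server 1
```

Here a single integer `threshold` plays the role of a policy parameter; in a learning-based scheme such parameters would be tuned rather than fixed by hand. Because each decision is a lookup over a small local table of estimates, it needs no synchronous communication with other servers, which is what makes very fast, fully distributed decisions plausible.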