Actuarial Science and Financial Mathematics seminar series
Link to join seminar: Hosted on Zoom
Reinforcement Learning in the Linear-Quadratic Framework: From Single-Agent to Multi-Agent to Mean-Field
The linear-quadratic (LQ) framework is widely studied in stochastic control, game theory, and mean-field analysis because of its simple structure, its tractable solutions, and its power as a local approximation to nonlinear control problems. In this talk, we discuss theoretical results on the policy gradient (PG) method, a popular reinforcement learning algorithm, for several LQ problems in which agents are assumed to have limited information about the stochastic system. In the single-agent setting, we explain how the PG method is guaranteed to learn the globally optimal policy. In the multi-agent setting, we show that the PG method (in a modified form) can guide agents to the Nash equilibrium, provided there is a certain level of noise in the system. This noise can come either from the underlying dynamics or from carefully designed explorations by the agents. Finally, as the number of agents goes to infinity, we propose an exploration scheme with entropy regularization that can help each individual agent explore both the unknown system and the behavior of the other agents. The proposed scheme is shown to speed up and stabilize the learning procedure.
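As a toy illustration of the single-agent claim, the sketch below runs gradient descent directly on the gain of a linear feedback policy for a scalar, deterministic, discrete-time LQ problem and compares the result with the model-based Riccati gain. This is not the method or setting from the talk. The dynamics x_{t+1} = a x_t + b u_t, the parameter values (a = 1.2, b = 1, q = 1, r = 1), the finite-difference gradient estimate, and all function names are assumptions chosen for illustration. The learner only queries rollout costs and never reads a or b directly, and it starts from a stabilizing gain.

```python
# Toy sketch (assumptions, not the talk's setting): scalar deterministic LQR with
# x_{t+1} = a*x_t + b*u_t, linear policy u_t = -k*x_t, cost sum_t (q*x_t^2 + r*u_t^2).
# The policy gradient step uses a central finite difference of simulated rollout costs.

def rollout_cost(k, a=1.2, b=1.0, q=1.0, r=1.0, x0=1.0, horizon=200):
    """Simulate the closed loop under u = -k x and return the accumulated cost."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost


def policy_gradient(k0, step=0.1, iters=500, eps=1e-5):
    """Gradient descent on the gain k; the gradient is estimated from costs only."""
    k = k0
    for _ in range(iters):
        grad = (rollout_cost(k + eps) - rollout_cost(k - eps)) / (2 * eps)
        k -= step * grad
    return k


def riccati_gain(a=1.2, b=1.0, q=1.0, r=1.0, iters=1000):
    """Model-based benchmark: iterate the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


if __name__ == "__main__":
    k_pg = policy_gradient(k0=1.0)  # k0 = 1.0 is stabilizing since |1.2 - 1.0| < 1
    k_star = riccati_gain()
    print(f"PG gain: {k_pg:.4f}, Riccati gain: {k_star:.4f}")
```

In this toy problem the cost is minimized over the stabilizing gains at a unique point, so plain gradient descent from a stabilizing start reaches the same gain as the Riccati equation. The results discussed in the talk concern considerably more general settings with stochastic dynamics and limited information.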
The numerical performance of PG methods is demonstrated with two examples: the optimal execution problem in the single-agent setting and the institutional negotiation/bargaining problem in the multi-agent setting.
This talk is based on several projects with Xin Guo (UC Berkeley), Ben Hambly (U of Oxford), Huining Yang (U of Oxford), and Thaleia Zariphopoulou (UT Austin).