Department Seminar by Hongda Hu | Statistics and Actuarial Science

Please Note: This seminar will be given online.

Student seminar series

Hongda Hu
University of Waterloo

Link to join seminar: Hosted on Microsoft Teams

Risk-aware multiarmed bandit algorithm

The multi-armed bandit (MAB) problem is a type of online learning and sequential decision-making problem that arises when the underlying models are unknown. The classic MAB problem focuses purely on maximizing the expected reward. In my paper, I account for risk and analyze the MAB problem under the mean-variance setting. The majority of the MAB literature assumes independent arms and allows only one arm to be pulled in each round. In my paper, I drop the independence assumption, and the learner is allowed to pull multiple arms in each round so that the correlations among arms can be analyzed. The paper proposes the risk-aware multi-armed bandit algorithm (RAMAB), and I prove theoretically that it achieves logarithmic learning regret. Numerically, I show that the proposed algorithm performs well compared to the benchmark.