Student seminar series
Hongda Hu
Hosted on Microsoft Teams
Risk-aware multi-armed bandit algorithm
The multi-armed bandit (MAB) is a class of online learning and sequential decision-making problems in which the underlying models are unknown. The classic MAB problem focuses purely on maximizing the expected reward. In my paper, I take risk into account and analyze the MAB problem under a mean-variance criterion. Most of the MAB literature assumes independent arms and allows only one arm to be pulled in each round. In my paper, I drop the independence assumption and allow the learner to pull multiple arms in each round, so that correlations among arms can be exploited. I propose the risk-aware multi-armed bandit algorithm (RAMAB) and theoretically prove that it achieves logarithmic learning regret. Numerically, I show that the proposed algorithm performs well compared with the benchmark.
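The abstract does not spell out the RAMAB update rule, but as a rough illustration of the mean-variance bandit setting it describes, here is a minimal sketch of a generic mean-variance lower-confidence-bound rule for independent arms. The function name `mean_variance_lcb`, the risk weight `rho`, and the exploration bonus are illustrative assumptions, not the speaker's algorithm.

```python
import numpy as np

def mean_variance_lcb(arm_samples, horizon, rho=1.0, rng=None):
    """Pull arms to minimize the empirical mean-variance index
    MV_i = sigma_i^2 - rho * mu_i, with an optimistic exploration bonus.
    A generic mean-variance LCB sketch, not the speaker's RAMAB algorithm."""
    rng = np.random.default_rng(rng)
    k = len(arm_samples)
    counts = np.zeros(k)
    means = np.zeros(k)
    m2 = np.zeros(k)  # running sum of squared deviations (Welford's method)

    def update(i, x):
        counts[i] += 1
        delta = x - means[i]
        means[i] += delta / counts[i]
        m2[i] += delta * (x - means[i])

    # pull every arm once so the estimates are defined
    for i in range(k):
        update(i, arm_samples[i](rng))

    for t in range(k, horizon):
        variances = m2 / counts
        mv = variances - rho * means             # lower index = better arm
        bonus = np.sqrt(np.log(t + 1) / counts)  # optimism in the face of uncertainty
        i = int(np.argmin(mv - bonus))
        update(i, arm_samples[i](rng))
    return counts, means

# Toy usage: two arms with equal mean but different risk; the low-variance
# arm should end up being pulled far more often.
arms = [lambda g: g.normal(1.0, 0.1), lambda g: g.normal(1.0, 1.0)]
counts, means = mean_variance_lcb(arms, horizon=2000, rho=0.5, rng=0)
print(counts)
```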