MC 5501

## Speaker

Ali Kara, University of Michigan

## Title

Reinforcement Learning in Non-Markovian Environments under General Information Structures

## Abstract

In decision-making under uncertainty, one typically assumes an ideal model and bases the control design on it. In practice, however, the assumed model may not perfectly reflect the underlying dynamics, or no mathematical model may be available at all. One way to address this is to use past data of perceived state, cost, and control trajectories to learn the model or the optimal control functions directly, an approach known as reinforcement learning. Most of the existing literature focuses on methods designed for systems in which the underlying state process is Markovian and the state is fully observed. In many practical settings, however, one works with data without knowing the possibly very complex structure that generates it, and must still respond to the environment.

In this talk, I will present a convergence theorem for stochastic iterations, focusing in particular on Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. I will then discuss applications of this result to decision-making problems where the agent's perceived state is a noisy version of some hidden Markov state process, i.e., partially observed MDPs, and where the agent keeps track of a finite memory of the perceived data. I will also discuss applications to a class of continuous-time controlled diffusion problems.
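
For readers unfamiliar with the setting, the following is a minimal sketch of Q-learning run on a finite window of noisy observations rather than on the hidden state. It is an illustration only, not the speaker's algorithm or result: the two-state model (`P`, `O`, `cost`), the uniform exploration policy, the step size `1 / visits`, and the choice to keep only past observations (not past actions) in the window are all assumptions made for this example.

```python
import numpy as np

def finite_memory_q_learning(n_steps=20000, window=2, beta=0.9, seed=0):
    """Tabular Q-learning where the agent's state is its last `window` observations.

    Requires window >= 1. All model parameters below are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n_states, n_obs, n_actions = 2, 2, 2

    # Hypothetical hidden-state transition kernel, indexed P[u, x, x'].
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.5, 0.5]]])
    # Hypothetical noisy observation channel, indexed O[x, y].
    O = np.array([[0.8, 0.2], [0.3, 0.7]])
    # Hypothetical stage cost, indexed cost[x, u]; the agent minimizes cost.
    cost = np.array([[0.0, 1.0], [2.0, 0.5]])

    n_agent_states = n_obs ** window
    Q = np.zeros((n_agent_states, n_actions))
    visits = np.zeros((n_agent_states, n_actions))

    def encode(obs_window):
        # Map the window of observations to a single table index.
        idx = 0
        for y in obs_window:
            idx = idx * n_obs + int(y)
        return idx

    x = rng.integers(n_states)                       # hidden state, never seen by the agent
    obs = [rng.choice(n_obs, p=O[x])] * window       # fill the initial window
    s = encode(obs)

    for _ in range(n_steps):
        u = rng.integers(n_actions)                  # uniform exploration
        c = cost[x, u]                               # cost depends on the hidden state
        x = rng.choice(n_states, p=P[u, x])          # hidden state transition
        y = rng.choice(n_obs, p=O[x])                # new noisy observation
        obs = obs[1:] + [y]                          # slide the memory window
        s_next = encode(obs)

        visits[s, u] += 1
        alpha = 1.0 / visits[s, u]
        # Standard Q-learning update, applied to the (non-Markovian) window process.
        Q[s, u] = (1 - alpha) * Q[s, u] + alpha * (c + beta * Q[s_next].min())
        s = s_next

    return Q

if __name__ == "__main__":
    print(finite_memory_q_learning())
```

The window process here is generally not Markovian, which is the kind of situation the abstract refers to; whether and to what limit such iterates converge is the subject of the talk and is not established by this sketch.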