Marty Mukherjee | Applied Mathematics, University of Waterloo
Methods and applications of reinforcement learning with partial differential equations
There has been growing interest in Reinforcement Learning (RL) for continuous control problems in recent years. Since the advent of REINFORCE with Baseline, the value network has remained a core component of state-of-the-art RL algorithms. The Hamilton-Jacobi-Bellman (HJB) equation is a partial differential equation (PDE) used in control theory to characterize the optimality of the value function. We introduce RL algorithms that encode the HJB equation into the value function using Physics-Informed Neural Networks (PINNs). These algorithms show improved performance compared to Proximal Policy Optimization (PPO) on the MuJoCo environments. The approach is also extended to a 1D PDE control problem: stabilizing the temperature of a battery pack using a cooling fluid.
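To illustrate the general idea of a PINN-style HJB penalty (this is a minimal sketch, not the algorithm presented in the talk): for an infinite-horizon discounted problem with assumed discount rate rho, dynamics f, and reward r, the continuous-time HJB condition along a fixed policy reads r(s, a) + ∇V(s) · f(s, a) − ρ V(s) = 0, and the squared residual can be added to the value loss. The dynamics and reward below are toy placeholders.

```python
import torch

rho = 0.1  # assumed discount rate

# Small value network V(s) for a 2D state (illustrative architecture)
value_net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def dynamics(s, a):
    # Toy double-integrator dynamics ds/dt = A s + B a (placeholder)
    A = torch.tensor([[0.0, 1.0], [0.0, 0.0]])
    B = torch.tensor([[0.0], [1.0]])
    return s @ A.T + a @ B.T

def reward(s, a):
    # Quadratic cost as a negative reward (placeholder)
    return -(s**2).sum(dim=1, keepdim=True) - (a**2).sum(dim=1, keepdim=True)

def hjb_residual_loss(s, a):
    # PINN-style penalty: squared residual of the HJB equation,
    # with dV/ds obtained by automatic differentiation.
    s = s.requires_grad_(True)
    V = value_net(s)
    dVds = torch.autograd.grad(V.sum(), s, create_graph=True)[0]
    residual = reward(s, a) + (dVds * dynamics(s, a)).sum(dim=1, keepdim=True) - rho * V
    return (residual**2).mean()

# Evaluate the penalty on a batch of sampled states and actions
s = torch.randn(64, 2)
a = torch.randn(64, 1)
loss = hjb_residual_loss(s, a)
loss.backward()  # gradients flow into the value network's parameters
```

In practice this residual term would be weighted and combined with the usual value-regression loss of an actor-critic algorithm such as PPO.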
For future work, I aim to further explore the intersection of different types of PDEs and RL. In an ongoing project, I aim to use elliptic PDEs to mitigate the sparsity of reward and cost functions commonly observed in constrained RL. I also intend to use RL to improve the training of physical models characterized by PDEs.