My current research focuses on model uncertainty in model-based reinforcement learning (RL). An internal environment model is a potentially powerful tool for prediction and planning, but planning with a biased or imperfect model can compromise an RL agent's policy. Tracking model uncertainty in a Bayesian manner, by learning an approximate posterior distribution over possible models, allows an RL agent to isolate the parts of its model(s) that it can trust for prediction and planning, and to direct its exploration of the environment toward reducing model uncertainty. My current goal is to model uncertainty with a deep probabilistic architecture that scales to high-dimensional domains, where an RL agent may only have enough computational power to maintain uncertainty over certain parts of its environment. Promising architectures include normalizing flows and sum-product networks, both of which allow efficient and scalable inference, sampling, and learning. Given an expressive model of uncertainty, Bayesian RL trades off exploration and exploitation optimally by simply acting greedily in the augmented MDP that incorporates the belief space (the parameter space of the approximate posterior). I am interested in approximating Bayesian RL in a scalable way, by learning to explore in an adaptive subspace of the full belief space in high-dimensional domains.
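To make the belief-augmented view concrete, here is a minimal sketch of Bayes-adaptive planning on a toy tabular MDP: a Dirichlet posterior is maintained over unknown transition probabilities, and actions are chosen greedily by finite-horizon lookahead over augmented states consisting of the environment state together with the posterior counts. The toy MDP, the tabular/Dirichlet parameterization, and all names in the sketch are illustrative assumptions, not the deep probabilistic architecture described above.

```python
# Minimal sketch: greedy action selection in the belief-augmented MDP,
# with a Dirichlet posterior over the transitions of a toy tabular MDP.
import numpy as np

n_states, n_actions, horizon = 3, 2, 3
rng = np.random.default_rng(0)

rewards = rng.uniform(size=(n_states, n_actions))                 # known rewards (assumption)
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # unknown true transitions

def q_values(state, alpha, depth):
    """Belief-augmented Q-values of (state, alpha) under exhaustive
    finite-horizon lookahead; alpha holds the Dirichlet counts."""
    qs = []
    for a in range(n_actions):
        p = alpha[state, a] / alpha[state, a].sum()               # posterior-mean transitions
        q = rewards[state, a]
        if depth > 1:
            for s_next in range(n_states):
                alpha_next = alpha.copy()
                alpha_next[state, a, s_next] += 1.0               # belief update along this branch
                q += p[s_next] * max(q_values(s_next, alpha_next, depth - 1))
        qs.append(q)
    return qs

# Interaction loop: plan greedily in belief space, act in the true MDP,
# then update the posterior counts from the observed transition.
alpha = np.ones((n_states, n_actions, n_states))                  # uniform Dirichlet prior per (s, a)
state = 0
for t in range(10):
    action = int(np.argmax(q_values(state, alpha, horizon)))      # greedy in the augmented MDP
    next_state = rng.choice(n_states, p=true_P[state, action])
    alpha[state, action, next_state] += 1.0
    state = next_state
```

Exhaustive lookahead is exponential in the horizon, which is exactly why scalable approximations of the belief space are needed in high-dimensional domains.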
I am also working on a project aimed at understanding how the hidden layers of deep convolutional neural networks (CNNs) are affected by adversarial attacks. The network's attention to different parts of an input image can be visualized by selectively backpropagating only positive gradients. Preliminary work has shown that (i) adversarial attacks often divert the VGG network's attention away from the class object in ImageNet images and toward peripheral features, and (ii) the effects of these attacks can be localized to specific deeper layers of the CNN. By better understanding how adversarial perturbations influence activation patterns in hidden layers, it may become possible to correct adversarial misclassifications by adding feedback or recurrent connections, drawing inspiration from the human visual system's use of feedback to direct visual attention.
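As a minimal illustration of the visualization technique, the sketch below registers backward hooks on every ReLU of a torchvision VGG-16 so that only positive gradients are propagated back to the input, then reads off a per-pixel attention map from the input gradient of the predicted class score. The random input tensor stands in for a preprocessed ImageNet image, and the hook and variable names are illustrative assumptions rather than the project's actual pipeline.

```python
# Minimal sketch: positive-gradient backpropagation through VGG-16
# to visualize which input pixels the prediction attends to.
import torch
from torchvision.models import vgg16

model = vgg16(weights="IMAGENET1K_V1").eval()        # downloads pretrained ImageNet weights on first use

def keep_positive_grads(module, grad_input, grad_output):
    # At each ReLU, pass only positive gradients back toward the input.
    return (torch.clamp(grad_input[0], min=0.0),)

for module in model.modules():
    if isinstance(module, torch.nn.ReLU):
        module.inplace = False                        # backward hooks require out-of-place ReLUs
        module.register_full_backward_hook(keep_positive_grads)

# Random tensor standing in for a preprocessed 224x224 ImageNet image.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
logits = model(image)
logits[0, logits[0].argmax()].backward()              # backprop the predicted class score

attention = image.grad.abs().max(dim=1)[0]            # collapse color channels into one attention map
print(attention.shape)                                # torch.Size([1, 224, 224])
```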