Department Seminar | Mufan Li | Room: M3 3127
Infinite-Depth Neural Networks as Depthwise Stochastic Processes
Recent advances in neural network research have predominantly focused on infinite-width architectures, yet the complexities inherent in modelling networks with large depth call for a new theoretical framework. In this presentation, we explore an approach to modelling neural networks in the proportional infinite-depth-and-width limit, where depth and width grow together at a fixed ratio.
Naively stacking non-linearities in deep networks leads to degenerate behaviour at initialization, where feature correlations collapse as depth grows. To address this challenge and obtain a well-behaved infinite-depth limit, we introduce a new framework that treats neural networks as depthwise stochastic processes. Within this framework, the limit is characterized by a stochastic differential equation (SDE) governing the feature covariance matrix. Notably, this framework yields a remarkably accurate model of finite-size networks. Finally, we will briefly discuss several applications, including stabilizing gradients in Transformers, reducing the computational cost of hyperparameter tuning, and a new spectral result for products of random matrices.
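To give a flavour of the result (a schematic form only, not the precise statement of the talk: the drift b and diffusion coefficient Sigma below are generic placeholders rather than the specific coefficients derived in the work), covariance SDE limits of this type read

\[
  \mathrm{d}V_t \;=\; b(V_t)\,\mathrm{d}t \;+\; \Sigma(V_t)^{1/2}\,\mathrm{d}B_t,
  \qquad t \in [0,1],
\]

where V_t denotes the limiting feature covariance matrix at relative depth t (layer index divided by total depth) and B_t is a matrix-valued Brownian motion, so that the covariance evolves randomly rather than deterministically as one passes through the layers.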