Location
MC 5417
Candidate
Joshua Joseph George | Applied Mathematics, University of Waterloo
Title
A Hamiltonian Systems Approach to Neural Network Optimization
Abstract
We propose and analyze structure-preserving methods for first-order optimization of Lipschitz-smooth objectives by interpreting the optimization dynamics as a dissipative Hamiltonian system, in which the model parameters evolve jointly with an auxiliary momentum variable. This formulation carries a natural energy-dissipation mechanism, and we design algorithms that inherit a discrete analogue of it: discrete-gradient (DG) methods that satisfy an exact discrete-time energy decay property, ensuring monotone dissipation independent of the stepsize. Building on this framework, we introduce variants that empirically reduce oscillations, improve runtime, and increase robustness to ill-conditioned problems.
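For orientation, here is a minimal LaTeX sketch of the continuous-time system behind this construction; the notation is ours, inferred from the abstract (f is the objective, x the parameters, p the momentum, gamma > 0 a damping rate), not quoted from the thesis:

\[
  H(x,p) = f(x) + \tfrac{1}{2}\lVert p \rVert^2, \qquad
  \dot{x} = p, \qquad
  \dot{p} = -\nabla f(x) - \gamma p,
\]
\[
  \frac{d}{dt}\, H\bigl(x(t), p(t)\bigr)
  = \nabla f(x)^\top p + p^\top \bigl(-\nabla f(x) - \gamma p\bigr)
  = -\gamma \lVert p \rVert^2 \le 0.
\]

A discrete gradient \bar\nabla f(x_k, x_{k+1}), characterized by the exactness property \bar\nabla f(x_k, x_{k+1})^\top (x_{k+1} - x_k) = f(x_{k+1}) - f(x_k), is what lets the discretized scheme reproduce this decay identity exactly at every stepsize rather than only asymptotically.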
To address the computational cost of the implicit DG methods, we propose semi-implicit discrete-gradient (SIDG) schemes, obtained by linearizing the DG updates and incorporating curvature through L-BFGS inverse-Hessian approximations, which solve the resulting linear systems efficiently. These schemes retain the key structure-preserving properties while significantly reducing computational cost, striking a practical balance between stability and efficiency. We establish monotone energy decay, boundedness of iterates, and sublinear convergence to first-order stationary points.
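To make the flavor of one such step concrete, below is a minimal NumPy sketch under our own assumptions: the names sidg_step and two_loop, the shifted-secant-pair construction, and all parameter defaults are illustrative choices, not the candidate's implementation. Linearizing the discrete gradient around x_k yields a shifted linear system for the new momentum, which an L-BFGS-style two-loop recursion can solve approximately:

import numpy as np

def two_loop(q, S, Yh, h0):
    # Standard L-BFGS two-loop recursion: given pairs (s_i, yh_i) with
    # yh_i ~ A s_i, returns an approximation of A^{-1} q.
    q = q.copy()
    alphas, rhos = [], []
    for s, yh in zip(reversed(S), reversed(Yh)):
        rho = 1.0 / np.dot(yh, s)
        a = rho * np.dot(s, q)
        q -= a * yh
        alphas.append(a)
        rhos.append(rho)
    r = h0 * q
    for (s, yh), a, rho in zip(zip(S, Yh), reversed(alphas), reversed(rhos)):
        b = rho * np.dot(yh, r)
        r += (a - b) * s
    return r

def sidg_step(x, p, grad, S, Y, h=0.1, gamma=1.0, m=10):
    # One semi-implicit discrete-gradient step (illustrative). Linearizing
    # the discrete gradient around x_k gives the shifted system
    #     [(1 + h*gamma) I + (h^2 / 2) B_k] p_new = p - h * grad(x),
    # and shifting each stored secant pair,
    #     yh_i = (1 + h*gamma) s_i + (h^2 / 2) y_i,
    # turns L-BFGS pairs for B_k into pairs for the shifted matrix, so the
    # two-loop recursion applies its approximate inverse directly.
    g = grad(x)
    Yh = [(1.0 + h * gamma) * s + 0.5 * h**2 * y for s, y in zip(S, Y)]
    p_new = two_loop(p - h * g, S, Yh, h0=1.0 / (1.0 + h * gamma))
    x_new = x + h * p_new
    s, y = x_new - x, grad(x_new) - g
    if np.dot(s, y) > 1e-10:   # keep only curvature pairs with s^T y > 0
        S.append(s)
        Y.append(y)
        if len(S) > m:         # limited memory: drop the oldest pair
            S.pop(0)
            Y.pop(0)
    return x_new, p_new

With an empty memory the step reduces to the plain semi-implicit damped update p_new = (p - h * grad(x)) / (1 + h * gamma), and each step costs O(mn) in the problem dimension n, which is the point of replacing the fully implicit DG solve.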
Numerical experiments on ill-conditioned least-squares problems, regularized logistic regression, physics-informed neural networks, and CIFAR-10 image classification demonstrate robustness to ill-conditioning and performance competitive with widely used optimizers such as Adam, stochastic gradient descent (SGD), and L-BFGS.