Jonathan Horrocks | Applied Math, University of Waterloo
Sparse Identification of Epidemiological Models from Empirical Data
Current modelling practices in mathematical epidemiology are predicated on mechanisms stemming from theoretical assumptions, such as mass action incidence. Deterministic disease models can describe many patterns observed in empirical incidence data, but challenges remain in creating accurate, parsimonious models that offer predictive value. Recent advances in data-driven techniques have given rise to new model discovery methods that forgo theoretical assumptions and attempt to create sparse, dynamic models directly from real-world data. Our goal is to apply these techniques to empirical case notification data from epidemiological systems, either to confirm current practices or to give new insight not accessible to human intuition.
We adapt a recently developed technique called Sparse Identification of Nonlinear Dynamics (SINDy), which has demonstrated the ability to recover the governing equations of complex dynamical systems. To lend insight into this process, the SINDy algorithm was first applied to simulated data from various forms of the SIR model, a standard compartmental model of epidemics. Several conversion processes were then used to recover both the susceptible and infectious classes from raw incidence data. Finally, the SINDy algorithm was applied to empirical data from measles, varicella, and rubella datasets, three diseases that offer contrasting dynamic behaviour, and the resulting time series and model coefficients were analysed.
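The first step described above, applying sparse regression to simulated SIR data, can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes an SIR model in population fractions with births and deaths, hypothetical parameter values (BETA, GAMMA, MU), a quadratic candidate library in S and I, and a hand-written sequentially thresholded least-squares routine (stlsq). For simplicity the derivatives are evaluated from the known model equations; with real data they would have to be estimated numerically from the time series.

```python
import numpy as np

# Hypothetical parameters chosen for illustration only (not taken from this work).
BETA, GAMMA, MU = 2.0, 0.5, 0.1
TERMS = ["1", "S", "I", "S^2", "S*I", "I^2"]

def sir_rhs(x):
    """SIR model in population fractions with births and deaths at rate MU."""
    s, i = x
    return np.array([MU * (1.0 - s) - BETA * s * i,
                     BETA * s * i - (GAMMA + MU) * i])

def simulate(x0, dt=0.05, n_steps=800):
    """Integrate the model with a fixed-step fourth-order Runge-Kutta scheme."""
    xs = np.empty((n_steps, 2))
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        xs[k] = x
        k1 = sir_rhs(x)
        k2 = sir_rhs(x + 0.5 * dt * k1)
        k3 = sir_rhs(x + 0.5 * dt * k2)
        k4 = sir_rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return xs

def library(xs):
    """Candidate functions: all monomials in S and I up to degree two."""
    s, i = xs[:, 0], xs[:, 1]
    return np.column_stack([np.ones_like(s), s, i, s**2, s * i, i**2])

def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Several trajectories from different (hypothetical) initial conditions.
xs = np.vstack([simulate(x0) for x0 in [(0.9, 0.01), (0.7, 0.05), (0.5, 0.1)]])

# Simplification: exact derivatives from the known right-hand side.
dxdt = np.array([sir_rhs(x) for x in xs])

xi = stlsq(library(xs), dxdt, threshold=0.05)

# Each row is a library term; columns are coefficients in dS/dt and dI/dt.
for name, row in zip(TERMS, xi):
    print(f"{name:>4}  dS/dt: {row[0]: .4f}   dI/dt: {row[1]: .4f}")
```

In this noise-free setting the regression should retain only the terms present in the simulated model, including the S*I cross-term in both equations. Noisy empirical data and numerically estimated derivatives make the choice of threshold and library considerably more delicate.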
The resulting models closely mimic the dynamics of the empirical data, most notably the frequency of epidemics, for all three diseases considered. The discovered coefficients exhibit sparsity, though not to the extent seen in current compartmental models. Similarities between the discovered model equations and fitted SIR models can be noted, including a strong dependence on the cross-term corresponding to the mass action incidence mechanism. These encouraging results indicate that this data-driven technique may be of use in verifying and improving current theoretical models in mathematical epidemiology.
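To make the cross-term concrete, the block below writes a basic SIR model with mass action incidence in a common textbook form next to a generic quadratic model of the kind a sparse regression might return. The symbols $\beta$, $\gamma$, and $\xi_0, \dots, \xi_5$ are notation assumed here for illustration; the exact model forms and candidate libraries used in this work are not stated in the abstract.

```latex
% Basic SIR model with mass action incidence (textbook form, assumed notation)
\begin{align}
  \frac{dS}{dt} &= -\beta S I, \\
  \frac{dI}{dt} &= \beta S I - \gamma I.
\end{align}

% Generic quadratic candidate model for one state variable
\begin{equation}
  \frac{dI}{dt} = \xi_0 + \xi_1 S + \xi_2 I + \xi_3 S^2 + \xi_4 S I + \xi_5 I^2 .
\end{equation}
```

Under this notation, mass action incidence corresponds to the coefficient $\xi_4$ on the $SI$ term, and the SIR equation for $I$ is the special case $\xi_4 = \beta$, $\xi_2 = -\gamma$, with all other coefficients zero.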