Department seminar by Emanuele Giorgi | Statistics and Actuarial Science

Thursday, November 25, 2021 4:00 pm - 4:00 pm EST (GMT -05:00)

Please Note: This seminar will be given online.

Statistics & Biostatistics seminar series

Link to join seminar: Hosted on Zoom

A model-based geostatistical framework for analysing malaria school survey data with missing residence locations

Background. School-based sampling has been used for decades in sub-Saharan Africa (SSA) to inform targeted responses to malaria and several neglected tropical diseases. Standard model-based geostatistical (MBG) methods for mapping disease prevalence use the school location to model the spatial correlation structure of the data. In this paper, we question this assumption and develop a novel framework that allows us to account for the uncertainty in the students' residence locations.

Methods. We develop three school catchment area (SCA) models that assume different modes by which students travel to school: walking only (W); walking and bicycling (WB); walking and use of motorized transport (WM). We propose a computationally efficient approximation of the spatial Gaussian field and a Monte Carlo maximum likelihood method for parameter estimation.

We then compare the resulting predictive inferences of these models with those of two standard approaches, both of which use the school locations to model the spatial correlation in the data but make different use of the covariates: the SL model uses covariate values extracted at the school location; the SLCA model averages the values of the covariate within a given radius centred at the school location.
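The difference between the two standard covariate treatments can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grid, the covariate values, the radius and the function names `covariate_at_school` and `covariate_radius_mean` are all assumptions made up for illustration, and a real analysis would read the covariate from a raster.

```python
from math import hypot
from statistics import fmean


def covariate_at_school(cells, school):
    """SL-style: covariate value of the grid cell nearest to the school."""
    sx, sy = school
    nearest = min(cells, key=lambda c: hypot(c[0] - sx, c[1] - sy))
    return nearest[2]


def covariate_radius_mean(cells, school, radius):
    """SLCA-style: mean covariate over cells within `radius` of the school."""
    sx, sy = school
    inside = [v for x, y, v in cells if hypot(x - sx, y - sy) <= radius]
    return fmean(inside)


# Hypothetical 5x5 grid with unit spacing; each cell is (x, y, value),
# and the covariate value is x**2 (made-up numbers, not real data).
cells = [(float(x), float(y), float(x * x)) for x in range(5) for y in range(5)]
school = (2.0, 2.0)

sl_value = covariate_at_school(cells, school)
slca_value = covariate_radius_mean(cells, school, radius=1.0)
print(sl_value, slca_value)
```

In this toy grid the two treatments give different covariate values for the same school, because the radius average also picks up the four neighbouring cells.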

Results. The size and the shape of the SCAs showed moderate variation across the three travel scenarios. The predicted malaria prevalence was heterogeneous across the area for all five models. The prevalence ranged between 0.06% and 69.2% for the SL and SLCA models, and between 0.06% and 77.6%, 75.1% and 75.6% for the W, WB and WM models, respectively. The SL and SLCA models had 43.3% and 43.0% of the school-going children living in areas with a predicted prevalence below 30%, respectively. For the W, WB and WM models, these were 44.3%, 44.6% and 43.5%, respectively. The models accounting for location uncertainty (W, WB and WM) showed stronger spatial heterogeneity in the predicted prevalence and in the exceedance probabilities than the SL and SLCA models.
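An exceedance probability is the probability, given the data, that prevalence at a location exceeds a chosen threshold; it is commonly estimated as the share of predictive samples above that threshold. The Python sketch below shows that calculation under stated assumptions: the predictive draws and location names are invented, and the 30% threshold is borrowed from the figures above purely for illustration, since the abstract does not state which threshold was used for the exceedance probabilities.

```python
def exceedance_probability(samples, threshold):
    """Share of predictive samples in which prevalence exceeds `threshold`."""
    return sum(s > threshold for s in samples) / len(samples)


# Made-up predictive draws of prevalence at two hypothetical locations.
draws = {
    "location_a": [0.10, 0.25, 0.35, 0.40],
    "location_b": [0.05, 0.08, 0.12, 0.31],
}

# Threshold of 30% is an illustrative assumption.
probs = {name: exceedance_probability(d, 0.30) for name, d in draws.items()}
print(probs)
```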

Conclusions. The proposed modelling framework allows us to generate predictive maps of disease prevalence that account for the uncertainty in the location where exposure occurs. Accounting for location uncertainty is essential both for prediction of disease prevalence and for unbiased estimation of the relationship between prevalence and spatially referenced explanatory variables.