Statistics and Biostatistics seminar series
Jun Young Park
University of Toronto
Room: M3 3127
Preparing good data for more reproducible science in multi-site neuroimaging studies
Neuroimaging data provide rich information about the human brain, including its anatomy and function. Because such data are often collected across multiple study sites, substantial unwanted variation can arise from differences in scanner types, acquisition parameters, and preprocessing pipelines, so an increased sample size does not necessarily guarantee higher reproducibility. Inspired by the “batch effect” problem in -omics research, several statistical methods have been proposed to produce “batch-free” datasets by harmonizing heterogeneous means and variances across sites. Yet it remains unclear how to effectively account for heterogeneous covariance structures in high-dimensional neuroimaging data. In this talk, I will present statistical methods that leverage the unique characteristics of imaging data (e.g., high dimensionality, spatial dependence in cortical thickness, network structure in functional MRI) to construct parametric models for site-specific covariance and to develop scalable methods for practical use. Real-data applications demonstrate that the proposed approaches outperform existing methods, offering a practical path toward increased reproducibility.