Seminar by Jinbo Chen

Tuesday, April 2, 2024 4:00 pm - 5:00 pm EDT (GMT -04:00)

Statistics and Biostatistics seminar series

Jinbo Chen
University of Pennsylvania

Room: M3 3127


Novel Missing Data Problems Arising from the Analyses of Electronic Health Record Data

Electronic Health Records (EHRs) contain a wealth of health information about patients, thereby offering rich opportunities for healthcare research. I will discuss two analytical challenges in analyzing EHR data that my group has recently been addressing. The first concerns the analysis of case-control studies assembled from EHRs, where the pool of cases is contaminated by patients who are ineligible for the study. Had they been identified, these ineligible patients would have been excluded from the analyses. However, the true outcome status of patients in the case pool is known only for a subset, which may be arbitrarily small relative to the entire pool. The second challenge involves leveraging EHR data to identify patients whose diagnoses have been missed.

Under-diagnosis is widely recognized for many diseases and arises for a multitude of reasons, and its extent may vary across population subgroups. EHRs contain health information for both diagnosed and under-diagnosed patients, thereby providing a unique opportunity to address under-diagnosis in the standard healthcare setting. However, this opportunity remains far from adequately exploited, partly because of a fundamental challenge: under-diagnosed patients are mixed together with a large number of disease-free patients. Common to both analytical challenges are missing data problems that existing missing data methodologies cannot readily address. I will discuss our current solutions and open problems.