Statistics and Biostatistics Seminar Series
Ehsan Karim | Room: M3 3127
Rethinking Residual Confounding Bias Reduction: Why Vanilla hdPS Alone is No Longer Enough in the Era of Machine Learning
Health studies that use administrative databases often lack complete information on confounders. At the same time, a large number of additional diagnosis, procedure, and medication codes that are routinely recorded during healthcare encounters go unused in epidemiological studies because of their perceived lack of relevance to the study question. To address this residual confounding problem, researchers developed the high-dimensional propensity score (hdPS) algorithm, which leverages these additional codes as proxies for unmeasured and mis-measured covariates and can thereby help reduce residual confounding bias in the estimation of treatment effects.

Because the hdPS algorithm deals with massive amounts of information, machine learning variable selection methods have been proposed as alternatives. These methods have been shown to be effective in reducing bias, but estimating variance correctly in this context remains a challenge; even doubly robust estimators such as targeted maximum likelihood estimation (TMLE) can struggle with this issue. To address this problem, we designed a simulation study comparing the performance of methods from three categories: (1) vanilla hdPS, (2) machine learning and hybrid alternatives proposed in the literature, and (3) TMLE versions with two sets of candidate learners for super learning. We will evaluate these methods in terms of bias, variance (both model-based and empirical), and coverage. We will present a nationally representative analysis as a motivating example, explain how this study fits into the existing literature, and offer practical recommendations for practitioners based on our findings.
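At the core of the hdPS algorithm is a prioritization step that ranks every candidate code by the confounding bias it could plausibly induce, using Bross's (1966) bias multiplier as adopted by Schneeweiss et al. (2009). Below is a minimal Python sketch of that ranking, not the speaker's implementation: the DataFrame and column names (`exposed`, `outcome`, binary proxy indicators) are hypothetical, and the reciprocal rule for covariate-outcome relative risks below 1 mirrors a common implementation choice.

```python
import numpy as np
import pandas as pd

def bross_bias_multiplier(df, proxy, exposure="exposed", outcome="outcome"):
    """Bross (1966) multiplicative bias from one binary proxy covariate,
    as used in the hdPS prioritization step. Assumes non-degenerate cells
    (every conditional prevalence strictly between 0 and 1)."""
    pc1 = df.loc[df[exposure] == 1, proxy].mean()    # P(C=1 | exposed)
    pc0 = df.loc[df[exposure] == 0, proxy].mean()    # P(C=1 | unexposed)
    p_d_c1 = df.loc[df[proxy] == 1, outcome].mean()  # P(D=1 | C=1)
    p_d_c0 = df.loc[df[proxy] == 0, outcome].mean()  # P(D=1 | C=0)
    rr_cd = p_d_c1 / p_d_c0                          # covariate-outcome RR
    if rr_cd < 1:                                    # use reciprocal so RR >= 1
        rr_cd = 1.0 / rr_cd
    return (pc1 * (rr_cd - 1) + 1) / (pc0 * (rr_cd - 1) + 1)

def rank_proxies(df, proxies, **kw):
    """Return proxy names sorted by |log(bias multiplier)|, largest first."""
    scores = {p: abs(np.log(bross_bias_multiplier(df, p, **kw))) for p in proxies}
    return sorted(scores, key=scores.get, reverse=True)

# Usage (hypothetical): keep the top-k ranked proxies, e.g. k = 200 or 500,
# and add them to the propensity score model alongside investigator-specified
# covariates.
# top = rank_proxies(cohort_df, candidate_cols)[:200]
```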
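The TMLE arms of the comparison differ from plug-in approaches mainly through the targeting (fluctuation) step, which also yields an influence-curve-based standard error of the "model-based" kind referenced above. The following is a stripped-down sketch for a binary outcome under stated assumptions, not the study's code: initial outcome predictions and propensity scores are taken as already fitted (in the study these would come from super learning), all array names are illustrative, and `statsmodels` supplies the offset logistic regression.

```python
import numpy as np
import statsmodels.api as sm

def tmle_ate(y, a, q0, q1, g):
    """One-step TMLE of the average treatment effect, binary outcome.

    y  : (n,) observed binary outcome
    a  : (n,) binary treatment indicator
    q0 : (n,) initial predictions of E[Y | A=0, W]
    q1 : (n,) initial predictions of E[Y | A=1, W]
    g  : (n,) propensity scores P(A=1 | W)
    """
    y, a, q0, q1, g = map(np.asarray, (y, a, q0, q1, g))
    eps = 1e-6
    q0, q1, g = (np.clip(v, eps, 1 - eps) for v in (q0, q1, g))
    qa = np.where(a == 1, q1, q0)          # prediction at observed treatment

    # Clever covariate: H(A, W) = A/g(W) - (1-A)/(1-g(W))
    h = a / g - (1 - a) / (1 - g)

    # Fluctuation step: logistic regression of Y on H, offset logit(Q),
    # no intercept
    flux = sm.GLM(y, h.reshape(-1, 1),
                  family=sm.families.Binomial(),
                  offset=np.log(qa / (1 - qa))).fit()
    e = flux.params[0]

    # Targeted (updated) counterfactual predictions
    expit = lambda x: 1 / (1 + np.exp(-x))
    q1_star = expit(np.log(q1 / (1 - q1)) + e / g)
    q0_star = expit(np.log(q0 / (1 - q0)) - e / (1 - g))
    ate = np.mean(q1_star - q0_star)

    # Efficient influence curve -> model-based standard error
    ic = h * (y - np.where(a == 1, q1_star, q0_star)) + (q1_star - q0_star) - ate
    se = np.sqrt(np.var(ic, ddof=1) / len(y))
    return ate, se
```

The influence-curve variance computed here is one standard candidate for the model-based variance that the simulation contrasts with the empirical variance across replicates.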