New developments in survival forests techniques
Survival analysis answers the question of when an event of interest will happen. It studies time-to-event data in which the true event time is observed for only some subjects, while the others are censored. Right-censoring is the most common form of censoring in survival data. Tree-based methods are versatile and useful tools for analyzing right-censored survival data. Survival forests, which are ensembles of trees for time-to-event data, are powerful and popular among practitioners. Current implementations of survival forests have some limitations. First, most of them use the log-rank test as the splitting rule, which loses power when the proportional hazards assumption is violated. Second, they assume that the event time and the censoring time are independent, given the covariates. Third, they do not provide dynamic predictions in the presence of time-varying covariates. We propose solutions to these limitations. We suggest using the integrated absolute difference between the survival functions of the two child nodes as the splitting rule in settings where the proportional hazards assumption is violated. We propose two approaches to tackle the problem of dependent censoring with random forests. The first is to use a final estimate of the survival function that corrects for dependent censoring. The second is to use a splitting rule that does not rely on the independent censoring assumption. Lastly, we make recommendations on different ways to obtain dynamic estimates of the hazard function with random forests for discrete-time survival data in the presence of time-varying covariates. In our current work, we are developing forests for clustered survival data.
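The abstract does not give the exact form of the proposed splitting rule, so the following is only a minimal sketch of the idea. It assumes that each child node's survival function is estimated with the Kaplan-Meier estimator and that the absolute difference is integrated over [0, tau] for a user-chosen horizon tau. The function names and the horizon are hypothetical, chosen for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival_after_time).

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if right-censored
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    steps = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def step_value(steps, t):
    """Evaluate the right-continuous step function at time t (1 before the first event)."""
    value = 1.0
    for time, s in steps:
        if time <= t:
            value = s
        else:
            break
    return value


def integrated_abs_difference(left, right, tau):
    """Integral over [0, tau] of |S_left(t) - S_right(t)|.

    left, right -- (times, events) tuples for the two candidate child nodes
    tau         -- integration horizon (an assumption of this sketch)
    """
    steps_l = kaplan_meier(*left)
    steps_r = kaplan_meier(*right)
    # Both curves are constant between consecutive jump times, so the
    # integral is an exact sum of rectangle areas on this grid.
    grid = sorted({0.0, tau} | {t for t, _ in steps_l + steps_r if t < tau})
    total = 0.0
    for a, b in zip(grid, grid[1:]):
        total += abs(step_value(steps_l, a) - step_value(steps_r, a)) * (b - a)
    return total


# Toy candidate split: a larger value means better-separated survival curves.
left_child = ([1, 2, 3], [1, 1, 1])
right_child = ([2, 4, 6], [1, 1, 1])
score = integrated_abs_difference(left_child, right_child, tau=6)
```

In a tree-growing routine, a score of this kind would be computed for every candidate split and the split with the largest value retained. Unlike the log-rank statistic, it does not cancel out when the two survival curves cross.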
Analysis of Generalized Semiparametric Mixed Varying-Coefficient Effects Model for Longitudinal Data
The generalized semiparametric mixed varying-coefficient effects model for longitudinal data can flexibly model different types of covariate effects. Different link functions can be selected to provide a rich family of models for longitudinal data. The mixed varying-coefficient effects model accommodates constant effects, time-varying effects, and covariate-varying effects. The time-varying effects are unspecified functions of time, and the covariate-varying effects are nonparametric functions of a possibly time-dependent exposure variable. We develop a semiparametric estimation procedure using local linear smoothing and profile weighted least squares estimation techniques. The method requires smoothing in two different yet connected domains: time and the time-dependent exposure variable. The estimators of the nonparametric effects are obtained through aggregation to improve efficiency. The asymptotic properties of the estimators of both the nonparametric and parametric effects are investigated. Hypothesis tests are developed to examine the covariate effects. The finite-sample properties of the proposed estimators and tests are examined through simulations, with satisfactory performance. The proposed methods are used to analyze the ACTG 244 clinical trial to investigate the effects of antiretroviral treatment switching in HIV-infected patients before and after developing the codon 215 mutation.
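Local linear smoothing, one of the building blocks named above, can be illustrated in its simplest one-dimensional form. The sketch below is not the estimation procedure of the abstract, which smooths in two domains and profiles out the parametric effects. It only shows a single kernel-weighted least squares fit at one time point, assuming an Epanechnikov kernel and a fixed bandwidth (both assumptions; the function names are hypothetical).

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(t0, times, values, bandwidth):
    """Local linear estimate of the regression function at t0.

    Fits a straight line by kernel-weighted least squares around t0 and
    returns its intercept, which is the fitted value at t0.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t, y in zip(times, values):
        d = t - t0
        w = epanechnikov(d / bandwidth)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * y
        r1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few observations inside the bandwidth")
    # Closed-form intercept of the weighted least squares line.
    return (s2 * r0 - s1 * r1) / det


# Example: observations of a time-varying effect on an equally spaced grid.
grid = [i / 20 for i in range(21)]
observed = [2.0 + 3.0 * t for t in grid]
fitted_mid = local_linear(0.5, grid, observed, bandwidth=0.3)
fitted_edge = local_linear(0.0, grid, observed, bandwidth=0.3)
```

A useful property of the local linear fit is that it reproduces linear trends exactly, including at the boundary of the observation window, where a local constant (kernel average) fit would be biased.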
Quantile regression with nominated samples for more efficient and less expensive follow-up studies of bone mineral density
We develop a new methodology for analyzing upper and/or lower quantiles of the distribution of bone mineral density using quantile regression. Nomination sampling designs are used to obtain more representative samples from the tails of the underlying distribution. We propose new check functions to incorporate the rank information of nominated samples in the estimation process. We also provide an alternative approach that translates estimation problems with nominated samples into corresponding problems under simple random sampling (SRS). Strategies are given for choosing a proper nomination sampling design for a given population quantile. We apply our results to a large cohort study in Manitoba to analyze quantiles of bone mineral density using available covariates. We show that, in some cases, methods based on nomination sampling designs require about one-tenth of the sample used in SRS to estimate the lower or upper tail conditional quantiles with comparable mean squared errors. This is a dramatic reduction in time and cost compared with the usual SRS approach.
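The idea of translating a nominated-sample problem into an SRS-type problem can be sketched in the simplest unconditional case. The sketch below is not the authors' regression estimator. It assumes maxima nomination sampling with perfect ranking, sets of i.i.d. units of fixed size k, and no covariates. Under those assumptions the maximum of k draws has distribution function F(x)^k, so the population p-th quantile equals the p^k-th quantile of the nominated observations. The function names are hypothetical.

```python
import math
import random


def maxima_nominated_sample(draw, set_size, n_sets, rng):
    """Keep only the largest unit from each of n_sets sets of size set_size."""
    return [max(draw(rng) for _ in range(set_size)) for _ in range(n_sets)]


def empirical_quantile(sample, level):
    """Order-statistic quantile: smallest value with at least `level` of the sample at or below it."""
    ordered = sorted(sample)
    index = max(0, min(len(ordered) - 1, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def population_quantile_from_maxima(sample, p, set_size):
    """Estimate the population p-th quantile from maxima-nominated data.

    Uses G(x) = F(x) ** set_size, so F^{-1}(p) = G^{-1}(p ** set_size).
    """
    return empirical_quantile(sample, p ** set_size)


# Toy check with Uniform(0, 1) data, where the true p-th quantile is p itself.
rng = random.Random(0)
nominated = maxima_nominated_sample(lambda r: r.random(), set_size=5, n_sets=2000, rng=rng)
estimate = population_quantile_from_maxima(nominated, p=0.9, set_size=5)
```

With k = 5, the 0.9 population quantile corresponds to roughly the 0.59 quantile of the nominated observations, which is why a tail quantile can be estimated from the well-populated middle of the nominated sample.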
Probability models for discretization uncertainty with adaptive grid designs for systems of differential equations
When models are defined implicitly by systems of differential equations without a closed-form solution, small local errors in finite-dimensional solution approximations can propagate into large deviations from the true underlying state trajectory. Inference for such models relies on a likelihood approximation constructed around a numerical solution, which underestimates posterior uncertainty. This talk will introduce and discuss progress in a new adaptive formalism for modeling and propagating discretization uncertainty through the Bayesian inferential framework, allowing exact inference and uncertainty quantification for discretized differential equation models.
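The propagation of local discretization error can be seen in a deterministic toy example. The sketch below is not the probabilistic formalism of the talk. It applies the forward Euler method on a fixed grid to the test equation dy/dt = y with y(0) = 1, whose exact solution exp(t) is known, and measures the error at the end of the interval. The choice of solver, test equation, and function names are all assumptions made for illustration.

```python
import math


def euler(f, y0, t_end, n_steps):
    """Forward Euler approximation of y(t_end) for dy/dt = f(t, y), y(0) = y0."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y


def growth(t, y):
    """Right-hand side of the test equation dy/dt = y."""
    return y


def global_error(t_end, h):
    """Absolute error of the Euler solution at t_end for step size h."""
    n_steps = round(t_end / h)
    return abs(euler(growth, 1.0, t_end, n_steps) - math.exp(t_end))


# Same step size, longer horizon: the accumulated error grows.
error_short = global_error(1.0, 0.1)
error_long = global_error(5.0, 0.1)
# Same horizon, finer grid: the accumulated error shrinks.
error_fine = global_error(5.0, 0.01)
```

A likelihood built directly on such a numerical solution treats it as exact, so the accumulated error is silently absorbed into the parameter estimates rather than being reflected in the posterior uncertainty.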