New developments in survival forest techniques
Survival analysis answers the question of when an event of interest will happen. It studies time-to-event data in which the true event time is observed only for some subjects, while the others are censored. Right-censoring is the most common form of censoring in survival data. Tree-based methods are versatile and useful tools for analyzing right-censored survival data. Survival forests, which are ensembles of trees for time-to-event data, are powerful methods that are popular among practitioners. Current implementations of survival forests have some limitations. First, most of them use the log-rank test as the splitting rule, and this test loses power when the proportional hazards assumption is violated. Second, they assume that the event time and the censoring time are independent given the covariates. Third, they do not provide dynamic predictions in the presence of time-varying covariates. We propose solutions to these limitations. For settings where the proportional hazards assumption is violated, we suggest using the integrated absolute difference between the survival functions of the two child nodes as the splitting rule. We propose two approaches to tackle the problem of dependent censoring with random forests. The first approach uses a final estimate of the survival function that corrects for dependent censoring. The second uses a splitting rule that does not rely on the independent censoring assumption. Lastly, we make recommendations on different ways to obtain dynamic estimates of the hazard function with random forests for discrete-time survival data in the presence of time-varying covariates. In our current work, we are developing forests for clustered survival data.
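To make the proposed splitting rule concrete, here is a minimal sketch of one way to score a candidate split by the integrated absolute difference between the two child nodes' survival curves. It assumes that each child's survival function is estimated with the Kaplan-Meier estimator and that the integral is taken over a user-chosen horizon [0, tau]. The function names (`kaplan_meier`, `integrated_abs_difference`, `best_split`) and the `tau` and `min_node_size` parameters are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: distinct event times and survival just after each one."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # subjects still under observation at t
        deaths = np.sum((times == t) & events)    # observed events at t (censored ones excluded)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv, dtype=float)


def km_step(event_times, surv, grid):
    """Evaluate the right-continuous Kaplan-Meier step function at each grid point."""
    idx = np.searchsorted(event_times, grid, side="right")  # count of event times <= t
    values = np.concatenate(([1.0], surv))                  # S(t) = 1 before the first event
    return values[idx]


def integrated_abs_difference(times, events, left_mask, tau):
    """Integral over [0, tau] of |S_left(t) - S_right(t)| for one candidate split."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    left_mask = np.asarray(left_mask, dtype=bool)
    tl, sl = kaplan_meier(times[left_mask], events[left_mask])
    tr, sr = kaplan_meier(times[~left_mask], events[~left_mask])
    # Both curves are step functions, so the integral is exact on the union of jump points.
    jumps = np.concatenate((tl, tr))
    jumps = jumps[(jumps > 0) & (jumps < tau)]
    knots = np.unique(np.concatenate(([0.0], jumps, [tau])))
    left_ends = knots[:-1]
    widths = np.diff(knots)
    diff = np.abs(km_step(tl, sl, left_ends) - km_step(tr, sr, left_ends))
    return float(np.sum(diff * widths))


def best_split(x, times, events, tau, min_node_size=2):
    """Try every threshold on one covariate; keep the split with the largest score."""
    x = np.asarray(x, dtype=float)
    best_cut, best_score = None, -np.inf
    for c in np.unique(x)[:-1]:
        left = x <= c
        if left.sum() < min_node_size or (~left).sum() < min_node_size:
            continue  # skip splits that leave a child node too small
        score = integrated_abs_difference(times, events, left, tau)
        if score > best_score:
            best_cut, best_score = c, score
    return best_cut, best_score


if __name__ == "__main__":
    # Toy data: subjects with x <= 2 fail early, the others fail late.
    x = [1, 2, 3, 4]
    t = [1, 2, 3, 4]
    e = [1, 1, 1, 1]
    print(best_split(x, t, e, tau=4.0))
```

Because the score integrates the absolute gap between the curves, survival functions that cross still produce a large value, whereas a log-rank statistic can be small when early and late differences cancel. The choice of `tau` matters: it should stay within the range where both children still have subjects at risk.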