Wednesday, February 28, 2018 — 10:30 AM EST

**New developments in survival forests techniques**

Survival analysis answers the question of when an event of interest will happen. It studies time-to-event data in which the true event time is observed for only some subjects, while the others are censored. Right-censoring is the most common form of censoring in survival data. Tree-based methods are versatile and useful tools for analyzing right-censored survival data. Survival forests, which are ensembles of trees for time-to-event data, are powerful methods and are popular among practitioners. Current implementations of survival forests have some limitations. First, most of them use the log-rank test as the splitting rule, which loses power when the proportional hazards assumption is violated. Second, they assume that the event time and the censoring time are independent, given the covariates. Third, they do not provide dynamic predictions in the presence of time-varying covariates. We propose solutions to these limitations. For settings where the proportional hazards assumption is violated, we suggest using the integrated absolute difference between the survival functions of the two child nodes as the splitting rule. We propose two approaches to tackle the problem of dependent censoring with random forests. The first is to use a final estimate of the survival function that corrects for dependent censoring. The second is to use a splitting rule that does not rely on the independent censoring assumption. Lastly, we make recommendations on different ways to obtain dynamic estimates of the hazard function with random forests for discrete-time survival data in the presence of time-varying covariates. In our current work, we are developing forests for clustered survival data.
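
The splitting rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each child node's survival function is estimated with the Kaplan-Meier estimator and that the absolute difference is integrated from 0 up to a horizon `tau` (here defaulting to the largest observed time). The function names, the choice of estimator, the horizon, and the absence of any node-size weighting are assumptions made for illustration, not details taken from the talk.

```python
import numpy as np


def kaplan_meier(times, events):
    """Return the distinct event times and the Kaplan-Meier estimate just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(times >= t)             # subjects still under observation at t
        deaths = np.sum((times == t) & events)   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv


def step_value(event_times, surv, t):
    """Evaluate the right-continuous Kaplan-Meier step function at the times t."""
    idx = np.searchsorted(event_times, t, side="right")
    return np.concatenate(([1.0], surv))[idx]


def integrated_abs_difference(times_l, events_l, times_r, events_r, tau=None):
    """Integrate |S_left(t) - S_right(t)| over [0, tau] (hypothetical helper)."""
    tl, sl = kaplan_meier(times_l, events_l)
    tr, sr = kaplan_meier(times_r, events_r)
    if tau is None:
        tau = max(np.max(times_l), np.max(times_r))
    # Both curves are constant between consecutive grid points, so the
    # integral is an exact sum of rectangle areas.
    grid = np.unique(np.concatenate(([0.0], tl, tr, [tau])))
    grid = grid[grid <= tau]
    left_ends = grid[:-1]
    widths = np.diff(grid)
    diff = np.abs(step_value(tl, sl, left_ends) - step_value(tr, sr, left_ends))
    return float(np.sum(diff * widths))


# Toy data: observed times and event indicators (1 = event, 0 = censored).
score = integrated_abs_difference([1, 2, 3], [1, 1, 1], [2, 4], [1, 1])
print(score)
```

In a tree-growing loop, a score of this kind would be computed for each candidate split and the split with the largest value retained. Unlike the log-rank statistic, it does not cancel out when the two survival curves cross.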

Monday, February 12, 2018 — 10:30 AM EST

**Statistics meets the protein folding problem: fast exploration of conformations with sequential Monte Carlo**

The problem of predicting the 3-D structure of a protein from its amino acid sequence using computer algorithms has challenged scientists for nearly half a century. The structure of a protein is essential for understanding its function, and hence accurate structure prediction is of vital importance in modern applications such as protein design in biomedicine. A powerful approach to structure prediction is to search for the conformation of the protein that has minimum potential energy. However, due to the size of the conformational space, efficient exploration remains a bottleneck for energy-guided computational methods, even with the aid of known structures in the Protein Data Bank. In this talk, I will first introduce this exploration problem from a statistical perspective. Then, I will present a new method for building segments of protein structures that is inspired by sequential Monte Carlo and enables faster exploration than existing methods. Finally, I will apply the method to examples of real proteins and demonstrate its promise for improving the low-confidence segments of 3-D structure predictions.