Faculty

Tuesday, May 19, 2020

Charmaine Dean Named IMS Fellow

Charmaine Dean, Vice-President, Research and Professor in the Department of Statistics and Actuarial Science, University of Waterloo, has been named a Fellow of the Institute of Mathematical Statistics (IMS). Dr. Dean received the award for her scientifically important contributions to the analysis of count data, disease mapping, spatio-temporal data and more; for her outstanding leadership of the statistical profession; for her record of mentorship; and for her enormous work in keeping statistics visible at the center of science.

University of Waterloo Faculty of Mathematics researchers have developed a new method that enables large insurers to reduce the time spent estimating the financial liabilities of their portfolios from days to hours while achieving high accuracy.

A study details the new method, which significantly reduces computational time while still estimating the financial liability of variable annuity portfolios accurately enough for business purposes.

Statistics and Actuarial Science PhD candidate Yilin Chen is one of two students to claim the 2020 Huawei Prize for Best Research Paper by a Mathematics Graduate Student. The $4,000 prize affirms the value of Chen’s efforts to establish a framework for analyzing nonprobability survey samples in her winning paper: Doubly Robust Inference with Nonprobability Survey Samples.

Friday, February 7, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Gabriel Becker, University of California Davis

The Extended Reproducibility Phenotype - Re-framing and Generalizing Computational Reproducibility

Computational reproducibility has become a crucial part of how data analytic results are understood and assessed both in and outside of academia. Less work, however, has explored whether these strict computational reproducibility criteria are necessary or sufficient to actually meet our needs as consumers of analysis results. I will show that in principle they are neither. I will present two interrelated veins of work. First, I will provide a conceptual reframing of the concepts of strict reproducibility, and the actions analysts take to ensure it, in terms of our ability to actually trust the results and the claims about the underlying data-generating systems they embody. Second, I will present a generalized conception of reproducibility by introducing the concepts of Currency, Comparability and Completeness and their oft-overlooked importance in assessing data analysis results.

Thursday, February 6, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Liqun Diao, University of Waterloo

Censoring Unbiased Regression Trees and Ensembles

Tree-based methods employ recursive partitioning to separate subjects into distinct risk groups, making them useful tools for identifying risk groups and conducting prediction. We propose a novel paradigm for building regression trees for censored data in survival analysis. We construct the censored-data loss function through an extension of the theory of censoring unbiased transformations. With this construction, the proposed regression tree algorithm can be implemented conveniently using existing software for the Classification and Regression Trees (CART) algorithm (e.g., the rpart package in R) and extended to ensemble learning. Simulations and real data examples demonstrate that our methods either improve upon or remain competitive with existing tree-based algorithms for censored data.
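
As a rough, hedged sketch of the general idea, one simple censoring unbiased transformation is the inverse-probability-of-censoring-weighted (IPCW) pseudo-response, which can then be handed to standard CART software such as rpart. The simulated data, the choice of IPCW as the transformation, and the tuning values below are illustrative assumptions, not the exact loss construction presented in the talk.

# Illustrative sketch only: replace censored responses with an IPCW
# pseudo-response (one example of a censoring unbiased transformation),
# then apply ordinary CART via rpart.
library(survival)
library(rpart)

set.seed(1)
n <- 500
x1 <- runif(n); x2 <- runif(n)
t_true <- rexp(n, rate = exp(-(1 + x1 - x2)))  # latent survival times
c_time <- rexp(n, rate = 0.2)                  # independent censoring times
time   <- pmin(t_true, c_time)
status <- as.numeric(t_true <= c_time)         # 1 = event observed

# Kaplan-Meier estimate of the censoring survivor function G(t)
km_cens <- survfit(Surv(time, 1 - status) ~ 1)
G_hat   <- stepfun(km_cens$time, c(1, km_cens$surv))

# IPCW pseudo-response: observed events weighted by 1/G(T); censored cases contribute 0
y_star <- ifelse(status == 1, time / pmax(G_hat(time), 1e-6), 0)

# Standard regression-tree machinery now applies to the transformed response
fit <- rpart(y_star ~ x1 + x2, method = "anova",
             control = rpart.control(cp = 0.01, minsplit = 30))
printcp(fit)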

Wednesday, February 5, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by David Kepplinger, University of British Columbia

Detecting the Signal Among Noise and Contamination in High Dimensions

Improvements in biomedical technology and a surge in other data-driven sciences have led to the collection of increasingly large amounts of data. In this abundance of data, contamination is ubiquitous but often neglected, creating a substantial risk of spurious scientific discoveries. Especially in applications with high-dimensional data, for instance proteomic biomarker discovery, the impact of contamination on methods for variable selection and estimation can be profound yet difficult to diagnose.

In this talk I present a method for variable selection and estimation in high-dimensional linear regression models, leveraging the elastic-net penalty for complex data structures. The method is capable of harnessing the collected information even in the presence of arbitrary contamination in the response and the predictors. I showcase the method’s theoretical and practical advantages, specifically in applications with heavy-tailed errors and limited control over the data. I outline efficient algorithms to tackle computational challenges posed by inherently non-convex objective functions of robust estimators and practical strategies for hyper-parameter selection, ensuring scalability of the method and applicability to a wide range of problems.
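
To make concrete why contamination matters for penalized regression, the hedged sketch below fits an ordinary (non-robust) elastic net with the glmnet package to simulated data in which a few rows are contaminated. It is not the robust estimator discussed in the talk; the simulation setup, contamination pattern, and alpha value are illustrative assumptions.

# Illustration only: ordinary (non-robust) elastic net on contaminated data.
# A few outlying rows can pull spurious predictors into the selected model,
# which is the failure mode robust elastic-net estimators aim to prevent.
library(glmnet)

set.seed(42)
n <- 100; p <- 200
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))        # only the first 5 predictors are active
y <- drop(x %*% beta) + rnorm(n)

# Contaminate 5% of the rows: gross outliers in y and leverage points in x
bad <- sample(n, 5)
y[bad] <- y[bad] + 50
x[bad, 6:10] <- x[bad, 6:10] + 10

# Cross-validated elastic net (alpha = 0.75); the selected set may now include
# predictors 6-10, which are associated with y only through the outlying rows
cv_fit <- cv.glmnet(x, y, alpha = 0.75)
b_hat  <- as.numeric(coef(cv_fit, s = "lambda.min"))[-1]
which(b_hat != 0)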