
Thursday, January 30, 2020 10:00 am EST (GMT -05:00)

Department seminar by Hyukjun (Jay) Gweon, Western University

Batch-mode active learning for regression and its application to the valuation of large variable annuity portfolios

Supervised learning algorithms require a sufficient amount of labeled data to construct an accurate predictive model. In practice, collecting labeled data may be extremely time-consuming, while unlabeled data can be accessed easily. When labeled data are insufficient for a prediction model to perform well and the budget for additional data collection is limited, it is important to select for labeling the objects that will most improve the model's performance. In this talk, I will focus on active learning, which aims to train an accurate prediction model at minimum labeling cost. In particular, I will present batch-mode active learning for regression problems. Building on random forests, I will propose two effective random sampling algorithms that use the prediction ambiguities and diversities of unlabeled objects as measures of their informativeness. Empirical results on an insurance data set demonstrate the effectiveness of the proposed approaches for valuing large variable annuity portfolios, a practical problem in the actuarial field. I will also compare the proposed approaches with an existing framework that relies on a sequential combination of unsupervised and supervised learning algorithms.
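To make the ambiguity criterion concrete, here is a minimal sketch of batch-mode selection with a random forest, in which an unlabeled point's informativeness is measured by the disagreement (variance) among the individual trees' predictions. This illustrates the general idea only, not the speaker's proposed algorithms; all names are ours, and the diversity criterion is omitted for brevity.

```python
# Minimal sketch: batch-mode active learning for regression, scoring
# unlabeled points by the variance of the per-tree predictions of a
# random forest (an "ambiguity" measure). Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_batch(forest, X_pool, batch_size=10):
    """Return indices of the pool points whose trees disagree most."""
    per_tree = np.stack([t.predict(X_pool) for t in forest.estimators_])
    ambiguity = per_tree.var(axis=0)          # disagreement across trees
    return np.argsort(ambiguity)[-batch_size:]

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=1000)

labeled = list(range(20))                      # small initial labeled set
pool = [i for i in range(1000) if i not in labeled]

for _ in range(5):                             # five batch-mode rounds
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X[labeled], y[labeled])
    picked = select_batch(rf, X[pool])
    labeled += [pool[i] for i in picked]       # "query" the chosen labels
    pool = [i for i in pool if i not in labeled]
```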

Wednesday, January 22, 2020 10:00 am EST (GMT -05:00)

Department seminar by Lin Liu, Harvard University

The possibility of nearly assumption-free inference in causal inference

In causal effect estimation, the state of the art is the so-called double machine learning (DML) estimators, which combine the benefits of doubly robust estimation, sample splitting, and machine learning methods for estimating nuisance parameters. The validity of the confidence interval associated with a DML estimator relies, in large part, on the complexity of the nuisance parameters and on how close the machine learning estimators come to them. Until we have a complete understanding of the theory of many machine learning methods, including deep neural networks, even a DML estimator may have a bias so large that it prohibits valid inference. In this talk, we describe a nearly assumption-free procedure that can either criticize the invalidity of the Wald confidence interval associated with a DML estimator of a causal effect of interest or falsify the certificates (i.e., the mathematical conditions) that, if true, would ensure valid inference. Essentially, we test the null hypothesis that the bias of the estimator is smaller than a fraction $\rho$ of its standard error. Our test is valid under the null without requiring any complexity (smoothness or sparsity) assumptions on the nuisance parameters or on the properties of the machine learning estimators, and it may have power to inform analysts that they must use something other than DML estimators or Wald confidence intervals for inference purposes. This talk is based on joint work with Rajarshi Mukherjee and James M. Robins.
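In symbols, writing $\hat{\psi}$ for a DML estimator of the causal effect and $\mathrm{se}(\hat{\psi})$ for its standard error, the null hypothesis can be stated as (our notation, for illustration)

$$H_0: \bigl|\mathrm{Bias}(\hat{\psi})\bigr| \le \rho \cdot \mathrm{se}(\hat{\psi}),$$

under which, for small $\rho$, the Wald interval $\hat{\psi} \pm 1.96\,\mathrm{se}(\hat{\psi})$ retains approximately its nominal coverage; rejecting $H_0$ signals that the interval cannot be trusted.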

Tuesday, January 21, 2020 10:00 am EST (GMT -05:00)

Department seminar by Lu Yang, University of Amsterdam

Diagnostics for Regression Models with Discrete Outcomes

Making informed decisions about model adequacy has been an outstanding issue for regression models with discrete outcomes. Standard residuals for such outcomes, such as Pearson and deviance residuals, often show large discrepancies from the hypothesized pattern even under the true model, and they are especially uninformative when the data are highly discrete. To fill this gap, we propose a surrogate empirical residual distribution function for general discrete (e.g., ordinal and count) outcomes that serves as an alternative to the empirical Cox-Snell residual distribution function. When at least one continuous covariate is available, we show asymptotically that the proposed function converges uniformly to the identity function under the correctly specified model, even with highly discrete (e.g., binary) outcomes. Through simulation studies, we demonstrate empirically that the proposed surrogate empirical residual distribution function is highly effective for various diagnostic tasks: it stays close to the hypothesized pattern under the true model and departs significantly from it under model misspecification.
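The construction behind such a surrogate can be illustrated in a few lines: for a discrete outcome with hypothesized distribution function $F$, the discrete probability mass is smoothed with uniform noise so that, under the true model, the resulting values are Uniform(0,1) and their empirical distribution function tracks the identity. The sketch below uses a Poisson model with the true parameters; the simulation and all names are ours, not the speaker's code.

```python
# Minimal sketch of the surrogate-residual idea for count outcomes:
# jitter the discrete fitted CDF with uniform noise; under the true
# model the result is Uniform(0,1). Illustrative only.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)            # true Poisson means
y = rng.poisson(mu)

# Surrogate values: S = F(y-1) + U * (F(y) - F(y-1)) under the
# hypothesized model (here the true one; F(-1) = 0).
u = rng.uniform(size=n)
lo = poisson.cdf(y - 1, mu)
hi = poisson.cdf(y, mu)
s = lo + u * (hi - lo)

# Under the correct model the ECDF of s is close to the identity.
grid = np.linspace(0, 1, 21)
ecdf = (s[:, None] <= grid[None, :]).mean(axis=0)
print(np.abs(ecdf - grid).max())      # small, roughly 0.01 at n = 5000
```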

Monday, January 20, 2020 10:00 am EST (GMT -05:00)

Department seminar by Jared Huling, Ohio State University

Sufficient Dimension Reduction for Populations with Structured Heterogeneity

Risk modeling has become a crucial component in the effective delivery of health care. A key challenge in building effective risk models is accounting for patient heterogeneity among the diverse populations present in health systems. Incorporating heterogeneity based on the presence of various comorbidities into risk models is crucial for the development of tailored care strategies, as it can provide patient-centered information and can result in more accurate risk prediction. Yet, in the presence of high dimensional covariates, accounting for this type of heterogeneity can exacerbate estimation difficulties even with large sample sizes. Towards this aim, we propose a flexible and interpretable risk modeling approach based on semiparametric sufficient dimension reduction. The approach accounts for patient heterogeneity, borrows strength in estimation across related subpopulations to improve both estimation efficiency and interpretability, and can serve as a useful exploratory tool or as a powerful predictive model. In simulated examples, we show that our approach can improve estimation performance in the presence of heterogeneity and is quite robust to deviations from its key underlying assumption. We demonstrate the utility of our approach in the prediction of hospital admission risk for a large health system when tested on further follow-up data.
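As background for the dimension reduction component, here is a minimal sketch of classical sliced inverse regression (SIR), one standard estimator of a sufficient dimension reduction subspace. The speaker's semiparametric, heterogeneity-aware approach goes well beyond this; the sketch and all names in it are ours.

```python
# Minimal sketch of sliced inverse regression (SIR): whiten X, slice
# on y, and take the leading eigenvectors of the covariance of the
# within-slice means. Illustrative baseline only.
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Estimate SDR directions (up to sign and scale)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))  # Cov(X) = L L^T
    Linv = np.linalg.inv(L)
    Z = (X - mu) @ Linv.T                            # whitened predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)                      # within-slice mean
        M += len(idx) / n * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)                      # ascending order
    B = vecs[:, ::-1][:, :n_dirs]                    # leading directions
    return Linv.T @ B                                # back to X scale

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
y = np.tanh(X @ beta) + 0.1 * rng.normal(size=2000)
print(sir_directions(X, y).ravel())  # roughly proportional to beta
```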

Four graduate students were awarded a departmental research presentation award by the Department of Statistics and Actuarial Science, but that's not all they have in common. They all came to Waterloo because they knew of the excellence of the Statistics programs, research, and professors. Their backgrounds vary, as do their research areas, but they have all had a great experience.

Statistics and Actuarial Science PhD candidate Rui Qiao was one of six students who won the 2019 Huawei Prize for Best Research Paper by a Mathematics Graduate Student. The award recognizes the impact of his paper "Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry" and comes with a prize of $4,000.

A Machine Learning Approach to Portfolio Risk Management

Risk measurement, valuation and hedging form an integral part of portfolio risk management for insurance companies and other financial institutions. Portfolio risk arises because the values of constituent assets and liabilities change over time in response to changes in the underlying risk factors. Quantifying this risk requires modeling the dynamic portfolio value process, which boils down to computing conditional expectations of future cash flows over long time horizons, e.g., up to 40 years and beyond, a computationally challenging task.

This lecture presents a framework for dynamic portfolio risk management in discrete time that builds on machine learning theory. We learn the replicating martingale of the portfolio from a finite sample of its terminal cumulative cash flow. The learned replicating martingale is in closed form thanks to a suitable choice of the reproducing kernel Hilbert space. We develop an asymptotic theory and prove convergence and a central limit theorem. We also derive finite sample error bounds and concentration inequalities. As an application, we compute the value at risk and expected shortfall of the one-year loss of some stylized portfolios.
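To illustrate the shape of such a computation, the sketch below estimates the one-year conditional expectation of a terminal cash flow by kernel ridge regression (which has a closed-form solution in an RKHS) and then reads off the empirical value at risk and expected shortfall of the one-year loss. This is a toy model of our own making, not the lecture's framework or portfolios.

```python
# Minimal sketch: learn the time-1 value (conditional expectation of a
# terminal cash flow) by kernel ridge regression, then compute VaR and
# expected shortfall of the one-year loss. Toy model, illustrative only.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(3)
n = 5000
z1 = rng.normal(size=n)                 # one-year risk factor
z2 = rng.normal(size=n)                 # post-year-one innovation
cash = np.exp(0.2 * z1 + 0.2 * z2)      # terminal cumulative cash flow

# Regress the terminal cash flow on the time-1 information; the fitted
# function approximates the time-1 portfolio value.
krr = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5)
krr.fit(z1[:2000, None], cash[:2000])   # closed-form fit on a subsample
v1 = krr.predict(z1[:, None])           # estimated time-1 values

loss = v1.mean() - v1                   # one-year loss (toy definition)
var99 = np.quantile(loss, 0.99)         # 99% value at risk
es99 = loss[loss >= var99].mean()       # 99% expected shortfall
print(var99, es99)
```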

Companies that fail to curb their carbon output may eventually face the consequences of asset devaluation and stock price depreciation, according to a new study out of the University of Waterloo.

The researchers further determined that the failure of companies within the emission-intensive sector to take carbon reduction actions could start negatively impacting the general stock market in as little as 10 years’ time.