Department of Statistics and
Actuarial Science (SAS)
Mathematics 3 (M3)
University of Waterloo
Administrative Staff Directory
Phone: 519-888-4567, ext. 33550
In 2019, the Master of Actuarial Science (MActSc) professional degree program will be celebrating 10 wonderful years at the University of Waterloo.
This workshop provides a crash course on using statistical methods and software when conducting data analysis in survey research. There will be a hands-on opportunity to conduct basic data analysis using SAS software. This workshop is presented by Dr. Christian Boudreau, Co-director of the Survey Research Centre (SRC), along with Grace Li from the International Tobacco Control (ITC) Project.
SURPLUS-INVARIANT RISK MEASURES ON ROBUST MODEL SPACES
In this talk, we present a systematic study of the notion of surplus invariance. In essence, the property of surplus invariance stipulates that whether or not a financial institution is adequately capitalized from a regulatory perspective should not depend on the surplus profile of the company but only on its default profile. Besides providing a unifying perspective on the existing literature, we establish a variety of new results including dual representations and extensions of surplus-invariant risk measures and structural results for surplus-invariant acceptance sets. The power of our results is demonstrated in model spaces with a dominating probability, including Orlicz spaces, as well as in robust model spaces where a dominating probability does not exist.
The main goal of precision medicine is to use patient characteristics to inform a personalized treatment plan as a sequence of decision rules that leads to the best possible health outcome for each patient. Q-learning is a reinforcement learning algorithm that is widely used to estimate an optimal dynamic treatment regime using both multi-stage randomized clinical trials and observational studies. Starting with the final study stage, Q-learning finds the treatment option that optimizes the desired expected outcome. Fixing the optimally chosen treatment at the last stage, Q-learning moves backward to the immediately preceding stage and searches for a treatment option assuming that future treatments will be optimized. The process continues until the first stage is reached. Q-learning requires specifying a sequence of regression models, and the validity of the resulting conclusions relies on assuming that the models are correctly specified. Specifically, due to the nature of backward induction, the subsequent models are likely to be a complex function of covariates, which may result in non-ignorable residual confounding under model misspecification. We propose a robust Q-learning method that leverages flexible machine learning techniques to reduce the chance of model misspecification, thereby mitigating the main drawback of Q-learning while maintaining its efficiency. We derive the asymptotic properties of our method and show that, under certain conditions, it will result in asymptotically linear estimators with certain influence functions.
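The backward induction described above can be illustrated with a minimal two-stage sketch. It shows standard Q-learning only, not the robust method proposed in the talk. Everything in it is an assumption made for illustration: binary states and actions, a saturated cell-mean model standing in for the regression models, and the hypothetical `simulate` function that generates the data.

```python
import random
from collections import defaultdict


def cell_means(keys, values):
    """Saturated 'regression': the mean of `values` within each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def best_action(q, history):
    """Return the (action, value) pair maximizing the fitted Q at `history`."""
    candidates = [(a, q[(history, a)]) for a in (0, 1) if (history, a) in q]
    return max(candidates, key=lambda pair: pair[1])


def q_learning_two_stage(data):
    """data: iterable of (s1, a1, s2, a2, y) with binary states and actions."""
    data = list(data)
    # Stage 2: model the outcome given the full history and the last action.
    h2 = [(s1, a1, s2) for s1, a1, s2, _, _ in data]
    q2 = cell_means([(h, row[3]) for h, row in zip(h2, data)],
                    [row[4] for row in data])
    rule2 = {h: best_action(q2, h)[0] for h in set(h2)}
    # Pseudo-outcome: predicted outcome if stage 2 is treated optimally.
    pseudo = [best_action(q2, h)[1] for h in h2]
    # Stage 1: model the pseudo-outcome given the first state and action.
    q1 = cell_means([((row[0],), row[1]) for row in data], pseudo)
    rule1 = {s1: best_action(q1, (s1,))[0] for s1 in {row[0] for row in data}}
    return rule1, rule2


def simulate(n, seed=0):
    """Hypothetical randomized trial: outcome rewards a2 == s2 and a1 == 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s1, a1 = rng.randint(0, 1), rng.randint(0, 1)
        s2, a2 = rng.randint(0, 1), rng.randint(0, 1)
        y = 1.0 * (a2 == s2) + 0.5 * a1 + rng.gauss(0.0, 0.5)
        rows.append((s1, a1, s2, a2, y))
    return rows


if __name__ == "__main__":
    rule1, rule2 = q_learning_two_stage(simulate(4000, seed=1))
    print("stage-1 rule:", rule1)
    print("stage-2 rule:", rule2)
```

In this toy setting the cell-mean model is correctly specified by construction, so the misspecification problem the abstract addresses does not arise; with continuous covariates the stage-1 model would have to approximate a maximum over stage-2 fits, which is where the difficulty appears.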
Optimisation under Uncertainty
Numerical solutions to optimisation problems are of great interest for various applications. Data scarcity, measurement errors and model uncertainty are clear examples where numerical optimisations are made under uncertainty. Robust optimisation is an attempt to automatically find robust optimal solutions, but we show that the degree of robustness of the actual procedure depends on the shape of the objective function when finding the optimal reinsurance contract. We first go through some examples that explain the mechanics behind robust optimisation under uncertainty, and we then extend our discussion to a classical exploratory data analysis method, namely Multidimensional Scaling (MDS). We illustrate how robust optimisation should be adapted in order to achieve more robust decisions.
A Machine Learning Approach to Portfolio Risk Management
Risk measurement, valuation and hedging form an integral task in portfolio risk management for insurance companies and other financial institutions. Portfolio risk arises because the values of constituent assets and liabilities change over time in response to changes in the underlying risk factors. The quantification of this risk requires modeling the dynamic portfolio value process. This boils down to computing conditional expectations of future cash flows over long time horizons, e.g., up to 40 years and beyond, which is computationally challenging.
This lecture presents a framework for dynamic portfolio risk management in discrete time building on machine learning theory. We learn the replicating martingale of the portfolio from a finite sample of its terminal cumulative cash flow. The learned replicating martingale is in closed form thanks to a suitable choice of the reproducing kernel Hilbert space. We develop an asymptotic theory and prove convergence and a central limit theorem. We also derive finite sample error bounds and concentration inequalities. As an application, we compute the value at risk and expected shortfall of the one-year loss of some stylized portfolios.
Regional-level genetic association testing under genomic partitioning adapted to local linkage disequilibrium
Motivated by characterizations of genomic architecture where multiple-variant analysis can uncover novel associations missed by single-variant analysis, we consider computationally efficient regression-based testing methods for regional genomic discovery, including genomic partitioning, that are feasible for genome-wide processing. To address the challenging question of how to specify appropriate regional units, we apply a novel haplotype block detection algorithm that uses interval graph modeling to cluster correlated variants and partition the genome into a large number of non-overlapping and quasi-independent linkage disequilibrium block regions. Within each block, we specify multiple-variant global test statistics with reduced dimension that may be subject to multi-level testing. I will discuss some of the theoretical and practical issues we face in applications to quantitative trait and disease status analyses using dense genotyping/imputation genome-wide association study data.
Convergence to the Mean Field Game Limit: A Case Study
Conditional Optimal Stopping: A Time-Inconsistent Optimization
Estimation methods to address correlated errors in time-to-event outcomes and exposures
Electronic health records (EHR) data are increasingly used in medical research, but EHR data, which typically are not collected to support research, are often subject to measurement error. These errors, if not addressed, can bias results in association analyses. Methodology to address covariate measurement error has been well developed; however, methods to address errors in time-to-event outcomes are relatively underdeveloped. We will consider methods to address errors in both the covariate and time-to-event outcome that are potentially correlated. We develop an extension to the popular regression calibration method for this setting. Regression calibration has been shown to perform well for settings with covariate measurement error (Prentice, 1982; Shaw and Prentice, 2012), but it is known that this method is generally biased for nonlinear regression models, such as the Cox model for time-to-event outcomes. Thus, we additionally propose raking estimators, which will be unbiased when an unbiased estimating equation is available on a validation subset. Raking is a standard method in survey sampling that makes use of auxiliary information on the population to improve upon the simple Horvitz-Thompson estimator applied to a subset of data (e.g. the validation subset). We demonstrate through numerical studies that raking can improve upon the regression calibration estimators in certain settings with failure-time data. We will discuss the choice of the auxiliary variable and aspects of the underlying estimation problem that affect the degree of improvement that the raking estimator will have over the simpler, biased regression calibration approach. Detailed simulation studies are presented to examine the relative performance of the proposed estimators under varying levels of signal, covariance, and censoring. We further illustrate the methods with an analysis of observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
Causal Inference for Complex Observational Data
Observational data often have issues which present challenges for the data analyst. The treatment status or exposure of interest is often not assigned randomly. Data are sometimes missing not at random (MNAR) which can lead to sample selection bias. And many statistical models for these data must account for unobserved confounding. This talk will demonstrate how to use standard maximum likelihood estimation to fit extended regression models (ERMs) that deal with all of these common issues alone or simultaneously.
Background risk model and inference based on ranks of residuals
It is often easier to model the behaviour of a random vector by choosing the marginal distributions and the copula separately rather than using a classical multivariate distribution. Many copula families, including the classes of Archimedean and elliptical copulas, may be written as the survival copula of a random vector R(X,Y), where R is a strictly positive random variable independent of the random vector (X,Y). A unified framework is presented for studying the dependence structure underlying this stochastic representation, which is called the background risk model. However, in many applications, part of the dependence may be explained by observable external factors, which justifies the use of generalized linear models for the marginal distributions. In this case and under some conditions that will be discussed, the inference on the copula can be based on the ranks of suitable residuals.
Excursion Probabilities and Geometric Properties of Multivariate Gaussian Random Fields
Excursion probabilities of Gaussian random fields have many applications in statistics (e.g., scanning statistic and control of false discovery rate (FDR)) and in other areas. The study of excursion probabilities has had a long history and is closely related to geometry of Gaussian random fields. In recent years, important developments have been made in both probability and statistics.
In this talk, we consider bivariate Gaussian random fields with non-smooth (or fractal) sample functions and study their geometric properties and excursion probabilities. Important classes of multivariate Gaussian random fields are stationary fields with Matérn cross-covariance functions [Gneiting, Kleiber, and Schlather (2010)] and operator fractional Brownian motions, which are operator-self-similar with stationary increments.
Modeling Winning Streaks in Financial Markets & Sample Recycling Method for Nested Stochastics
A new class of stochastic processes, termed sticky extrema processes, is proposed to model the common phenomena of winning and losing streaks in financial markets including equity, commodity, foreign exchange, etc. Most stochastic process models for financial market data in the current literature focus on stylized facts such as fat-tailedness relative to normality, volatility clustering, mean reversion, etc. However, none of the existing financial models captures the frequently observed "extrema clustering" feature, whereby most financial indices often report record highs or lows in concentrated periods of time. The lack of an "extrema clustering" feature in a stochastic model for asset valuation can have a grave impact on the pricing and risk management of path-dependent financial derivatives. In particular, derivatives with payoffs dependent on optimal (maximum or minimum) underlying market values can be severely misestimated.
Nested stochastic modeling has been on the rise in many fields of the financial industry. Nested stochastic models refer to stochastic models embedded inside other stochastic models. Examples can be found in principle-based reserving for long-term insurance liabilities. Reserves and capital for interest and market risk sensitive financial products are often determined by stochastic valuation. In the projection of cash flows, further simulations are necessary to evaluate risk management actions, such as a hedging program, at each point in time. The computational demand grows exponentially with the layers of nested stochastic modeling and points of evaluation. Most existing techniques to speed up nested simulation are based on curve fitting, which establishes a functional relationship between the inner loop estimator and economic scenarios and replaces inner loop simulations with the fitted curve. This work presents a non-conventional approach, termed the sample recycling method, which runs inner loop estimation for a small set of outer loop scenarios and finds estimates under other outer loop scenarios by recycling inner loop paths. This new approach can be very efficient when curve fitting is difficult to achieve.
A Unified Approach to Sparse Tweedie Modeling of Multi-Source Insurance Claim Data
Actuarial practitioners now have access to multiple sources of insurance data corresponding to various situations: multiple business lines, umbrella coverage, multiple hazards, and so on. Despite the wide use and simple nature of single-target approaches, modeling these types of data may benefit from a simultaneous approach. We propose a unified algorithm to perform sparse learning of such fused insurance data under the Tweedie (compound Poisson) model. By integrating ideas from multi-task sparse learning and sparse Tweedie modeling, our algorithm produces flexible regularization that balances predictor sparsity and between-sources sparsity. When applied to simulated and real data, our approach clearly outperforms single-target modeling in both prediction and selection accuracy, notably when the sources do not have exactly the same set of predictors. An efficient implementation of the proposed algorithm is provided in our R package MStweedie.
Risk Aggregation: A General Approach via the Class of Generalized Gamma Convolutions
Risk aggregation is virtually everywhere in insurance applications. Indeed, in the vast majority of situations insurers are interested in the properties of the sums of the risks they are exposed to, rather than in the stand-alone risks per se. Unfortunately, the problem of formulating the probability distributions of the aforementioned sums is rather involved, and as a rule does not have an explicit solution. As a result, numerous methods to approximate the distributions of the sums have been proposed, with the moment matching approximations (MMAs) being arguably the most popular. The arsenal of existing MMAs is quite impressive and ranges from very simple methods, such as the normal and shifted-gamma approximations, which match only the first two and three moments, respectively, to much more intricate methods, such as the one based on mixed Erlang distributions. Note, however, that in practice the sums of insurance risks can have numerous or just a few summands; in the latter case the normal approximation is very questionable. Also, in practice the distributions of the stand-alone risks can be light-tailed or heavy-tailed; in the latter case moments of higher orders (e.g., second, etc.) may not exist, and so the approximation based on mixed Erlang distributions is of limited usefulness. In this talk I will reveal a refined MMA method for approximating the distributions of the sums of insurance risks. The method approximates the distributions of interest to any desired precision, works equally well for light and heavy-tailed distributions, and is reasonably fast irrespective of the number of the involved summands. (This is joint work with Justin Miles and Alexey Kuznetsov, York University.)
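As a small illustration of the two simplest MMAs mentioned above (not the refined method of the talk), the following sketch matches the first two moments with a normal distribution and the first three with a shifted gamma distribution. It assumes the risks are independent, so that their means, variances and third central moments add; the function names are hypothetical.

```python
import math


def aggregate_moments(risks):
    """Add (mean, variance, third central moment) over independent risks."""
    mean = sum(r[0] for r in risks)
    var = sum(r[1] for r in risks)
    mu3 = sum(r[2] for r in risks)
    return mean, var, mu3


def normal_approximation(mean, var):
    """Match the first two moments: return (mu, sigma) of the normal."""
    return mean, math.sqrt(var)


def shifted_gamma_approximation(mean, var, mu3):
    """Match three moments with shift + Gamma(shape, rate); needs mu3 > 0."""
    if mu3 <= 0:
        raise ValueError("shifted gamma requires positive skewness")
    skew = mu3 / var ** 1.5
    shape = 4.0 / skew ** 2               # gamma skewness is 2 / sqrt(shape)
    rate = 2.0 / (skew * math.sqrt(var))  # gamma variance is shape / rate**2
    shift = mean - shape / rate           # gamma mean is shape / rate
    return shift, shape, rate


def normal_tail(x, mu, sigma):
    """P(S > x) under the normal approximation."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Three independent unit-rate exponential risks: mean 1, variance 1,
    # third central moment 2 each.
    moments = aggregate_moments([(1.0, 1.0, 2.0)] * 3)
    print("normal (mu, sigma):", normal_approximation(*moments[:2]))
    print("shifted gamma (shift, shape, rate):",
          shifted_gamma_approximation(*moments))
```

For this example the exact distribution of the sum is a gamma distribution with shape 3 and rate 1, which the shifted-gamma fit recovers with zero shift, whereas the normal fit ignores the skewness entirely.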
Non-standard problems in statistical inference: Bartlett identity, boundary, identifiability issues
In this talk, I will cover a few ideas in tackling non-standard problems in statistical inference, including Bartlett identity, boundary and identifiability issues. I will show that these considerations are critical in model robustness, statistical power, and validity. I will also present implications of these ideas in addressing key challenges in biomedical research using massive healthcare data, in particular, electronic health records, drug/vaccine safety surveillance data. Case studies using University of Pennsylvania Biobank data will be provided.
Equilibrium recoveries in insurance markets with limited liability
In this talk, I will discuss optimal insurance in partial equilibrium when the insurer is protected by limited liability and the multivariate insured risk is exchangeable. I focus on the optimal allocation of remaining assets in default, and show existence of an equilibrium in the market. In such an equilibrium, perfect pooling of the risk in the market occurs, but a protection fund is needed to charge levies to policyholders with low realized losses. If policyholders cannot be forced ex post to pay a levy, the constrained equal loss rule is used in equilibrium. This rule has gained particular interest in the literature on bankruptcy problems. Moreover, in the absence of a regulator, the insurer will always invest all its assets in the risky technology. The welfare losses if other recovery rules are used in case of default are illustrated; a different recovery rule can substantially affect the profit of the insurer. This talk will be based on a working paper on SSRN.
Propensity scores and missing data, with application to research on effects of prenatal alcohol exposure
We discuss challenges in analysing data from several epidemiological cohort studies designed to explore the association between prenatal alcohol exposure and child development. After Professors Sandra and Joseph Jacobson briefly describe the broader context, Professor Ryan will give a brief overview of the statistical challenges. Dr. Akkaya will then go into more detail, describing the approaches we have been taking, using propensity score analysis to adjust for potential confounders. She will briefly review propensity score methods and their extension to application with continuous predictors, such as the amount of alcohol consumed by the mother during pregnancy. She will then discuss extensions we have developed, using multiple imputation, to handle missing and misspecified covariates in this context. In particular, she will describe a strategy for incorporating variables that have a two-part or semi-continuous structure. This arises in our setting, for example, since many women will have zero exposure (meaning they are non-drinkers or abstainers), while there will then be a wide and long-tailed distribution of exposure levels among those who drink. Two-part or semi-continuous variables also arise among the potential confounding variables in our study, for example, use of cocaine and other drugs during pregnancy.
Space-filling Designs for Computer Experiments and Their Application to Big Data Research
Computer experiments provide useful tools for investigating complex systems, and they call for space-filling designs, which are a class of designs that allow the use of various modeling methods. He and Tang (2013) introduced and studied a class of space-filling designs, strong orthogonal arrays. To date, an important problem that has not been addressed in the literature is that of design selection for such arrays. In this talk, I will first give a broad introduction to space-filling designs, and then present some results on the selection of strong orthogonal arrays.
The second part of my talk will present some preliminary work on the application of space-filling designs to big data research. Nowadays, it is challenging to use current computing resources to analyze super-large datasets. Subsampling-based methods are the common approaches to reducing data sizes, with the leveraging method (Ma and Sun, 2014) being the most popular. Recently, a new approach, the information-based optimal subdata selection (IBOSS) method, was proposed (Wang, Yang and Stufken, 2018), which applies the design methodology to the big data problem. However, both the leveraging method and the IBOSS method are model-dependent. Space-filling designs do not suffer this drawback, as shown in our simulation studies.
From Random Landscapes to Statistical Inference
Consider the problem of recovering a rank 1 tensor of order k that has been subject to additive Gaussian noise. It is information theoretically possible to recover the tensor with a finite number of samples via maximum likelihood estimation; however, it is expected that one needs a polynomially diverging number of samples to efficiently recover it. What is the cause of this large statistical-to-algorithmic gap? To understand this interesting question of high dimensional statistics, we begin by studying an intimately related question: optimization of random homogeneous polynomials on the sphere in high dimensions. We show that the estimation threshold is related to a geometric analogue of the BBP transition for matrices. We then study the threshold for efficient recovery for a simple class of algorithms, Langevin dynamics and gradient descent. We view this problem in terms of a broader class of polynomial optimization problems and propose a mechanism for success/failure of recovery in terms of the strength of the signal on the high entropy region of the initialization. We will review several results including joint works with Ben Arous-Gheissari and Lopatto-Miolane.
How does consumption habit affect the household’s demand for life-contingent claims?
This paper examines the impact of habit formation on demand for life-contingent claims. We propose a life-cycle model with habit formation and solve the optimal consumption, portfolio choice, and life insurance/annuity problem analytically. We illustrate how consumption habits can alter the bequest motive and therefore drive the demand for life-contingent products. Finally, we use our model to examine the mismatch in the life insurance market between the life insurance holdings of most households and their underlying financial vulnerabilities, and the mismatch in the annuity market between the lack of any annuitization and the risk of outliving financial wealth.
If Journals Embraced Conditional Equivalence Testing, Would Research be Better?
Motivated by recent concerns with the reproducibility and reliability of scientific research, we introduce a publication policy that incorporates "conditional equivalence testing" (CET), a two-stage testing scheme in which standard null hypothesis significance testing (NHST) is followed conditionally by testing for equivalence. We explain how such a policy could address issues of publication bias, and investigate similarities with a Bayesian approach. We then develop a novel optimality model that, given current incentives to publish, predicts a researcher's most rational use of resources. Using this model, we are able to determine whether a given policy, such as our CET policy, can incentivize more reliable and reproducible research.
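The two-stage CET scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses z-tests with a known standard error, a single significance level `alpha` for both stages, a symmetric equivalence margin tested by two one-sided tests, and the hypothetical outcome labels "positive", "negative" and "inconclusive".

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_equivalence_test(estimate, std_error, margin, alpha=0.05):
    """Return (label, p_value) for a two-stage NHST-then-equivalence test."""
    # Stage 1: two-sided NHST of H0: effect == 0.
    z = estimate / std_error
    p_nhst = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    if p_nhst < alpha:
        return "positive", p_nhst
    # Stage 2 (only if stage 1 is not significant): two one-sided tests of
    # H0: |effect| >= margin against H1: |effect| < margin.
    p_lower = 1.0 - _STD_NORMAL.cdf((estimate + margin) / std_error)
    p_upper = _STD_NORMAL.cdf((estimate - margin) / std_error)
    p_equiv = max(p_lower, p_upper)
    if p_equiv < alpha:
        return "negative", p_equiv
    return "inconclusive", p_equiv


if __name__ == "__main__":
    for est, se in [(0.5, 0.1), (0.0, 0.05), (0.1, 0.1)]:
        label, p = conditional_equivalence_test(est, se, margin=0.2)
        print(f"estimate={est}, se={se}: {label} (p={p:.4g})")
```

A non-significant first stage leads to a "negative" result only when the estimate is precise enough to rule out effects larger than the margin; otherwise the study is labelled inconclusive.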