Monday, June 24, 2019 — 4:00 PM EDT

#### A regularization approach to the dynamic panel data model estimation

In a dynamic panel data model, the number of moment conditions may be very large even if the time dimension is only moderately large. Although using many moment conditions improves asymptotic efficiency, including an excessive number of them increases the bias in finite samples. An immediate consequence of a large number of instruments is a high-dimensional covariance matrix of the instruments. As a result, the condition number (the largest eigenvalue divided by the smallest one) is very high, especially when the autoregressive parameter is close to unity. Inverting a covariance matrix of instruments with a high condition number can badly impact the properties of the estimators. This paper proposes a regularization approach to the estimation of such models using three regularization schemes based on three different ways of inverting the covariance matrix of the instruments. Under double asymptotics, we show that our regularized estimators are consistent and asymptotically normal. These regularization schemes involve a regularization or smoothing parameter, so we derive a data-driven selection of this parameter based on an approximation of the mean squared error and show its optimality. The simulations confirm that regularization improves the properties of the usual GMM estimator. As an empirical application, we investigate the effect of financial development on economic growth. Regularization corrects the bias of the usual GMM estimator, which seems to underestimate the effect of financial development on economic growth.
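To make the conditioning issue concrete, here is a minimal sketch, not the paper's method. It assumes two instruments with unit variance and correlation `rho`, and uses Tikhonov regularization purely as an illustration, since the abstract does not name its three schemes. The function names and the values of `rho` and `alpha` are hypothetical.

```python
import math

def sym2x2_eigenvalues(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return centre + radius, centre - radius

def tikhonov_inverse_eigenvalue(lam, alpha):
    """Tikhonov-regularized replacement for 1/lam: lam / (lam^2 + alpha)."""
    return lam / (lam * lam + alpha)

# Illustrative covariance matrix of two nearly collinear instruments.
rho = 0.999
lam_max, lam_min = sym2x2_eigenvalues(1.0, rho, 1.0)

# Condition number: largest eigenvalue divided by the smallest one.
cond = lam_max / lam_min

# The plain inverse blows up along the weak direction; the regularized one stays bounded.
alpha = 0.01  # hypothetical regularization parameter
plain = 1.0 / lam_min
regularized = tikhonov_inverse_eigenvalue(lam_min, alpha)

print(f"condition number: {cond:.1f}")
print(f"1/lambda_min: {plain:.1f}, regularized: {regularized:.4f}")
```

With `alpha = 0` the regularized inverse reduces to the ordinary one, so `alpha` controls the trade-off between stability and fidelity to the unregularized estimator.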

Friday, June 14, 2019 — 10:30 AM EDT

**Aggregate Risk and Bank Regulation in General Equilibrium**

We examine the optimal design of bank regulation in a general equilibrium model. The unregulated economy has multiple equilibria that feature varying sizes of the financial sector and bank fragility. The economy underinvests (overinvests) in risky production when aggregate risk is low (high). We characterize and implement the efficient allocations via capital and reserve requirements, deposit insurance and bailouts. There is a range of efficient regulatory policies with a stricter capital requirement on banks being accompanied by a looser reserve requirement and less deposit insurance. We derive novel insights into how aggregate risk influences capital and reserve requirements as well as the efficiency of depositor subsidies.

Friday, May 31, 2019 — 6:00 PM EDT

In 2019, the Master of Actuarial Science (MActSc) professional degree program will be celebrating 10 wonderful years at the University of Waterloo.

Friday, May 31, 2019 — 10:30 AM EDT

**Worst-case risk measures and distributionally robust optimization**

Distributional ambiguity refers to the situation where the probability distribution of uncertain outcomes is unknown. The question of how to account for distributional ambiguity has been of central interest in risk management and, more generally, in many fields involving decision making under uncertainty. In this talk, we present a general framework of risk minimization based on distortion risk measures (also known as dual utility) and show how the worst-case risk can be evaluated when only the support and moments of the underlying distribution are known. We also show that the problem of minimizing the worst-case risk, also known as the distributionally robust optimization (DRO) problem, can be solved efficiently at large scale for a large class of decision problems, including portfolio optimization and production and transportation planning, among many others. Worst-case distributions, i.e. distributions attaining the worst-case risk, are characterized, which offers useful intuition about the worst-case scenarios.
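As a small illustration of a worst-case risk evaluation, the sketch below covers only a well-known special case, not the general framework of the talk. It assumes that only the mean and variance of a loss are known, that the support is unrestricted, and that the risk measure is CVaR at confidence level `alpha`. In that case the worst-case CVaR is `mu + sigma * sqrt(alpha / (1 - alpha))`, attained by a two-point distribution. The function names are hypothetical.

```python
import math

def worst_case_cvar(mu, sigma, alpha):
    """Worst-case CVaR at level alpha over all laws with mean mu and std sigma."""
    return mu + sigma * math.sqrt(alpha / (1.0 - alpha))

def two_point_worst_case(mu, sigma, alpha):
    """Two-point distribution [(value, probability), ...] attaining the bound."""
    low = mu - sigma * math.sqrt((1.0 - alpha) / alpha)
    high = mu + sigma * math.sqrt(alpha / (1.0 - alpha))
    return [(low, alpha), (high, 1.0 - alpha)]

mu, sigma, alpha = 0.0, 1.0, 0.95
dist = two_point_worst_case(mu, sigma, alpha)

# Check that the two-point law really has the prescribed mean and variance.
mean = sum(x * p for x, p in dist)
var = sum((x - mean) ** 2 * p for x, p in dist)

# Its CVaR is the upper atom, because that atom carries exactly the 1 - alpha tail mass.
bound = worst_case_cvar(mu, sigma, alpha)
print(mean, var, dist[1][0], bound)
```

The worst case puts all of the tail mass on a single bad outcome, which is the kind of intuition a characterization of worst-case distributions provides.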

Wednesday, May 15, 2019 — 9:30 AM to Thursday, May 16, 2019 — 3:30 PM EDT

This workshop provides a crash course on using statistical methods and software when conducting data analysis in survey research. There will be a hands-on opportunity to conduct basic data analysis using SAS software. This workshop is presented by Dr. Christian Boudreau, Co-director of the Survey Research Centre (SRC), along with Grace Li from the International Tobacco Control (ITC) Project.

Friday, May 10, 2019 — 10:30 AM EDT

**SURPLUS-INVARIANT RISK MEASURES ON ROBUST MODEL SPACES**

In this talk, we present a systematic study of the notion of surplus invariance. In essence, the property of surplus invariance stipulates that whether or not a financial institution is adequately capitalized from a regulatory perspective should not depend on the surplus profile of the company but only on its default profile. Besides providing a unifying perspective on the existing literature, we establish a variety of new results including dual representations and extensions of surplus-invariant risk measures and structural results for surplus-invariant acceptance sets. The power of our results is demonstrated in model spaces with a dominating probability, including Orlicz spaces, as well as in robust model spaces where a dominating probability does not exist.

Thursday, May 9, 2019 — 4:00 PM EDT

**Robust Q-learning**

The main goal of precision medicine is to use patient characteristics to inform a personalized treatment plan as a sequence of decision rules that leads to the best possible health outcome for each patient. Q-learning is a reinforcement learning algorithm that is widely used to estimate an optimal dynamic treatment regime using both multi-stage randomized clinical trials and observational studies. Starting with the final study stage, Q-learning finds the treatment option that optimizes the desired expected outcome. Fixing the optimally chosen treatment at the last stage, Q-learning moves backward to the immediately preceding stage and searches for a treatment option assuming that future treatments will be optimized. The process continues until the first stage is reached. Q-learning requires specifying a sequence of regression models, and the validity of the resulting conclusions relies on assuming that the models are correctly specified. Specifically, due to the nature of backward induction, the subsequent models are likely to be a complex function of covariates, which may result in non-ignorable residual confounding under model misspecification. We propose a robust Q-learning method that leverages flexible machine learning techniques to reduce the chance of model misspecification, thereby mitigating the main drawback of Q-learning while maintaining its efficiency. We derive the asymptotic properties of our method and show that, under certain conditions, it will result in asymptotically linear estimators with certain influence functions.
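The backward induction described above can be sketched for a two-stage study. This is standard Q-learning, not the robust method proposed in the talk. It assumes binary states and treatments, randomized treatment assignment, and saturated "regressions" (cell means), so no model can be misspecified. The data-generating process and all names are hypothetical.

```python
import random
from collections import defaultdict

def cell_means(keys, values):
    """Saturated regression: the mean of `values` within each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def simulate(n, rng):
    """Toy trial: the outcome is higher when each treatment matches its stage's state."""
    rows = []
    for _ in range(n):
        s1, a1 = rng.randint(0, 1), rng.randint(0, 1)
        s2, a2 = rng.randint(0, 1), rng.randint(0, 1)
        y = float(a1 == s1) + float(a2 == s2) + rng.gauss(0.0, 0.5)
        rows.append((s1, a1, s2, a2, y))
    return rows

def best_action(q, history_of):
    """For each history, keep the action with the largest Q-value and that value."""
    value, rule = {}, {}
    for key, q_value in q.items():
        history, action = history_of(key)
        if history not in value or q_value > value[history]:
            value[history], rule[history] = q_value, action
    return value, rule

def q_learning(rows):
    # Stage 2 (final stage): regress the outcome on the full history and treatment.
    q2 = cell_means([row[:4] for row in rows], [row[4] for row in rows])
    value2, rule2 = best_action(q2, lambda key: (key[:3], key[3]))
    # Stage 1: regress the pseudo-outcome, which assumes optimal stage-2 treatment.
    pseudo = [value2[row[:3]] for row in rows]
    q1 = cell_means([row[:2] for row in rows], pseudo)
    _, rule1 = best_action(q1, lambda key: (key[0], key[1]))
    return rule1, rule2

rng = random.Random(0)
rule1, rule2 = q_learning(simulate(4000, rng))
print("stage-1 rule:", rule1)
print("stage-2 rule:", rule2)
```

In realistic settings the cell means are replaced by parametric regressions on richer covariates, which is where the misspecification concern raised in the abstract enters.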

Friday, May 3, 2019 — 3:00 PM to Sunday, May 5, 2019 — 6:00 PM EDT

Friday, May 3, 2019 — 10:30 AM EDT

**Optimisation under Uncertainty**

Numerical solutions to optimisation problems are of great interest for various applications. Data scarcity, measurement errors and model uncertainty are clear examples where numerical optimisation is carried out under uncertainty. Robust optimisation is an attempt to automatically find robust optimal solutions, but we show that the degree of robustness of the actual procedure depends on the shape of the objective function when finding the optimal reinsurance contract. We first go through some examples that explain the mechanics behind robust optimisation under uncertainty, and we then extend our discussion to a classical exploratory data analysis method, namely Multidimensional Scaling (MDS). We illustrate how robust optimisation should be adapted in order to achieve more robust decisions.

Thursday, April 25, 2019 — 4:00 PM EDT

**A Machine Learning Approach to Portfolio Risk Management**

Risk measurement, valuation and hedging form an integral task in portfolio risk management for insurance companies and other financial institutions. Portfolio risk arises because the values of constituent assets and liabilities change over time in response to changes in the underlying risk factors. The quantification of this risk requires modeling the dynamic portfolio value process. This boils down to computing conditional expectations of future cash flows over long time horizons, e.g., up to 40 years and beyond, which is computationally challenging.

This lecture presents a framework for dynamic portfolio risk management in discrete time building on machine learning theory. We learn the replicating martingale of the portfolio from a finite sample of its terminal cumulative cash flow. The learned replicating martingale is in closed form thanks to a suitable choice of the reproducing kernel Hilbert space. We develop an asymptotic theory and prove convergence and a central limit theorem. We also derive finite sample error bounds and concentration inequalities. As an application, we compute the value at risk and expected shortfall of the one-year loss of some stylized portfolios.

Thursday, April 25, 2019 — 9:00 AM to Friday, April 26, 2019 — 5:00 PM EDT

### The first Waterloo Conference in Statistics, Actuarial Science, and Finance (WATSAF^{1}).

Thursday, April 18, 2019 — 2:30 PM EDT

**Regional-level genetic association testing under genomic partitioning adapted to local linkage disequilibrium**

Motivated by characterizations of genomic architecture where multiple-variant analysis can uncover novel associations missed by single-variant analysis, we consider computationally efficient regression-based testing methods for regional genomic discovery, including genomic partitioning, that are feasible for genome-wide processing. To address the challenging question of how to specify appropriate regional units, we apply a novel haplotype block detection algorithm that uses interval graph modeling to cluster correlated variants and partition the genome into a large number of non-overlapping and quasi-independent linkage disequilibrium block regions. Within each block, we specify multiple-variant global test statistics with reduced dimension that may be subject to multi-level testing. I will discuss some of the theoretical and practical issues we face in applications to quantitative trait and disease status analyses using dense genotyping/imputation genome-wide association study data.

Tuesday, April 9, 2019 — 4:00 AM EDT

**Convergence to the Mean Field Game Limit: A Case Study**

Friday, April 5, 2019 — 10:30 AM EDT

**Conditional Optimal Stopping: A Time-Inconsistent Optimization**

Thursday, April 4, 2019 — 4:00 PM EDT

**Estimation methods to address correlated errors in time-to-event outcomes and exposures**

Electronic health records (EHR) data are increasingly used in medical research, but EHR data, which typically are not collected to support research, are often subject to measurement error. These errors, if not addressed, can bias results in association analyses. Methodology to address covariate measurement error has been well developed; however, methods to address errors in time-to-event outcomes are relatively underdeveloped. We will consider methods to address errors in both the covariate and time-to-event outcome that are potentially correlated. We develop an extension to the popular regression calibration method for this setting. Regression calibration has been shown to perform well for settings with covariate measurement error (Prentice, 1982; Shaw and Prentice, 2012), but it is known that this method is generally biased for nonlinear regression models, such as the Cox model for time-to-event outcomes. Thus, we additionally propose raking estimators, which will be unbiased when an unbiased estimating equation is available on a validation subset. Raking is a standard method in survey sampling that makes use of auxiliary information on the population to improve upon the simple Horvitz-Thompson estimator applied to a subset of data (e.g. the validation subset). We demonstrate through numerical studies that raking can improve upon the regression calibration estimators in certain settings with failure-time data. We will discuss the choice of the auxiliary variable and aspects of the underlying estimation problem that affect the degree of improvement that the raking estimator will have over the simpler, biased regression calibration approach. Detailed simulation studies are presented to examine the relative performance of the proposed estimators under varying levels of signal, covariance, and censoring. We further illustrate the methods with an analysis of observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.

Wednesday, April 3, 2019 — 4:00 PM EDT

**Causal Inference for Complex Observational Data**

Observational data often have issues which present challenges for the data analyst. The treatment status or exposure of interest is often not assigned randomly. Data are sometimes missing not at random (MNAR) which can lead to sample selection bias. And many statistical models for these data must account for unobserved confounding. This talk will demonstrate how to use standard maximum likelihood estimation to fit extended regression models (ERMs) that deal with all of these common issues alone or simultaneously.

Friday, March 29, 2019 — 10:30 AM EDT

**Background risk model and inference based on ranks of residuals**

It is often easier to model the behaviour of a random vector by choosing the marginal distributions and the copula separately rather than using a classical multivariate distribution. Many copula families, including the classes of Archimedean and elliptical copulas, may be written as the survival copula of a random vector R(X,Y), where R is a strictly positive random variable independent of the random vector (X,Y). A unified framework is presented for studying the dependence structure underlying this stochastic representation, which is called the background risk model. However, in many applications, part of the dependence may be explained by observable external factors, which justifies the use of generalized linear models for the marginal distributions. In this case and under some conditions that will be discussed, the inference on the copula can be based on the ranks of suitable residuals.
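One concrete instance of the representation R(X,Y) is the classical frailty construction of the Clayton copula, sketched here as an illustration only. It assumes X and Y are independent unit exponentials and R = 1/V with V gamma distributed with shape 1/theta, so that the survival copula of (RX, RY) is Clayton with Kendall's tau equal to theta / (theta + 2). The function names and parameter values are hypothetical.

```python
import random

def simulate_background_risk(n, theta, rng):
    """Draw n pairs (R*X, R*Y) with R = 1/V, V ~ Gamma(1/theta, 1), X, Y ~ Exp(1)."""
    pairs = []
    for _ in range(n):
        r = 1.0 / rng.gammavariate(1.0 / theta, 1.0)  # common background risk
        pairs.append((r * rng.expovariate(1.0), r * rng.expovariate(1.0)))
    return pairs

def kendall_tau(pairs):
    """Sample Kendall's tau from concordant minus discordant pairs (O(n^2))."""
    n = len(pairs)
    score = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            prod = (xi - xj) * (yi - yj)
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

theta = 2.0
rng = random.Random(1)
sample = simulate_background_risk(1500, theta, rng)
tau_hat = kendall_tau(sample)
tau_theory = theta / (theta + 2.0)
print(f"sample tau: {tau_hat:.3f}, Clayton value: {tau_theory:.3f}")
```

Because Kendall's tau depends only on ranks, it is unaffected by the marginal distributions, which is in the spirit of rank-based inference on the copula.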

Thursday, March 28, 2019 — 4:00 PM EDT

**Excursion Probabilities and Geometric Properties of Multivariate Gaussian Random Fields**

Excursion probabilities of Gaussian random fields have many applications in statistics (e.g., scan statistics and control of the false discovery rate (FDR)) and in other areas. The study of excursion probabilities has a long history and is closely related to the geometry of Gaussian random fields. In recent years, important developments have been made in both probability and statistics.

In this talk, we consider bivariate Gaussian random fields with non-smooth (or fractal) sample functions and study their geometric properties and excursion probabilities. Important classes of multivariate Gaussian random fields are stationary fields with Matérn cross-covariance functions [Gneiting, Kleiber, and Schlather (2010)] and operator fractional Brownian motions, which are operator-self-similar with stationary increments.

Friday, March 15, 2019 — 10:30 AM EDT

**Modeling Winning Streaks in Financial Markets & Sample Recycling Method for Nested Stochastics**

Topic #1:

A new class of stochastic processes, termed sticky extrema processes, is proposed to model the common phenomena of winning and losing streaks in financial markets, including equity, commodity, foreign exchange, etc. Most stochastic process models for financial market data in the current literature focus on stylized facts such as fat-tailedness relative to normality, volatility clustering, mean reversion, etc. However, none of the existing financial models captures a frequently observed “extrema clustering” feature: most financial indices often report record highs or lows in concentrated periods of time. The lack of an “extrema clustering” feature in a stochastic model for asset valuation can have a grave impact on the pricing and risk management of path-dependent financial derivatives. In particular, the values of derivatives whose payoffs depend on optimal (maximum or minimum) underlying market values can be severely misestimated.

Topic #2:

Nested stochastic modeling has been on the rise in many fields of the financial industry. Nested stochastic models refer to stochastic models embedded inside other stochastic models. Examples can be found in principle-based reserving for long-term insurance liabilities. Reserves and capital for financial products sensitive to interest and market risk are often determined by stochastic valuation. In the projection of cash flows, further simulations are necessary to evaluate risk management actions, such as a hedging program, at each point in time. The computational demand grows exponentially with the layers of nested stochastic modeling and the points of evaluation. Most existing techniques to speed up nested simulation are based on curve fitting, which establishes a functional relationship between the inner loop estimator and the economic scenarios and replaces inner loop simulations with the fitted curve. This work presents a non-conventional approach, termed the sample recycling method, which runs inner loop estimation for a small set of outer loop scenarios and finds estimates under other outer loop scenarios by recycling inner loop paths. This new approach can be very efficient when curve fitting is difficult to achieve.

Thursday, March 14, 2019 — 4:00 PM EDT

**A Unified Approach to Sparse Tweedie Modeling of Multi-Source Insurance Claim Data**

Actuarial practitioners now have access to multiple sources of insurance data corresponding to various situations: multiple business lines, umbrella coverage, multiple hazards, and so on. Despite the wide use and simple nature of single-target approaches, modeling these types of data may benefit from a simultaneous approach. We propose a unified algorithm to perform sparse learning of such fused insurance data under the Tweedie (compound Poisson) model. By integrating ideas from multi-task sparse learning and sparse Tweedie modeling, our algorithm produces flexible regularization that balances predictor sparsity and between-sources sparsity. When applied to simulated and real data, our approach clearly outperforms single-target modeling in both prediction and selection accuracy, notably when the sources do not have exactly the same set of predictors. An efficient implementation of the proposed algorithm is provided in our R package MStweedie.
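For readers unfamiliar with the Tweedie (compound Poisson) model, the sketch below simulates it as a Poisson number of gamma-distributed claims. It uses the standard parameterization with mean `mu`, dispersion `phi` and power `p` between 1 and 2. It illustrates only the response distribution, not the sparse multi-source algorithm or the MStweedie package, and the function names and parameter values are hypothetical.

```python
import math
import random

def tweedie_params(mu, phi, p):
    """Map (mu, phi, p), 1 < p < 2, to Poisson rate and gamma shape and scale."""
    rate = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return rate, shape, scale

def poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_tweedie(mu, phi, p, n, rng):
    rate, shape, scale = tweedie_params(mu, phi, p)
    draws = []
    for _ in range(n):
        count = poisson(rate, rng)
        # A sum of `count` iid Gamma(shape, scale) claims is Gamma(count * shape, scale).
        draws.append(rng.gammavariate(count * shape, scale) if count else 0.0)
    return draws

mu, phi, p = 2.0, 1.0, 1.5
rng = random.Random(42)
claims = simulate_tweedie(mu, phi, p, 20000, rng)
sample_mean = sum(claims) / len(claims)
zero_share = sum(1 for c in claims if c == 0.0) / len(claims)
rate, _, _ = tweedie_params(mu, phi, p)
print(f"mean: {sample_mean:.3f} (target {mu}), zeros: {zero_share:.3f} (target {math.exp(-rate):.3f})")
```

The point mass at zero together with a continuous positive part is what makes this family a natural fit for claim data with many policies reporting no loss.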

Friday, March 8, 2019 — 10:30 AM EST

**Risk Aggregation: A General Approach via the Class of Generalized Gamma Convolutions**

Risk aggregation is virtually everywhere in insurance applications. Indeed, in the vast majority of situations insurers are interested in the properties of the sums of the risks they are exposed to, rather than in the stand-alone risks per se. Unfortunately, the problem of formulating the probability distributions of the aforementioned sums is rather involved, and as a rule does not have an explicit solution. As a result, numerous methods to approximate the distributions of the sums have been proposed, with the moment matching approximations (MMAs) being arguably the most popular. The arsenal of existing MMAs is quite impressive: it ranges from very simple methods, such as the normal and shifted-gamma approximations, which match only the first two and the first three moments, respectively, to much more intricate methods, such as the one based on mixed Erlang distributions. Note, however, that in practice the sums of insurance risks can have numerous or just a few summands; in the latter case the normal approximation is very questionable. Also, in practice the distributions of the stand-alone risks can be light-tailed or heavy-tailed; in the latter case moments of higher orders (e.g., second, etc.) may not exist, and so the approximation based on mixed Erlang distributions is of limited usefulness. In this talk I will present a refined MMA method for approximating the distributions of the sums of insurance risks. The method approximates the distributions of interest to any desired precision, works equally well for light-tailed and heavy-tailed distributions, and is reasonably fast irrespective of the number of summands involved. (This is joint work with Justin Miles and Alexey Kuznetsov, York University.)
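The two simplest MMAs mentioned above can be sketched as follows. This illustrates the classical normal and shifted-gamma approximations, not the refined method of the talk. It assumes independent risks, so that means, variances and third central moments add, and a positively skewed sum. The example risks and function names are hypothetical.

```python
import math

def aggregate_moments(risks):
    """Mean, variance and skewness of a sum of independent risks.

    Each risk is given as (mean, variance, third central moment).
    """
    mean = sum(m for m, _, _ in risks)
    var = sum(v for _, v, _ in risks)
    mu3 = sum(k for _, _, k in risks)
    return mean, var, mu3 / var ** 1.5

def shifted_gamma_params(mean, var, skew):
    """Shape, scale and shift of the gamma law matching the first three moments."""
    if skew <= 0:
        raise ValueError("the shifted-gamma approximation needs positive skewness")
    shape = 4.0 / skew ** 2
    scale = math.sqrt(var) * skew / 2.0
    shift = mean - shape * scale
    return shape, scale, shift

# Three independent exponential risks with means 1, 2 and 3:
# an exponential with mean m has variance m^2 and third central moment 2 m^3.
risks = [(m, m ** 2, 2.0 * m ** 3) for m in (1.0, 2.0, 3.0)]
mean, var, skew = aggregate_moments(risks)

# The normal approximation keeps only (mean, var) and so ignores the skewness.
shape, scale, shift = shifted_gamma_params(mean, var, skew)

# Moments implied by the fitted shifted gamma, for comparison with the targets.
fit_mean = shift + shape * scale
fit_var = shape * scale ** 2
fit_skew = 2.0 / math.sqrt(shape)
print(mean, var, skew)
print(fit_mean, fit_var, fit_skew)
```

With only three summands the skewness here is well above zero, which is the situation in which a two-moment normal approximation becomes questionable.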

Thursday, March 7, 2019 — 4:00 PM EST

**Non-standard problems in statistical inference: Bartlett identity, boundary, identifiability issues**

In this talk, I will cover a few ideas for tackling non-standard problems in statistical inference, including Bartlett identity, boundary and identifiability issues. I will show that these considerations are critical to model robustness, statistical power, and validity. I will also present implications of these ideas for addressing key challenges in biomedical research using massive healthcare data, in particular electronic health records and drug/vaccine safety surveillance data. Case studies using University of Pennsylvania Biobank data will be provided.

Friday, February 22, 2019 — 10:30 AM EST

**Equilibrium recoveries in insurance markets with limited liability**

In this talk, I will discuss optimal insurance in partial equilibrium when the insurer is protected by limited liability and the multivariate insured risk is exchangeable. I focus on the optimal allocation of remaining assets in default, and show existence of an equilibrium in the market. In such an equilibrium, perfect pooling of the risk in the market occurs, but a protection fund is needed to charge levies to policyholders with low realized losses. If policyholders cannot be forced ex post to pay a levy, the constrained equal loss rule is used in equilibrium. This rule has gained particular interest in the literature on bankruptcy problems. Moreover, in the absence of a regulator, the insurer will always invest all its assets in the risky technology. The welfare losses that arise if other recovery rules are used in case of default are illustrated; a different recovery rule can substantially affect the profit of the insurer. This talk is based on a working paper on SSRN.

Thursday, February 14, 2019 — 1:00 PM EST

**PLEASE NOTE: This seminar has been cancelled.**

**Propensity scores and missing data, with application to research on effects of prenatal alcohol exposure**

We discuss challenges in analysing data from several epidemiological cohort studies designed to explore the association between prenatal alcohol exposure and child development. After Professors Sandra and Joseph Jacobson briefly describe the broader context, Professor Ryan will give a brief overview of the statistical challenges. Dr. Akkaya will then go into more detail, describing the approaches we have been taking, using propensity score analysis to adjust for potential confounders. She will briefly review propensity score methods and their extension to application with continuous predictors, such as the amount of alcohol consumed by the mother during pregnancy. She will then discuss extensions we have developed, using multiple imputation, to handle missing and misspecified covariates in this context. In particular, she will describe a strategy for incorporating variables that have a two-part or semi-continuous structure. This arises in our setting, for example, since many women will have zero exposure (meaning they are non-drinkers or abstainers), while there will then be a wide and long-tailed distribution of exposure levels among those who drink. Two-part or semi-continuous variables also arise among the potential confounding variables in our study, for example, use of cocaine and other drugs during pregnancy.

Tuesday, February 5, 2019 — 4:00 PM EST

**Space-filling Designs for Computer Experiments and Their Application to Big Data Research**

Computer experiments provide useful tools for investigating complex systems, and they call for space-filling designs, which are a class of designs that allow the use of various modeling methods. He and Tang (2013) introduced and studied a class of space-filling designs, strong orthogonal arrays. To date, an important problem that has not been addressed in the literature is that of design selection for such arrays. In this talk, I will first give a broad introduction to space-filling designs, and then present some results on the selection of strong orthogonal arrays.

The second part of my talk will present some preliminary work on the application of space-filling designs to big data research. Nowadays, it is challenging to use current computing resources to analyze super-large datasets. Subsampling-based methods are the common approaches to reducing data sizes, with the leveraging method (Ma and Sun, 2014) being the most popular. Recently, a new approach, the information-based optimal subdata selection (IBOSS) method, was proposed (Wang, Yang and Stufken, 2018), which applies design methodology to the big data problem. However, both the leveraging method and the IBOSS method are model-dependent. Space-filling designs do not suffer from this drawback, as shown in our simulation studies.
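To show why leveraging is model-dependent, here is a minimal sketch of leverage-based subsampling. It is not the procedure evaluated in the talk. It assumes a simple linear regression with an intercept, for which the leverage of observation i is 1/n + (x_i - mean(x))^2 / Sxx, and samples rows with probability proportional to leverage. The function names and the simulated data are hypothetical.

```python
import random

def leverage_scores(x):
    """Diagonal of the hat matrix for a simple linear regression with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

def leverage_subsample(x, y, size, rng):
    """Sample rows with replacement, with probability proportional to leverage.

    Returns (x, y, weight) triples; the inverse-probability weights can be
    used in a weighted least squares fit on the subsample.
    """
    scores = leverage_scores(x)
    total = sum(scores)
    probs = [s / total for s in scores]
    chosen = rng.choices(range(len(x)), weights=probs, k=size)
    return [(x[i], y[i], 1.0 / (size * probs[i])) for i in chosen]

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

scores = leverage_scores(x)
subsample = leverage_subsample(x, y, 200, rng)
print(f"sum of leverages: {sum(scores):.6f}")  # equals the number of coefficients
print(f"subsample size: {len(subsample)}")
```

The sampling probabilities are computed from the design matrix of one particular regression model, so a different model would select different rows.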