Statistical Science in Society abstracts

Insurance Risk, Ruin Analysis and Applications in Statistics

Hansjoerg Albrecher
Université de Lausanne, Switzerland

Wald's classical sequential probability ratio test of a simple hypothesis against a simple alternative is based on the boundary crossing of an associated random walk. In this talk we connect this test to a problem in ruin theory, and use this bridge to extend known expressions for the decision boundaries of the test. We also discuss relaxations of the ruin concept in actuarial science that allow the insurer a positive, surplus-dependent probability of continuing despite a temporarily negative surplus. Some results on the corresponding modifications of ruin-related quantities and on optimal dividend strategies in this new framework are presented.
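
For orientation, a minimal statement of the classical SPRT that the talk starts from (textbook notation; the ruin-theoretic extension itself is not reproduced here): for i.i.d. observations X_1, X_2, ... the test tracks the log-likelihood-ratio random walk and stops at Wald's approximate boundaries,

```latex
S_n = \sum_{i=1}^{n} \log\frac{f_1(X_i)}{f_0(X_i)},
\qquad \text{continue sampling while } \log B < S_n < \log A,
\qquad A \approx \frac{1-\beta}{\alpha}, \quad B \approx \frac{\beta}{1-\alpha},
```

with H_1 accepted once S_n >= log A and H_0 accepted once S_n <= log B, where alpha and beta denote the type I and type II error probabilities.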

Dynamic Network Modeling

David Banks
Duke University, U.S.A.

Statistical inference for networks has a long history, but recent technology has driven this field forward very rapidly, in a wide range of scientific disciplines. In particular, much current attention is focused upon building and evaluating models for dynamic networks, in which nodes and edges may change over time. This talk reviews some of the historical development, and then describes applications in dynamic networks for political blogs and baboon troops.

Model calibration and tuning for multi-fidelity computational models

Derek Bingham
Simon Fraser University, Canada

Computer codes are widely used to describe physical processes in lieu of physical observations. In some cases, more than one computer simulator, each with a different degree of fidelity, can be used to explore the physical system. In this work, we combine field observations and model runs from deterministic multi-fidelity computer simulators to build a predictive model for the real process. The resulting model can be used to perform sensitivity analysis for the system and to make predictions with associated measures of uncertainty. Our approach is Bayesian and will be illustrated through a simple example, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan.
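
As a point of reference, a common Bayesian calibration formulation in this spirit (in the style of Kennedy and O'Hagan; the talk's multi-fidelity extension is not reproduced here) links a field observation at input x to the simulator through a calibration parameter and a discrepancy term:

```latex
y(x) = \eta(x,\theta) + \delta(x) + \varepsilon, \qquad \varepsilon \sim N(0,\sigma^2),
```

where \eta(\cdot,\theta) is the simulator output at the unknown calibration parameter \theta, \delta(\cdot) is a model-discrepancy process typically given a Gaussian-process prior, and \varepsilon is observation error.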

A Copula Model for Marked Point Processes

Richard Cook
University of Waterloo, Canada

Many chronic diseases feature recurrent clinically important events. In addition, a random variable is often realized upon the occurrence of each event, reflecting the severity of the event, a cost associated with it, or possibly a short-term response indicating the effect of a therapeutic intervention. We describe a novel model for a marked point process which incorporates dependence between continuous marks and the event process through the use of a copula function. The copula formulation ensures that event times can be modeled by any intensity function for point processes, and that any multivariate model can be specified for the continuous marks. The relative efficiency of joint versus separate analyses of the event times and the marks is examined through simulation under random censoring. An application to data from a motivating clinical trial in transfusion medicine is given for illustration.
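
As a schematic illustration of the idea (not necessarily the authors' exact formulation), a bivariate copula C can tie the margin of an event gap time T_k to the margin of its continuous mark Y_k,

```latex
P(T_k \le t,\; Y_k \le y) \;=\; C\bigl(F_T(t),\, F_Y(y)\bigr),
```

so that F_T may be induced by any point-process intensity model and F_Y by any model for the marks, with the dependence carried entirely by C.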

Dependence modeling using Pair Copula Constructions (PCC)

Claudia Czado
Technische Universität München, Germany

The copula approach, which allows the margins to be separated from the dependence structure, is a widely used tool for characterizing dependence in multivariate data. While the catalog of bivariate parametric copulas is large, covering both symmetric and asymmetric tail dependence, this is not the case in higher dimensions. Recently, pair copula constructions (PCC) have been used to build multivariate copulas requiring only two-dimensional copulas and conditional distribution functions. The corresponding class of vine copulas has been shown to be very flexible. I will introduce this class and discuss estimation and model selection methods. The PCC principle can also be extended to allow for discrete variables and to build non-Gaussian Bayesian networks. Their applicability will be illustrated using financial and health data. Available software packages will be mentioned.
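
To make the construction concrete, the standard three-dimensional pair copula decomposition (a D-vine) writes the joint density using only bivariate copula densities and conditional distribution functions:

```latex
f(x_1,x_2,x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\,
  c_{12}\bigl(F_1(x_1),F_2(x_2)\bigr)\,
  c_{23}\bigl(F_2(x_2),F_3(x_3)\bigr)\,
  c_{13|2}\bigl(F(x_1\mid x_2),\,F(x_3\mid x_2)\bigr),
```

where each c is a bivariate copula density and F(· | x_2) denotes a conditional distribution function.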

Approximate methods of parameter estimation for spatial epidemic models

Rob Deardon
University of Guelph, Canada

Individual-Level Models (ILMs) for infectious disease, fitted in a Bayesian MCMC framework, are an intuitive and flexible class of models that can be used to take into account population heterogeneity via various individual-level covariates. ILMs containing a geometric distance kernel to account for spatial heterogeneity provide a natural way to model the spatial spread of disease. However, even in moderately large populations, the required likelihood calculations can be prohibitively time-consuming. It is possible to speed up the computation via techniques that make use of various approximations of the model. Here we examine some methods of carrying out such analyses.
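
A minimal sketch of one common form of spatial ILM time step, assuming a power-law distance kernel and illustrative parameter names (alpha for infectivity, beta for spatial decay); it is meant only to indicate where the expensive likelihood sums arise, not to reproduce the approximations discussed in the talk:

```python
import numpy as np

def infection_probs(coords, infectious, susceptible, alpha, beta):
    """Illustrative spatial ILM step: probability that each currently
    susceptible individual becomes infected in the next time unit,
    using a power-law distance kernel d_ij ** (-beta).  `coords` is an
    (n, 2) array of locations (assumed distinct); `infectious` and
    `susceptible` are index arrays; alpha and beta are illustrative
    infectivity and spatial-decay parameters."""
    d = np.linalg.norm(coords[susceptible][:, None, :] -
                       coords[infectious][None, :, :], axis=-1)
    rate = alpha * np.sum(d ** (-beta), axis=1)   # kernel summed over infectives
    return 1.0 - np.exp(-rate)                    # P(infection) = 1 - exp(-rate)

def log_likelihood(probs, newly_infected_mask):
    """Bernoulli log-likelihood of the observed new infections for one time
    step; the full ILM likelihood multiplies such terms over all time steps,
    which is where the computational burden comes from."""
    return np.sum(np.log(np.where(newly_infected_mask, probs, 1.0 - probs)))
```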

Personalized Medicine and Artificial Intelligence

Michael Kosorok
University of North Carolina, U.S.A.

Personalized medicine is an important and active area of clinical research involving high dimensional data. In this talk, we describe some recent design and methodological developments in clinical trials for discovery and evaluation of personalized medicine. Statistical learning tools from artificial intelligence, including machine learning, reinforcement learning and several newer learning methods, are beginning to play increasingly important roles in these areas. We present illustrative examples of issues and approaches in the treatment of depression, cancer, and other diseases. The new approaches have significant potential to improve health and well-being.

Methods for Robust High Dimensional Graphical Model Selection

Bala Rajaratnam
Stanford University, U.S.A.

Learning high dimensional correlation and partial correlation graphical network models is a topic of contemporary interest. A popular approach is to use L1 regularization methods to induce sparsity in the inverse covariance estimator, leading to sparse partial covariance/correlation graphs. Such approaches can be grouped into two classes: (1) regularized likelihood methods and (2) regularized regression-based, or pseudo-likelihood, methods. Regression-based methods have the distinct advantage that they do not explicitly assume Gaussianity. One gap in the area is that none of the popular methods proposed for solving regression-based objective functions have provable convergence guarantees. Hence it is not clear whether the resulting estimators actually yield correct partial correlation/partial covariance graphs. To this end, we propose a new regression-based graphical model selection method that is both tractable and has provable convergence guarantees. In addition, we demonstrate that our approach yields estimators with good large sample properties. The methodology is illustrated on both real and simulated data. We also present a novel unifying framework that places various pseudo-likelihood graphical model selection methods as special cases of a more general formulation, leading to important insights.
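
For context, a minimal sketch of the classical regression-based (pseudo-likelihood-style) idea, neighborhood selection with the Lasso; this is the baseline approach, not the convergence-guaranteed estimator proposed in the talk:

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam=0.1, rule="and"):
    """Illustrative regression-based graph selection: regress each variable on
    all others with an L1 penalty and connect i--j when the corresponding
    coefficients are nonzero (the Meinshausen-Buehlmann idea).  `X` is an
    (n, p) data matrix, `lam` the penalty, and `rule` the symmetrization."""
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        fit = Lasso(alpha=lam, max_iter=10000).fit(X[:, others], X[:, j])
        B[j, others] = fit.coef_
    nz = B != 0
    return (nz & nz.T) if rule == "and" else (nz | nz.T)   # adjacency matrix
```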

(Joint work with S. Oh and K. Khare)

Predicting the Present with Bayesian Structural Time Series

Steven L. Scott
Google, U.S.A.

This article describes a system for short term forecasting based on an ensemble prediction that averages over different combinations of predictors. The system combines a structural time series model for the target series with a regression component capturing the contributions of contemporaneous search query data. A spike-and-slab prior on the regression coefficients induces sparsity, dramatically reducing the size of the regression problem. Our system averages over potential contributions from a very large set of models and gives easily digested reports of which coefficients are likely to be important. We illustrate with applications to initial claims for unemployment benefits and to retail sales. Although our exposition focuses on using search engine data to forecast economic time series, the underlying statistical methods can be applied to more general short term forecasting with large numbers of contemporaneous predictors.
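
One standard structural-time-series specification in this spirit (a local linear trend plus regression, with a spike-and-slab prior on the coefficients; notation here is illustrative) is

```latex
y_t = \mu_t + \boldsymbol{\beta}^{\top}\mathbf{x}_t + \varepsilon_t, \qquad
\mu_{t+1} = \mu_t + \delta_t + u_t, \qquad
\delta_{t+1} = \delta_t + v_t,

\gamma_j \sim \mathrm{Bernoulli}(\pi_j), \qquad
\beta_j \mid (\gamma_j = 1) \sim N(0,\tau_j^2), \qquad
\beta_j \mid (\gamma_j = 0) = 0,
```

so that each predictor x_{t,j} enters the model only when its inclusion indicator \gamma_j is switched on, and model averaging is over the posterior of the indicators.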

Multispecialty Physician Networks in Ontario

Thérèse Stukel
ICES and University of Toronto, Canada

Large multispecialty physician group practices, with a central role for primary care, have achieved high-quality, low-cost care for chronic disease patients. We assessed whether informal multispecialty physician networks could be identified by exploiting natural linkages among patients, physicians, and hospitals based on existing patient flow, using health administrative data.

We linked residents to their usual provider of primary care (UPC) over 2008-2010. We linked specialists to the hospital where they performed the most inpatient services. We linked primary care (PC) physicians to the hospital where most of their UPC patient panel was admitted for non-maternal medical care. Residents were linked to the same hospital as their UPC physician. We computed loyalty as the proportion of care to network residents provided by physicians and hospitals within their network. Smaller clusters were aggregated to create networks based on a minimum population size, distance and loyalty. Networks were not constrained geographically.
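
A minimal sketch of the loyalty calculation described above, assuming a hypothetical visit-level table with columns patient_network and provider_network (these names are illustrative, not the study's actual variables):

```python
import pandas as pd

def network_loyalty(visits: pd.DataFrame) -> pd.Series:
    """Loyalty per network: the share of visits by network residents that are
    provided by physicians (or hospitals) belonging to the same network.
    `visits` has one row per visit, with the resident's network in
    `patient_network` and the provider's network in `provider_network`."""
    within = visits["patient_network"] == visits["provider_network"]
    return within.groupby(visits["patient_network"]).mean()
```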

We identified 78 multispecialty physician networks, comprising 12,581 PC physicians, 14,516 specialists and 175 acute care hospitals serving 12,917,178 people. Median network size was 134,000 residents, 125 PC physicians and 141 specialists. Virtually all eligible residents were linked to a UPC and to a network. Most specialists (94%) and PC physicians (98%) were linked to a hospital. Median network physician loyalty was 68% for physician visits and 81% for PC visits. Median admission loyalty was 67%. Urban networks had lower loyalties and were less self-contained but had more healthcare resources.

In the absence of any formal coordinating structure, these networks have developed naturally through long-standing referral patterns, sharing of information, and admission of patients to the same hospitals. Formal constitution of self-organizing multispecialty physician groups around existing patterns of patient flow could serve as a model for ‘accountable care systems’ that aim to facilitate coordination of care at a local level for high-needs patients, as it is aligned with a systems-minded approach to providing long-term chronic disease care and prevention.

Graph-aware measures for comparing partitions

François Théberge
TIMC and University of Ottawa, Canada

The problem of large-scale graph clustering is important for a variety of applications including relational data exploration and visualization, community detection, and partitioning large graphs for cyber defence. An impressive number of graph clustering algorithms have been proposed and studied over the years. While evaluating several of those algorithms for application to our problems, it became clear that the choice of measure(s) used to compare various results can have a huge impact on the conclusions of such experiments.

It is common practice to use set-partition measures on the vertices of the graph to compare partitions, thus neglecting the overall graph structure. In this talk, we review several set-partition measures based on information theory (such as Normalized Mutual Information) or on pairwise counting (such as the Rand index or the Jaccard measure). All of these measures suffer from bias when comparing partitions with different cardinalities, and corrections have been proposed for the information-theoretic measures and for the Rand index. We observe that the latter correction can be extended to most pairwise-counting measures. We then propose graph-aware measures as alternatives to the standard set-partition ones. We observe that partitions of graphs can also be seen as binary edge classifications, which allows us to define several graph-aware measures. We describe some properties of those measures, and we illustrate the complementarity of set-partition and graph-aware measures via empirical results.
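
A minimal sketch of the contrast between the two families of measures, assuming networkx and scikit-learn are available; the edge-based score shown here uses simple agreement and only illustrates the binary edge-classification viewpoint, not the specific measures proposed in the talk:

```python
import networkx as nx
from sklearn.metrics import adjusted_rand_score

def graph_aware_agreement(G, part_a, part_b):
    """Label each edge of G as within-cluster (1) or between-cluster (0)
    under each partition (dicts mapping node -> cluster id), then measure
    agreement of the two binary edge labelings."""
    a = [int(part_a[u] == part_a[v]) for u, v in G.edges()]
    b = [int(part_b[u] == part_b[v]) for u, v in G.edges()]
    return sum(x == y for x, y in zip(a, b)) / G.number_of_edges()

# Toy comparison of a graph-agnostic and a graph-aware score.
G = nx.karate_club_graph()
part_a = {v: 0 if v < 17 else 1 for v in G}          # arbitrary split
part_b = {v: G.nodes[v]["club"] for v in G}          # the known club labels
nodes = sorted(G)
print(adjusted_rand_score([part_a[v] for v in nodes], [part_b[v] for v in nodes]))
print(graph_aware_agreement(G, part_a, part_b))
```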

We will also use part of this talk to describe some research areas of interest to the Tutte Institute for Mathematics and Computing (TIMC), and to present various models for collaboration with academia.

An Integrative Bayesian Modeling Approach to Imaging Genetics

Marina Vannucci
Rice University, U.S.A.

In this talk I will present a Bayesian hierarchical modeling approach for imaging genetics, where the interest lies in linking brain connectivity across multiple individuals to their genetic information. Data are available from a functional magnetic resonance imaging (fMRI) study on schizophrenia. The goals are to identify brain regions of interest (ROIs) with discriminating activation patterns between schizophrenic patients and healthy controls, and to relate the ROIs' activations to available genetic information from single nucleotide polymorphisms (SNPs) on the subjects. For this task I will present a hierarchical mixture model that includes several innovative characteristics: it incorporates the selection of ROIs that discriminate the subjects into separate groups; it allows the mixture components to depend on selected covariates; and it includes prior models that capture structural dependencies among the ROIs. Applied to the schizophrenia data set, the model leads to the simultaneous selection of a set of discriminatory ROIs and the relevant SNPs, together with the reconstruction of the correlation structure of the selected regions.

Modelling Uncertainty of Dependence in Risk Aggregation

Ruodu Wang
University of Waterloo, Canada

Model risk is the risk arising from inappropriate modelling and the misuse of quantitative methods in financial risk management. One of the most challenging aspects of model risk lies in modelling the dependence between individual risks. To give a proper mathematical framework for studying model risk in dependence, we introduce the admissible risk class as the set of all possible aggregate risks when the marginal distributions of the individual risks are fixed but the dependence structure among them is left unspecified. The question arises from a statistical challenge: in practice we usually do not know how the individual risks depend on one another. The concept provides flexibility for the analysis of the model risk of dependence. We will also give theoretical results on convex ordering bounds over an admissible risk class, which can be used to identify extreme scenarios for risk aggregation and to calculate bounds on convex risk measures and other quantities of interest, such as expected utilities, stop-loss premiums, prices of European options, and TVaR.
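
In symbols (notation here is illustrative), with fixed margins F_1, ..., F_n the admissible risk class is

```latex
\mathcal{S}_n(F_1,\dots,F_n) \;=\;
\bigl\{\, X_1 + \cdots + X_n \;:\; X_i \sim F_i,\ i = 1,\dots,n \,\bigr\},
```

and model-risk bounds are obtained as the extrema of a risk measure, such as TVaR or another convex risk measure, over this class.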

Groups and Aggregates to Networks: Implications for Community, Family and Work

Barry Wellman
University of Toronto, Canada

I discuss the triple revolution: the turn away from groups and individuals toward social networks; the Internet; and mobile connectivity. I focus on how analyzing people and organizations in terms of networks provides a good understanding of community, family and work.

Some current aspects of aggregate loss and related analysis

Gordon Willmot
University of Waterloo, Canada

This talk will focus on the use of the random sum model of aggregate claims on a portfolio of insurance business. Insurance quantities of interest include the stop-loss premium, mean residual lifetime, and Tail-Value-at-Risk (TVaR). Semi-parametric ideas have recently become popular in order to capitalize on computational resources, while still retaining analytic properties of interest. The use of recursive techniques and of mixed Erlang models are two examples of this trend. The use of Poisson and mixed Poisson claim count models implies direct relevance in an insurance setting of such probabilistic notions as infinite divisibility, thinning, and self-decomposability. Applications of these random sum models to inflationary settings and claim payment delays are discussed. Finally, some comments about claim count models are made, and in particular a generalized class of counting distributions is seen to have applicability to discrete stop-loss and TVaR analysis.
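
A minimal sketch of one such recursive technique, Panjer's recursion for a compound Poisson aggregate loss with integer-valued severities, together with the stop-loss premium it feeds into (parameter names and the truncation point smax are illustrative):

```python
import numpy as np

def panjer_compound_poisson(lam, severity_pmf, smax):
    """Aggregate claim amount S = X_1 + ... + X_N with N ~ Poisson(lam) and
    i.i.d. severities on {0, 1, 2, ...} with pmf `severity_pmf` (index j is
    the amount j).  Returns P(S = s) for s = 0, ..., smax via the recursion
    g(s) = (lam / s) * sum_{j=1}^{s} j * f(j) * g(s - j)."""
    sev = np.asarray(severity_pmf, dtype=float)
    f = np.zeros(smax + 1)
    f[:min(len(sev), smax + 1)] = sev[:smax + 1]
    g = np.zeros(smax + 1)
    g[0] = np.exp(-lam * (1.0 - f[0]))            # P(S = 0)
    for s in range(1, smax + 1):
        j = np.arange(1, s + 1)
        g[s] = (lam / s) * np.sum(j * f[j] * g[s - j])
    return g

def stop_loss_premium(g, d):
    """E[(S - d)_+] computed from the (truncated) aggregate pmf `g`."""
    s = np.arange(len(g))
    return np.sum(np.maximum(s - d, 0) * g)
```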