Estimation of the expected shortfall given an extreme component under conditional extreme value model
For two risks, $X$, and $Y$ , the Marginal Expected Shortfall (MES) is defined as $E[Y \mid X > x]$, where $x$ is large. MES is an important factor when measuring the systemic risk of financial institutions. In this talk we will discuss consistency and asymptotic normality of an estimator of MES on assuming that $(X, Y)$ follows a Conditional Extreme Value (CEV) model. The theoretical findings are supported by simulation studies. Our procedure is applied to some financial data. This is a joint work with Kevin Tong (Bank of Montreal).
Analysis of Clinical Trials with Multiple Outcomes
In order to obtain better overall knowledge of a treatment effect, investigators in clinical trials often collect many medically related outcomes, which are commonly called as endpoints. It is fundamental to understand the objectives of a particular analysis before applying any adjustment for multiplicity. For example, multiplicity does not always lead to error rate inflation, or multiplicity may be introduced for purpose other than making an efficacy or safety claim such as in sensitivity assessments. Sometimes, the multiple endpoints in clinical trials can be hierarchically ordered and logically related. In this talk, we will discuss the methods to analyze multiple outcomes in clinical trials with different objectives: all or none approach, global approach, composite endpoint, at-least-one approach.
Data Adaptive Support Vector Machine with Application to Prostate Cancer Imaging Data
Support vector machines (SVM) have been widely used as classifiers in various settings including pattern recognition, texture mining and image retrieval. However, such methods are faced with newly emerging challenges such as imbalanced observations and noise data. In this talk, I will discuss the impact of noise data and imbalanced observations on SVM classification and present a new data adaptive SVM classification method.
This work is motivated by a prostate cancer imaging study conducted in London Health Science Center. A primary objective of this study is to improve prostate cancer diagnosis and thereby to guide the treatment based on statistical predictive models. The prostate imaging data, however, are quite imbalanced in that the majority voxels are cancer-free while only a very small portion of voxels are cancerous. This issue makes the available SVM classifiers typically skew to one class and thus generate invalid results. Our proposed SVM method uses a data adaptive kernel to reflect the feature of imbalanced observations; the proposed method takes into consideration of the location of support vectors in the feature space and thereby generates more accurate classification results. The performance of the proposed method is compared with existing methods using numerical studies.
A new framework of calibration for computer models: parameterization and efficient estimation
In this talk I will show some theoretical advances on the problem of calibration for computer models. The goal of calibration is to identify the model parameters in deterministic computer experiments, which cannot be measured or are not available in physical experiments. A theoretical framework is given which enables the study of parameter identifiability and estimation. In a study of the prevailing Bayesian method proposed by Kennedy and O’Hagan (2001), Tuo-Wu (2015, 2016) and Tuo-Wang-Wu (2017) find that this method may render unreasonable estimation for the calibration parameters. A novel calibration method, called L2 calibration, is proposed and proven to enjoy nice asymptotic properties, including asymptotic normality and semi-parametric efficiency. Inspired by a new advance in Gaussian process modeling, called orthogonal Gaussian process models (Plumlee and Joseph, 2016, Plumlee 2016), I have proposed another methodology for calibration. This new method is proven to be semi-parametric efficient, and in addition it allows for a simple Bayesian version so that Bayesian uncertainty quantification can be carried out computationally. In some sense, this latest work provides a complete solution to a long-standing problem in uncertainty quantification (UQ).
Detecting Change in Dynamic Networks
Dynamic networks are often used to model the communications, interactions, or relational structure, of a group of individuals through time. In many applications, it is of interest to identify instances or periods of unusual levels of interaction among these individuals. The real-time monitoring of networks for anomalous changes is known as network surveillance.
This talk will provide an overview of the network surveillance problem and propose a network monitoring strategy that applies statistical process monitoring techniques to the estimated parameters of a degree corrected stochastic block model to identify significant structural change. For illustration, the proposed methodology will be applied to a dynamic U.S. Senate co-voting network as well as the Enron email exchange network. Several ongoing and open research problems will also be discussed.
Pricing Bounds and Bang-bang Analysis of the Polaris Variable Annuities
In this talk, I will discuss the no-arbitrage pricing of the “Polaris Income Plus Daily” structured in the “Polaris Choice IV” variable annuities recently issued by the American International Group. Distinguished from most withdrawal benefits in the literature, Polaris allows the income base to “lock in” the high water mark of the investment account over certain monitoring period, which is related to the timing of policyholder’s first withdrawal. By prudently introducing certain auxiliary state and control variables, we manage to formulate the pricing model under a Markovian stochastic optimal control framework. For the rider charge proportional to the investment account, we establish a bang-bang solution for the optimal withdrawal strategies and show that they can only be among a few explicit choices. We consequently design a novel Least Square Monte Carlo (LSMC) algorithm for the optimal solution. Interesting convergence results are established for the algorithm by applying certain theory of nonparametric sieve estimation. Finally, we formally prove that the pricing results obtained under the ride charge proportional to the investment account works as an upper bound of a contract with insurance fees charged on the income base instead. Numerical studies show the superior performance of the pricing bounds. This talk is based on a joint work with Prof. Chengguo Weng at University of Waterloo.
Latent variable modeling: from functional data analysis to cancer genomics
Many important research questions can be answered by incorporating latent variables into the data analysis. However, this type of modelling requires the development of sophisticated methods and often computational tricks in order to make the inference problem more tractable. In this talk I present an overview of latent variable modelling and show how I have developed different latent variable techniques for several data analyses, two in functional data analysis and one in cancer genomics.
Causal inference in observational data with unmeasured confounding
Observational data introduces many practical challenges for causal inference. In this talk, I will focus on a particular issue when there are unobserved confounders such that the assumption of “ignorability” is violated. For making a causal inference in the presence of unmeasured confounders, instrumental variable (IV) analysis plays a crucial role. I will introduce a hierarchical Bayesian likelihood-based IV analysis under a Latent Index Modeling framework to jointly model outcomes and treatment status, along with necessary assumptions and sensitivity analysis to make a valid causal inference. The innovation in our methodology is an extension of existing parametric approach by i.) accounting for an unobserved heterogeneity via a latent factor structure, and ii.) allowing non-parametric error distributions with Dirichlet process mixture models. We demonstrate utility of our model in comparing effectiveness of two different types of vascular access for a cardio-vascular procedure.
A Group-Specific Recommender System
In recent years, there has been a growing demand to develop efficient recommender systems which track users’ preferences and recommend potential items of interest to users. In this article, we propose a group-specific method to use dependency information from users and items which share similar characteristics under the singular value decomposition framework. The new approach is effective for the “cold-start” problem, where, in the testing set, majority responses are obtained from new users or for new items, and their preference information is not available from the training set. One advantage of the proposed model is that we are able to incorporate information from the missing mechanism and group-specific features through clustering based on the numbers of ratings from each user and other variables associated with missing patterns. In addition, since this type of data involves large-scale customer records, traditional algorithms are not computationally scalable. To implement the proposed method, we propose a new algorithm that embeds a back-fitting algorithm into alternating least squares, which avoids large matrices operation and big memory storage, and therefore makes it feasible to achieve scalable computing. Our simulation studies and MovieLens data analysis both indicate that the proposed group-specific method improves prediction accuracy significantly compared to existing competitive recommender system approaches.
Integrative Reciprocal Graphical Models with Heterogeneous Samples
In this talk, I will introduce novel hierarchical reciprocal graphical models to infer gene networks by integrating genomic data across platforms and across diseases. The proposed model takes into account tumor heterogeneity. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. Thresholding priors are applied to induce sparsity of the estimated networks. In the case of unknown groups, we cluster subjects into subpopulations and jointly estimate cluster-specific gene networks, again using similar hierarchical priors across clusters. Two applications with multiplatform genomic data for multiple cancers will be presented to illustrate the utility of our model. I will also briefly discuss my other work and future directions.