Friday, December 1, 2017 — 12:00 PM EST

Wednesday, November 22, 2017 — 9:00 AM EST

Integrative Reciprocal Graphical Models with Heterogeneous Samples

In this talk, I will introduce novel hierarchical reciprocal graphical models to infer gene networks by integrating genomic data across platforms and across diseases. The proposed model takes into account tumor heterogeneity. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. Thresholding priors are applied to induce sparsity of the estimated networks. In the case of unknown groups, we cluster subjects into subpopulations and jointly estimate cluster-specific gene networks, again using similar hierarchical priors across clusters. Two applications with multiplatform genomic data for multiple cancers will be presented to illustrate the utility of our model. I will also briefly discuss my other work and future directions. 

Friday, November 17, 2017 — 9:00 AM EST

A Group-Specific Recommender System

In recent years, there has been a growing demand to develop efficient recommender systems which track users’ preferences and recommend potential items of interest to users. In this article, we propose a group-specific method to use dependency information from users and items which share similar characteristics under the singular value decomposition framework. The new approach is effective for the “cold-start” problem, where, in the testing set, the majority of responses are obtained from new users or for new items, and their preference information is not available from the training set. One advantage of the proposed model is that we are able to incorporate information from the missing mechanism and group-specific features through clustering based on the numbers of ratings from each user and other variables associated with missing patterns. In addition, since this type of data involves large-scale customer records, traditional algorithms are not computationally scalable. To implement the proposed method, we propose a new algorithm that embeds a back-fitting algorithm into alternating least squares, which avoids large matrix operations and heavy memory storage, and therefore makes scalable computing feasible. Our simulation studies and MovieLens data analysis both indicate that the proposed group-specific method improves prediction accuracy significantly compared to existing competitive recommender system approaches.

Wednesday, November 15, 2017 — 9:00 AM EST

Causal inference in observational data with unmeasured confounding

Observational data introduce many practical challenges for causal inference. In this talk, I will focus on a particular issue that arises when there are unobserved confounders, such that the assumption of “ignorability” is violated. For making a causal inference in the presence of unmeasured confounders, instrumental variable (IV) analysis plays a crucial role. I will introduce a hierarchical Bayesian likelihood-based IV analysis under a Latent Index Modeling framework to jointly model outcomes and treatment status, along with the necessary assumptions and sensitivity analysis to make a valid causal inference. The innovation in our methodology is an extension of the existing parametric approach by (i) accounting for unobserved heterogeneity via a latent factor structure, and (ii) allowing non-parametric error distributions with Dirichlet process mixture models. We demonstrate the utility of our model in comparing the effectiveness of two different types of vascular access for a cardiovascular procedure.

Friday, November 10, 2017 — 9:00 AM EST

Latent variable modeling: from functional data analysis to cancer genomics

Many important research questions can be answered by incorporating latent variables into the data analysis. However, this type of modelling requires the development of sophisticated methods and often computational tricks in order to make the inference problem more tractable. In this talk I present an overview of latent variable modelling and show how I have developed different latent variable techniques for several data analyses, two in functional data analysis and one in cancer genomics.

Tuesday, November 7, 2017 — 4:00 PM EST

Pricing Bounds and Bang-bang Analysis of the Polaris Variable Annuities

In this talk, I will discuss the no-arbitrage pricing of the “Polaris Income Plus Daily” structured in the “Polaris Choice IV” variable annuities recently issued by the American International Group. Distinguished from most withdrawal benefits in the literature, Polaris allows the income base to “lock in” the high water mark of the investment account over a certain monitoring period, which is related to the timing of the policyholder’s first withdrawal. By prudently introducing certain auxiliary state and control variables, we manage to formulate the pricing model under a Markovian stochastic optimal control framework. For the rider charge proportional to the investment account, we establish a bang-bang solution for the optimal withdrawal strategies and show that they can only be among a few explicit choices. We consequently design a novel Least Squares Monte Carlo (LSMC) algorithm for the optimal solution. Interesting convergence results are established for the algorithm by applying certain theory of nonparametric sieve estimation. Finally, we formally prove that the pricing results obtained under the rider charge proportional to the investment account serve as an upper bound for a contract with insurance fees charged on the income base instead. Numerical studies show the superior performance of the pricing bounds. This talk is based on joint work with Prof. Chengguo Weng at the University of Waterloo.

Friday, November 3, 2017 — 9:00 AM EDT

Detecting Change in Dynamic Networks

Dynamic networks are often used to model the communications, interactions, or relational structure of a group of individuals through time. In many applications, it is of interest to identify instances or periods of unusual levels of interaction among these individuals. The real-time monitoring of networks for anomalous changes is known as network surveillance.

This talk will provide an overview of the network surveillance problem and propose a network monitoring strategy that applies statistical process monitoring techniques to the estimated parameters of a degree-corrected stochastic block model to identify significant structural change. For illustration, the proposed methodology will be applied to a dynamic U.S. Senate co-voting network as well as the Enron email exchange network. Several ongoing and open research problems will also be discussed.

Wednesday, November 1, 2017 — 9:00 AM EDT

A new framework of calibration for computer models: parameterization and efficient estimation

In this talk I will show some theoretical advances on the problem of calibration for computer models. The goal of calibration is to identify the model parameters in deterministic computer experiments, which cannot be measured or are not available in physical experiments. A theoretical framework is given which enables the study of parameter identifiability and estimation. In a study of the prevailing Bayesian method proposed by Kennedy and O’Hagan (2001), Tuo-Wu (2015, 2016) and Tuo-Wang-Wu (2017) find that this method may yield unreasonable estimates of the calibration parameters. A novel calibration method, called L2 calibration, is proposed and proven to enjoy nice asymptotic properties, including asymptotic normality and semi-parametric efficiency. Inspired by a new advance in Gaussian process modeling, called orthogonal Gaussian process models (Plumlee and Joseph, 2016; Plumlee, 2016), I have proposed another methodology for calibration. This new method is proven to be semi-parametric efficient, and in addition it allows for a simple Bayesian version so that Bayesian uncertainty quantification can be carried out computationally. In some sense, this latest work provides a complete solution to a long-standing problem in uncertainty quantification (UQ).

Tuesday, October 31, 2017 — 1:00 PM EDT

Data Adaptive Support Vector Machine with Application to Prostate Cancer Imaging Data

Support vector machines (SVM) have been widely used as classifiers in various settings including pattern recognition, texture mining and image retrieval. However, such methods face newly emerging challenges such as imbalanced observations and noisy data. In this talk, I will discuss the impact of noisy data and imbalanced observations on SVM classification and present a new data adaptive SVM classification method.

This work is motivated by a prostate cancer imaging study conducted at London Health Science Center. A primary objective of this study is to improve prostate cancer diagnosis and thereby to guide the treatment based on statistical predictive models. The prostate imaging data, however, are quite imbalanced in that the majority of voxels are cancer-free while only a very small portion of voxels are cancerous. This issue typically makes the available SVM classifiers skew toward one class and thus generate invalid results. Our proposed SVM method uses a data adaptive kernel to reflect the feature of imbalanced observations; the proposed method takes into consideration the location of support vectors in the feature space and thereby generates more accurate classification results. The performance of the proposed method is compared with existing methods using numerical studies.

Monday, October 30, 2017 — 4:00 PM EDT

Analysis of Clinical Trials with Multiple Outcomes

In order to obtain better overall knowledge of a treatment effect, investigators in clinical trials often collect many medically related outcomes, which are commonly called endpoints. It is fundamental to understand the objectives of a particular analysis before applying any adjustment for multiplicity. For example, multiplicity does not always lead to error rate inflation, or multiplicity may be introduced for purposes other than making an efficacy or safety claim, such as in sensitivity assessments. Sometimes, the multiple endpoints in clinical trials can be hierarchically ordered and logically related. In this talk, we will discuss the methods to analyze multiple outcomes in clinical trials with different objectives: the all-or-none approach, the global approach, the composite endpoint, and the at-least-one approach.

Thursday, October 26, 2017 — 4:00 PM EDT

Estimation of the expected shortfall given an extreme component under conditional extreme value model

For two risks, $X$ and $Y$, the Marginal Expected Shortfall (MES) is defined as $E[Y \mid X > x]$, where $x$ is large. MES is an important factor when measuring the systemic risk of financial institutions. In this talk we will discuss consistency and asymptotic normality of an estimator of MES, assuming that $(X, Y)$ follows a Conditional Extreme Value (CEV) model. The theoretical findings are supported by simulation studies. Our procedure is applied to some financial data. This is joint work with Kevin Tong (Bank of Montreal).
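To make the definition $E[Y \mid X > x]$ concrete, here is a minimal sketch of the naive empirical version: average the observed values of $Y$ over the cases where $X$ exceeds the threshold. This is not the CEV-based estimator discussed in the talk; the function name `empirical_mes` and the toy numbers are assumptions made for illustration only.

```python
def empirical_mes(xs, ys, threshold):
    """Naive empirical MES: mean of Y over observations with X > threshold."""
    tail = [y for x, y in zip(xs, ys) if x > threshold]
    if not tail:
        # With a very high threshold there may be no exceedances at all,
        # which is exactly why model-based extrapolation is of interest.
        raise ValueError("no observations with X above the threshold")
    return sum(tail) / len(tail)


# Toy paired losses (made-up numbers, for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.5, 1.0, 2.0, 4.0, 6.0]

# Only the pairs with X = 4.0 and X = 5.0 exceed the threshold 3.0.
mes = empirical_mes(xs, ys, threshold=3.0)
print(mes)
```

The sketch also shows the practical difficulty: as the threshold grows, fewer observations remain in the tail, so the plain average becomes unstable or undefined.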

Thursday, October 12, 2017 — 4:00 PM EDT

Optimal Insurance: Belief Heterogeneity, Ambiguity, and Arrow's Theorem

In Arrow's classical problem of demand for insurance indemnity schedules, it is well-known that the optimal insurance indemnification for an insurance buyer (the insured) is a straight deductible contract, when the insurer is a risk-neutral Expected Utility (EU) maximizer and when the insured is a risk-averse EU maximizer. In Arrow's framework, however, the two parties share the same probabilistic beliefs about the realizations of the underlying insurable loss, and neither party experiences ambiguity (Knightian uncertainty) about the distribution of this random loss. In this talk, I will discuss extensions of Arrow's classical result to situations of belief heterogeneity and ambiguity.

Thursday, October 5, 2017 — 4:00 PM EDT

Statistically and Numerically Efficient Independence Test

We study how to generate a statistical inference procedure that is both computationally efficient and has theoretical guarantees on its statistical performance. Tests of independence play a fundamental role in many statistical techniques. Among the nonparametric approaches, the distance-based methods (such as distance-correlation-based hypothesis testing for independence) have numerous advantages compared with many alternatives. A known limitation of the distance-based method is that its computational complexity can be high. In general, when the sample size is n, the order of computational complexity of a distance-based method, which typically requires computing all pairwise distances, can be O(n^2). Recent advances have discovered that in the univariate case, a fast method with O(n log n) computational complexity and O(n) memory requirement exists. In this talk, I introduce a test of independence based on random projection and distance correlation, which achieves nearly the same power as the state-of-the-art distance-based approach, works in the multivariate case, and enjoys O(n K log n) computational complexity and O(max{n,K}) memory requirement, where K is the number of random projections. Note that a saving is achieved when K < n / log n. We name our method Randomly Projected Distance Covariance (RPDC). The statistical theoretical analysis takes advantage of some techniques on random projection, which are rooted in contemporary machine learning. Numerical experiments demonstrate the efficiency of the proposed method relative to several competitors.
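For reference, the O(n^2) baseline mentioned in the abstract can be sketched as follows. This is the standard sample distance correlation built from double-centered pairwise distance matrices, not the RPDC method itself; the function names are assumptions chosen for this illustration, and observations are assumed to be given as tuples of coordinates.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distances, double-centered (O(n^2) time and memory)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance of two paired samples."""
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    n = len(xs)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(xs, ys):
    """Sample distance correlation; returns 0.0 if either sample is constant."""
    v_xy = distance_covariance_sq(xs, ys)
    v_x = distance_covariance_sq(xs, xs)
    v_y = distance_covariance_sq(ys, ys)
    if v_x * v_y == 0:
        return 0.0
    # max() guards against tiny negative values from floating-point error.
    return math.sqrt(max(v_xy, 0.0) / math.sqrt(v_x * v_y))


# Toy univariate data with an exact linear relation y = 2x + 1.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [(2.0 * x[0] + 1.0,) for x in xs]
dcor = distance_correlation(xs, ys)
print(dcor)
```

Both nested loops visit all n^2 pairs, which is the cost that the O(n K log n) approach avoids whenever K < n / log n.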

Thursday, September 28, 2017 — 4:00 PM EDT
Susan Murphy

Challenges in Developing Learning Algorithms to Personalize Treatment in Real Time

A formidable challenge in designing sequential treatments is to determine when and in which context it is best to deliver treatments. Consider treatment for individuals struggling with chronic health conditions. Operationally designing the sequential treatments involves the construction of decision rules that input the current context of an individual and output a recommended treatment. That is, the treatment is adapted to the individual's context; the context may include current health status, current level of social support and current level of adherence, for example. Data sets on individuals with records of time-varying context and treatment delivery can be used to inform the construction of the decision rules. There is much interest in personalizing the decision rules, particularly in real time as the individual experiences sequences of treatment. Here we discuss our work in designing online "bandit" learning algorithms for use in personalizing mobile health interventions.

Tuesday, September 26, 2017 — 4:00 PM EDT

Owning Your Research: Things I wish I knew for graduate study

Pursuing graduate study is a courageous decision for life that takes time, effort, and commitment. Graduate study can be mysterious at the beginning, miserable in the process, and marvelous at the end. As a recent grad, I am going to share some of my graduate study experiences in this presentation. In particular, I hope to give some advice to current graduate students to make their graduate study easier, faster, and happier. Among other recommendations, I will provide some tricks and tips on coding that can improve research productivity. 

Thursday, September 21, 2017 — 4:00 PM EDT

Empirical balancing scores and balancing weights

Propensity scores have been central to causal inference and are often used as balancing scores or balancing weights. Estimated propensity scores, however, may exhibit undesirable finite-sample performance. We take a step back to understand what properties of balancing scores and weights are desirable. For balancing scores, the dimension reduction aspect is important; whereas for balancing weights, a conditional moment balancing property is crucial. Based on these considerations, a joint sufficient dimension reduction framework is proposed for balancing scores, and a covariate functional balancing framework is proposed for balancing weights. 

Monday, September 18, 2017 — 4:00 PM EDT

Multivariate Quantiles: Nonparametric Estimation and Applications to Risk Management

In many applications of hydrology, quantiles provide important insights in the statistical problems considered. In this talk, we focus on the estimation of a notion of multivariate quantiles based on copulas and provide a nonparametric estimation procedure. These quantiles are based on particular level sets of copulas and admit the usual probabilistic interpretation that a p-quantile comprises a probability mass p. We also explore the usefulness of a smoothed bootstrap in the estimation process. Our simulation results show that the nonparametric estimation procedure yields excellent results in finite samples and that the smoothed bootstrap can be beneficially applied.

Thursday, September 14, 2017 — 4:00 PM EDT

Variable selection for case-cohort studies with failure time outcome

Case-cohort designs are widely used in large cohort studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large, so an efficient variable selection method is necessary. We investigated the properties of a variable selection procedure using the smoothly clipped absolute deviation penalty in a case-cohort design with a diverging number of parameters. We establish the consistency and asymptotic normality of the maximum penalized pseudo-partial-likelihood estimator, and show that the proposed variable selection method is consistent and has an asymptotic oracle property. Simulation studies compare the finite-sample performance of the procedure with tuning parameter selection methods based on the Akaike information criterion and the Bayesian information criterion. We make recommendations for use of the proposed procedures in case-cohort studies, and apply them to the Busselton Health Study.

Friday, September 8, 2017 — 10:00 AM EDT

Clustering Heavy-Tailed Stable Data

Tuesday, September 5, 2017 — 4:00 PM EDT

Why risk is so hard to measure?

This paper analyzes the reliability of standard approaches for financial risk analysis. We focus on the difference between Value-at-Risk and Expected Shortfall, their small sample properties, the scope for underreporting risk and how estimation can be improved. Overall, we find that risk forecasts are extremely uncertain at low sample sizes, with Value-at-Risk more accurate than Expected Shortfall. Value-at-Risk is easily deliberately underreported without violating regulations and control mechanisms. Finally, we discuss the implications for academic research, practitioners and regulators, along with best practice suggestions.
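The two risk measures compared in the abstract can be illustrated with simple empirical estimators. The sketch below is one common convention (Value-at-Risk as an order statistic of the losses, Expected Shortfall as the mean of the losses at or beyond it), not necessarily the estimators used in the paper; the function names and the toy loss sample are assumptions for illustration.

```python
import math


def empirical_var(losses, level):
    """Empirical Value-at-Risk: the ceil(level * n)-th smallest loss."""
    ordered = sorted(losses)
    # Subtract a small tolerance so floating-point noise in level * n
    # (e.g. 95.00000000000001) does not push the index up by one.
    k = math.ceil(level * len(ordered) - 1e-9)
    return ordered[max(k, 1) - 1]


def empirical_es(losses, level):
    """Empirical Expected Shortfall: mean of the losses at or above the VaR."""
    var = empirical_var(losses, level)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


# Toy sample of losses 1, 2, ..., 100 (made-up, for illustration only).
losses = list(range(1, 101))
var_95 = empirical_var(losses, 0.95)
es_95 = empirical_es(losses, 0.95)
print(var_95, es_95)
```

Note that Expected Shortfall here averages only a handful of tail observations, which gives some intuition for why tail-based estimates are noisy in small samples.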

Wednesday, August 23, 2017 — 2:00 PM EDT

Replacing the Replacement Rate: How Much is "ENOUGH" Retirement Income?

For years, the standard for measuring retirement income adequacy has been the final earnings replacement rate (usually targeted at 70%). Financial planners, actuaries, pension plan advisors, academics and public policy analysts all use this benchmark. It’s the measure that underlies our pension systems, drives the research that determines whether populations are prepared (or not) for retirement and serves as the backbone of retirement planning software.

But the question is, does it work? Will 70% of a worker’s final annual employment earnings actually sustain his or her living standards after retirement?

This presentation examines whether workers who hit this target actually can expect to maintain their living standards in retirement. Bonnie-Jeanne will also discuss an alternative, more accurate, basis for assessing how well a worker’s living standards are maintained after retirement - the Living Standards Replacement Rate.

Tuesday, July 11, 2017 — 10:30 AM EDT

Causal Inference by Compression

Monday, June 19, 2017 — 4:00 PM EDT

Risk Measures in a Quantile Regression Credibility Framework

Here, we extend the idea of embedding the classical credibility model into risk measures, as presented by Pitselis (2016), to the idea of embedding regression credibility into risk measures. The resulting credible regression risk measures capture the risk of an individual insurer's contract (in finance, the individual asset return portfolio) as well as the portfolio risk consisting of several similar but not identical contracts (in finance, several similar portfolios of asset returns), which are grouped together to share the risk. In insurance, credibility plays a special role in spreading the risk. In financial terminology, credibility plays a special role in the diversification of risk. For each model, regression credibility models are established and the robustness of these models is investigated. Applications to Fama/French financial portfolio data are also presented.

Thursday, June 15, 2017 — 4:00 PM EDT

Change point detection in functional time series models for yield curves

Yield curves are functions defined on time to maturity with corresponding values equal to yield (interest) on a bond, typically a standardized government issued instrument. Yield curves are commonly used to predict future states of the economy on the basis of the interest investors demand for government debt of various maturities. These curves form a time series of functions, one function per day. The talk will discuss methods of detecting a change point in the mean function of such a functional time series. After reviewing related research, we will present two methods: one which uses a factor representation of the yield curves, the other a fully nonparametric method. Both methods permit the second order structure to change independently of the changes in the mean structure. Based on the asymptotic theory, two numerical approaches to the implementation of the tests will be presented and compared. The methodology will be illustrated by a simulation study and an application to US Federal Reserve yield curves.
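As a rough illustration of mean change point detection, the sketch below applies a classical CUSUM statistic to a scalar series. It is a simplified stand-in, not either of the functional methods presented in the talk: each "observation" is a single number rather than a curve, and the function name `cusum_change_point` is an assumption for this example.

```python
def cusum_change_point(series):
    """Locate a single mean change with a CUSUM statistic.

    Returns (k, stat), where series[:k] and series[k:] are the estimated
    segments and stat is the maximum of |partial sum of deviations| / sqrt(n).
    """
    n = len(series)
    mean = sum(series) / n
    best_k, best_stat, partial = 0, 0.0, 0.0
    for k in range(1, n):
        # Running sum of deviations from the overall mean up to position k.
        partial += series[k - 1] - mean
        stat = abs(partial) / n ** 0.5
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Toy series whose mean jumps from 0 to 1 after the fifth observation.
series = [0.0] * 5 + [1.0] * 5
k_hat, stat = cusum_change_point(series)
print(k_hat, stat)
```

In practice the statistic would be compared with a critical value derived from asymptotic theory, after scaling by an estimate of the long-run variance; both steps are omitted here.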

Thursday, May 25, 2017 — 4:00 PM EDT

‘Tweaking’ variables to make them uncorrelated

