Wednesday, January 31, 2018 — 10:30 AM EST

**Modern Classification with Big Data**

Rapid advances in information technologies have ushered in the era of "big data" and revolutionized scientific research. Big data creates golden opportunities but has also given rise to unprecedented challenges due to the massive size and complex structure of the data. Among many tasks in statistics and machine learning, classification has diverse applications, ranging from improving daily life to reaching the new frontiers of science and engineering. This talk will discuss a vision of broader approaches to modern classification methodologies, as well as computational considerations to cope with the big data challenges. I will present a modern classification method named data-driven generalized distance-weighted discrimination. A fast algorithm with an emphasis on computational efficiency for big data will be introduced. Our method is formulated in a reproducing kernel Hilbert space, and learning theory of the Bayes risk consistency will be developed. In addition, I will use extensive benchmark data applications to demonstrate that the prediction accuracy of our method is highly competitive with state-of-the-art classification methods including support vector machines, random forests, gradient boosting, and deep neural networks.

Monday, January 29, 2018 — 10:10 AM EST

This seminar has been cancelled.

Thank you.

Friday, January 26, 2018 — 10:30 AM EST

**Inference for statistical interactions under misspecified or high-dimensional main effects**

An increasing number of multi-omic studies have generated complex high-dimensional data. A primary focus of these studies is to determine whether exposures interact in the effect that they produce on an outcome of interest. Interaction is commonly assessed by fitting regression models in which the linear predictor includes the product between those exposures. When the main interest lies in interactions, the standard approach is not satisfactory because it is prone to (possibly severe) Type I error inflation when the main exposure effects are misspecified or high-dimensional. I will propose generalized score type tests for high-dimensional interaction effects on correlated outcomes. I will also discuss the theoretical justification of some empirical observations regarding Type I error control, and introduce solutions to achieve robust inference for statistical interactions. The proposed methods will be illustrated using an example from the Multi-Ethnic Study of Atherosclerosis (MESA), investigating interaction between measures of neighborhood environment and genetic regions on longitudinal measures of blood pressure over a study period of about seven years with four exams.
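As a rough illustration of the standard approach described in this abstract, a regression model with a product term for two exposures can be written as below. The symbols (link function $g$, exposures $X_1$ and $X_2$, coefficients $\beta_j$) are illustrative notation assumed here, not taken from the talk.

```latex
% Illustrative notation only: g is a link function, X_1 and X_2 are two exposures.
\[
  g\{E(Y \mid X_1, X_2)\}
    = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2,
  \qquad
  H_0 : \beta_3 = 0 .
\]
```

In this notation, the test of interaction concerns $\beta_3$ only, while the terms in $\beta_1$ and $\beta_2$ are the main effects whose misspecification the abstract identifies as a source of Type I error inflation.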

Wednesday, January 24, 2018 — 10:30 AM EST

**Parametric and Nonparametric Models for Higher-order Interactions**

In this talk, I will discuss parametric and nonparametric models for higher-order interactions with a focus on the statistical and computational aspects. In fields like social, political and biological sciences, there is a clear need for analyzing higher-order interactions as opposed to pairwise interactions, which have been the main focus of statistical network analysis recently. Generalized Block Models and Hypergraphons are powerful tools for modeling higher-order interactions. The talk will introduce the models, present theoretical results highlighting the challenges and differences that arise when analyzing higher-order interactions compared to pairwise interactions, and discuss applications and numerical results.

Monday, January 22, 2018 — 10:30 AM EST

**Nonparametric Inference for Sensitivity of Haezendonck-Goovaerts Risk Measure**

Recently, the Haezendonck-Goovaerts (H-G) risk measure has become popular in actuarial science. When it is applied to an insurance or a financial portfolio with several loss variables, sensitivity analysis becomes useful in managing the portfolio, and the assumption of independent observations may not be reasonable. This paper first derives an expression for computing the sensitivity of the H-G risk measure, which enables us to estimate the sensitivity nonparametrically via the H-G risk measure. Further, we derive the asymptotic distributions of the nonparametric estimators for the H-G risk measure and the sensitivity by assuming that loss variables in the portfolio follow a strictly stationary α-mixing sequence. A simulation study is provided to examine the finite sample performance of the proposed nonparametric estimators. Finally, the method is applied to a real data set. Key words and phrases: Asymptotic distribution, Haezendonck-Goovaerts risk measure, Mixing sequence, Nonparametric estimate, Sensitivity analysis

Friday, January 19, 2018 — 1:30 PM EST

**Competitive Equilibria in a Comonotone Market**

The notion of competitive equilibria has been a crucial consideration in risk sharing problems. A large literature is devoted to analyses of optimal risk sharing based on expected utilities in a complete market. In this work, we investigate the competitive equilibria in a special type of incomplete market, referred to as a comonotone market, where agents can only trade such that their wealth allocation is comonotonic. The comonotone market is motivated by two seemingly unrelated observations. First, in a complete market, under mild conditions on the preferences, an equilibrium allocation is generally comonotonic. Second, in a standard insurance market, the allocation of risk among the insured, the insurer and the reinsurers is assumed to be comonotonic a priori to the risk-exchange. Two popular classes of preferences in risk management and behavioural economics, dual utilities (DU) and rank-dependent expected utilities (RDU), are used to formulate agents' objectives. We focus on establishing a pair of an equilibrium wealth allocation and an equilibrium pricing measure. For DU-comonotone markets, we find the equilibrium in closed form. We further propose an algorithm to numerically obtain a competitive equilibrium based on discretization, which works for both the DU-comonotone market and the RDU-comonotone market. Results illustrate the intriguing and possibly puzzling fact that the equilibrium pricing kernel may not be counter-comonotone with the aggregate risk, in sharp contrast to the case of a complete market.

Monday, January 15, 2018 — 10:30 AM EST

**Community Estimation on Weighted Networks**

Community identification in a network is an important problem in fields such as social science, neuroscience, and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; disregarding the edge weights may result in a loss of valuable information in various scientific applications. We propose a weighted generalization of the SBM where we model the probability distribution of the edge weights as a mixture whose latent components reflect the latent community structure of the network. In this model, observations comprise a weighted adjacency matrix where the weight of each edge is generated independently from one of two unknown probability densities depending on whether the edge is within-community or between-community. We characterize the optimal rate of mis-clustering error of the weighted SBM in terms of the Rényi divergence between the probability distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm based on discretization that is adaptive to the unknown edge weight densities in the sense that it achieves the same optimal error rate as if it had perfect knowledge of the edge weight densities.
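The generative model described in this abstract can be sketched in a few lines of Python. This is only a minimal simulation under assumptions: the function name `simulate_weighted_sbm`, the two-community layout, and the Gaussian weight densities are hypothetical choices made for illustration, since the abstract leaves the two densities unspecified.

```python
import random


def simulate_weighted_sbm(labels, within_sampler, between_sampler, seed=0):
    """Draw a symmetric weighted adjacency matrix given community labels.

    Each edge weight is drawn independently from `within_sampler` when both
    endpoints share a label, and from `between_sampler` otherwise.
    """
    rng = random.Random(seed)
    n = len(labels)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            sampler = within_sampler if same else between_sampler
            w = sampler(rng)
            weights[i][j] = w
            weights[j][i] = w  # undirected network: keep the matrix symmetric
    return weights


# Hypothetical densities (not from the abstract): unit-variance Gaussians
# with mean 2 within communities and mean 0 between communities.
labels = [0] * 5 + [1] * 5
A = simulate_weighted_sbm(
    labels,
    within_sampler=lambda rng: rng.gauss(2.0, 1.0),
    between_sampler=lambda rng: rng.gauss(0.0, 1.0),
)
```

The community estimation problem is the reverse direction: recovering `labels` from `A` without knowing either density.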

Friday, January 12, 2018 — 1:30 PM EST

**Testing the multivariate regular variation model for extreme risks**

Heavy-tail phenomena generally exist in insurance, finance and economics. Multivariate regular variation (MRV) is one of the most important structures in modeling multivariate extreme risks with heavy-tailed marginal distributions and flexible dependence structures. In this paper, we propose a formal goodness-of-fit test for the MRV model. The test is based on comparing the tail indices of the radial component conditional on the angular component falling in different subsets. We first establish the estimator of the conditional tail index and prove the joint asymptotic property for all such estimators. We further combine the test on the constancy across different conditional tail indices with testing the regular variation of the radial component. Our proofs are based on the asymptotic properties of tail and non-tail empirical processes. Simulation studies demonstrate the good performance of the proposed tests, and real market data applications are also provided.
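To make the idea of comparing conditional tail indices concrete, the sketch below estimates the tail index of the radial component separately for two angular subsets. It uses the classical Hill estimator as a stand-in; the abstract does not say which estimator the authors use, and the function names, the split of the positive quadrant at π/4, and the toy data are all assumptions made for illustration.

```python
import math
import random


def hill_estimator(values, k):
    """Hill estimate of the extreme value index (1/alpha) from the k largest values."""
    ordered = sorted(values, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


def conditional_hill(points, k_fraction=0.1):
    """Hill estimates of the radius, conditional on two angular subsets."""
    groups = {"lower": [], "upper": []}
    for x, y in points:
        radius = math.hypot(x, y)
        angle = math.atan2(y, x)
        groups["lower" if angle < math.pi / 4 else "upper"].append(radius)
    return {
        name: hill_estimator(radii, max(1, int(k_fraction * len(radii))))
        for name, radii in groups.items()
    }


# Toy data (an assumption): Pareto radius with alpha = 2, independent of a
# uniform angle in the positive quadrant, so both subsets share one tail index.
rng = random.Random(1)
points = []
for _ in range(4000):
    r = rng.paretovariate(2.0)
    theta = rng.uniform(0.0, math.pi / 2)
    points.append((r * math.cos(theta), r * math.sin(theta)))

estimates = conditional_hill(points)
```

Under the MRV model the conditional tail indices should agree across angular subsets, so a large gap between the two estimates would count as evidence against the model. A formal test would also need the joint asymptotic distribution of the estimators, which this sketch does not attempt.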

Wednesday, January 10, 2018 — 10:30 AM EST

**Sparse Estimation for Functional Semiparametric Additive Models**

In the context of functional data analysis, functional linear regression serves as a fundamental tool to handle the relationship between a scalar response and a functional covariate. With the aid of the Karhunen–Loève expansion of a stochastic process, a functional linear model can be written as an infinite linear combination of functional principal component scores. A reduced form is fitted in practice for dimension reduction; it is essentially converted to a multiple linear regression model.
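The reduction described above can be written out as follows. The notation is assumed here for illustration and is not taken from the talk: $X(t)$ is the functional covariate with mean $\mu(t)$, $\phi_k$ are the eigenfunctions of its covariance operator, $\xi_k$ are the functional principal component scores, and $K$ is a truncation level.

```latex
% Illustrative notation only; K is a truncation level chosen in practice.
\begin{align*}
  X(t) &= \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t),
  \qquad \xi_k = \int \{X(t) - \mu(t)\}\, \phi_k(t)\, dt, \\
  Y &= a + \int \beta(t)\, \{X(t) - \mu(t)\}\, dt + \varepsilon
     = a + \sum_{k=1}^{\infty} \beta_k \xi_k + \varepsilon,
  \qquad \beta_k = \int \beta(t)\, \phi_k(t)\, dt, \\
  Y &\approx a + \sum_{k=1}^{K} \beta_k \xi_k + \varepsilon .
\end{align*}
```

The last line is an ordinary multiple linear regression of $Y$ on the first $K$ scores, which is the reduced form fitted in practice.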

Though the functional linear model is easy to implement and interpret in applications, it may suffer from an inadequate fit due to this specific linear representation. Additionally, effects of scalar predictors which may be predictive of the scalar response are neglected in the functional linear model.

Prediction accuracy can be enhanced greatly by incorporating effects of these scalar predictors.

In this talk, we propose a functional semiparametric additive model, which models the effect of a functional covariate nonparametrically and models several scalar covariates in a linear form. We develop the method for estimating the functional semiparametric additive model by smoothing and selecting non-vanishing components for the functional covariate. We show that the estimation method can consistently estimate both nonparametric and parametric parts in the model. Numerical studies will be presented to demonstrate the advantage of the proposed model in prediction.

Friday, January 5, 2018 — 10:30 AM EST

**Statistical Methods for the Analysis of Censored Family Data under Biased Sampling Schemes**

Studies of the genetic basis for chronic disease often first aim to examine the nature and extent of within-family dependence in disease status. Families for such studies are typically selected using a biased sampling scheme in which affected individuals are recruited from a disease registry, followed by their consenting relatives. This gives right-censored or current status information on disease onset times. Methods for correcting this response-dependent sampling scheme have been developed for correlated binary data, but variation in the age of assessment for family members makes this analysis uninterpretable. We develop likelihood and composite likelihood methods for modeling within-family associations in disease onset time using copula functions and second-order regression models in which dependencies are characterized by Kendall’s τ. Auxiliary data from an independent sample of individuals can be integrated by augmenting the composite likelihood to ensure identifiability and increase efficiency. An application to a motivating family study in psoriatic arthritis illustrates the method and provides evidence of excessive paternal transmission of risk. Ongoing work on the use of second-order estimating functions, alternative frameworks for dependence modeling, and approaches to efficient study design will also be discussed.