Tuesday, October 31, 2017 — 1:00 PM EDT

**Data Adaptive Support Vector Machine with Application to Prostate Cancer Imaging Data**

Support vector machines (SVM) have been widely used as classifiers in various settings including pattern recognition, texture mining and image retrieval. However, such methods are faced with newly emerging challenges such as imbalanced observations and noise data. In this talk, I will discuss the impact of noise data and imbalanced observations on SVM classification and present a new data adaptive SVM classification method.

This work is motivated by a prostate cancer imaging study conducted in London Health Science Center. A primary objective of this study is to improve prostate cancer diagnosis and thereby to guide the treatment based on statistical predictive models. The prostate imaging data, however, are quite imbalanced in that the majority voxels are cancer-free while only a very small portion of voxels are cancerous. This issue makes the available SVM classifiers typically skew to one class and thus generate invalid results. Our proposed SVM method uses a data adaptive kernel to reflect the feature of imbalanced observations; the proposed method takes into consideration of the location of support vectors in the feature space and thereby generates more accurate classification results. The performance of the proposed method is compared with existing methods using numerical studies.

Monday, October 30, 2017 — 4:00 PM EDT

**Analysis of Clinical Trials with Multiple Outcomes**

In order to obtain better overall knowledge of a treatment effect, investigators in clinical trials often collect many medically related outcomes, which are commonly called as endpoints. It is fundamental to understand the objectives of a particular analysis before applying any adjustment for multiplicity. For example, multiplicity does not always lead to error rate inflation, or multiplicity may be introduced for purpose other than making an efficacy or safety claim such as in sensitivity assessments. Sometimes, the multiple endpoints in clinical trials can be hierarchically ordered and logically related. In this talk, we will discuss the methods to analyze multiple outcomes in clinical trials with different objectives: all or none approach, global approach, composite endpoint, at-least-one approach.

Thursday, October 26, 2017 — 4:00 PM EDT

**Estimation of the expected shortfall given an extreme component under conditional extreme value model**

For two risks, $X$, and $Y$ , the Marginal Expected Shortfall (MES) is defined as $E[Y \mid X > x]$, where $x$ is large. MES is an important factor when measuring the systemic risk of financial institutions. In this talk we will discuss consistency and asymptotic normality of an estimator of MES on assuming that $(X, Y)$ follows a Conditional Extreme Value (CEV) model. The theoretical findings are supported by simulation studies. Our procedure is applied to some financial data. This is a joint work with Kevin Tong (Bank of Montreal).

Thursday, October 12, 2017 — 4:00 PM EDT

**Optimal Insurance: Belief Heterogeneity, Ambiguity, and Arrow's Theorem**

In Arrow's classical problem of demand for insurance indemnity schedules, it is well-known that the optimal insurance indemnification for an insurance buyer (the insured) is a straight deductible contract, when the insurer is a risk-neutral Expected Utility (EU) maximizer and when the insured is a risk-averse EU maximizer. In Arrow's framework, however, the two parties share the same probabilistic beliefs about the realizations of the underlying insurable loss, and neither party experiences ambiguity (Knightian uncertainty) about the distribution of this random loss. In this talk, I will discuss extensions of Arrow's classical result to situations of belief heterogeneity and ambiguity.

Thursday, October 5, 2017 — 4:00 PM EDT

**Statistically and Numerically Efficient Independence Test**

We study how to generate a statistical inference procedure that is both computational efficient and having theoretical guarantee on its statistical performance. Test of independence plays a fundamental role in many statistical techniques. Among the nonparametric approaches, the distance-based methods (such as the distance correlation based hypotheses testing for independence) have numerous advantages, comparing with many other alternatives. A known limitation of the distance-based method is that its computational complexity can be high. In general, when the sample size is n, the order of computational complexity of a distance-based method, which typically requires computing of all pairwise distances, can be O(n^2). Recent advances have discovered that in the univariate cases, a fast method with O(n log n) computational complexity and O(n) memory requirement exists. In this talk, I introduce a test of independence method based on random projection and distance correlation, which achieves nearly the same power as the state-of-the-art distance-based approach, works in the multivariate cases, and enjoys the O(n K log n) computational complexity and O(max{n,K}) memory requirement, where K is the number of random projections. Note that saving is achieved when K < n/ log n. We name our method a Randomly Projected Distance Covariance (RPDC). The statistical theoretical analysis takes advantage of some techniques on random projection, which are rooted in contemporary machine learning. Numerical experiments demonstrate the efficiency of the proposed method, in relative to several competitors.