Concentration of Maxima: Fundamental Limits of Exact Support Recovery in High Dimensions
We study the estimation of the support (set of non-zero components) of a sparse high-dimensional signal observed with additive and dependent noise. With the usual parameterization of the size of the support set and the signal magnitude, we characterize a phase-transition phenomenon akin to Ingster's signal detection boundary. We show that when the signal is above the so-called strong classification boundary, thresholding estimators achieve asymptotically perfect support recovery. This holds under arbitrary dependence of the errors, provided that the marginal error distribution has rapidly varying tails. Conversely, under mild dependence conditions on the noise, we show that no thresholding estimator can achieve perfect support recovery if the signal is below the boundary. For log-concave error densities, thresholding estimators are shown to be optimal, and hence the strong classification boundary is universal in this setting.
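To make the thresholding idea concrete, here is a minimal sketch. It assumes independent standard Gaussian noise, positive signal entries of a common magnitude mu, and the threshold sqrt(2 log p). All three choices, along with the helper names `simulate` and `threshold_support`, are illustrative assumptions. They are not the estimator or tuning studied in the paper.

```python
import math
import random


def simulate(p=10_000, s=10, mu=8.0, seed=0):
    """Sparse signal plus noise.

    Assumptions (illustrative only): s entries equal to mu, the rest zero,
    and independent N(0, 1) noise, although the abstract allows dependence.
    """
    rng = random.Random(seed)
    support = set(rng.sample(range(p), s))
    x = [(mu if i in support else 0.0) + rng.gauss(0.0, 1.0) for i in range(p)]
    return x, support


def threshold_support(x, t):
    """Estimate the support as the indices whose observation exceeds t."""
    return {i for i, xi in enumerate(x) if xi > t}


if __name__ == "__main__":
    x, support = simulate()
    t = math.sqrt(2.0 * math.log(len(x)))  # assumed threshold, about 4.29 here
    estimate = threshold_support(x, t)
    print("exact recovery:", estimate == support)
    print("missed:", len(support - estimate), "false positives:", len(estimate - support))
```

This one-sided rule keeps only large positive values, so it assumes positive signal entries. A two-sided version would threshold |x_i| instead.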
The proofs exploit a concentration-of-maxima phenomenon known as relative stability. We obtain a complete characterization of relative stability for dependent Gaussian noise via Slepian and Sudakov-Fernique bounds and some Ramsey theory.
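Relative stability means that the maximum of the p noise terms, divided by its typical level u_p, converges to 1. For standard Gaussian noise, u_p is of order sqrt(2 log p). The hypothetical sketch below contrasts two cases. With independent Gaussian noise, relative stability holds. With perfectly correlated noise (rho = 1), the maximum is a single Gaussian variable and the ratio tends to 0. The equicorrelated construction and the name `max_ratio` are our own illustrative choices.

```python
import math
import random


def max_ratio(p, seed=0, rho=0.0):
    """Return max_i eps_i / sqrt(2 log p) for equicorrelated Gaussian noise.

    eps_i = sqrt(rho) * Z_0 + sqrt(1 - rho) * Z_i with Z_0, Z_1, ..., Z_p iid N(0, 1),
    so every eps_i is standard normal and corr(eps_i, eps_j) = rho for i != j.
    sqrt(2 log p) is the leading-order size of the typical level u_p for N(0, 1) tails.
    """
    rng = random.Random(seed)
    z0 = rng.gauss(0.0, 1.0)  # shared component that induces the dependence
    m = max(
        math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(p)
    )
    return m / math.sqrt(2.0 * math.log(p))


if __name__ == "__main__":
    for p in (10**3, 10**5):
        print(p, "independent:", round(max_ratio(p, rho=0.0), 3),
              "perfectly correlated:", round(max_ratio(p, rho=1.0), 3))
```

In the independent case the ratio drifts toward 1, slowly, since the relative correction is of order log log p / log p. In the perfectly correlated case it shrinks toward 0, like a single N(0, 1) variable divided by sqrt(2 log p).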
Analysis of Prescription Drug Utilization with Beta Regression Models
The U.S. healthcare sector is large and complex, generating about 20% of the country's gross domestic product. Researchers and practitioners have used healthcare analytics to better understand the industry. In this paper, we demonstrate the use of Beta regression models to study the utilization of brand name drugs in the U.S., in order to understand how brand name drug utilization varies across geographic areas. The models are fitted to public datasets obtained from the Centers for Medicare & Medicaid Services and the Internal Revenue Service. Inference is performed with the Integrated Nested Laplace Approximation (INLA). The numerical results show that Beta regression models fit the brand name drug claim rates well, and that including spatial dependence improves their performance.
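The sketch below writes out the log-likelihood of the common mean-precision parameterization of Beta regression. In this form, a rate y in (0, 1) is modeled as Beta(mu*phi, (1 - mu)*phi) with logit(mu) = x'beta, so that E[y] = mu and Var[y] = mu(1 - mu)/(1 + phi). We assume, without confirmation from the paper, that this is the parameterization it uses. The sketch omits both the spatial random effects and the INLA machinery.

```python
import math


def beta_reg_loglik(y, X, beta, phi):
    """Log-likelihood of a Beta regression with a logit link.

    y:    responses strictly inside (0, 1), e.g. claim rates
    X:    list of covariate rows (include a 1 for the intercept)
    beta: regression coefficients, phi: precision parameter (> 0)
    """
    total = 0.0
    for yi, xi in zip(y, X):
        eta = sum(coef * xij for coef, xij in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit gives the mean
        a, b = mu * phi, (1.0 - mu) * phi  # Beta shape parameters
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi) + (b - 1.0) * math.log(1.0 - yi))
    return total
```

Maximizing this function over (beta, phi) gives a maximum-likelihood fit. A Bayesian treatment such as INLA instead places priors on these parameters and on any spatial effects added to eta.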
Limit theorems for topological invariants of the dynamic multi-parameter simplicial complex
Topological Data Analysis (TDA) is a growing research area that broadly refers to the analysis of high-dimensional datasets, with the main goal of extracting robust topological information from them. Among the many problems studied in TDA, this talk deals with those related to the time evolution of topological structure. More specifically, we consider the multi-parameter simplicial complex model as a higher-dimensional generalization of the Erdős-Rényi graph. The topology of existing random simplicial complexes is non-trivial to study and has led to several seminal works. However, the applicability of such studies is limited, since the randomness there is usually governed by a single parameter. With this in mind, we focus on the topology of the recently proposed multi-parameter random simplicial complex and, more importantly, of its dynamic analogue, which we introduce here. In this dynamic setup, the temporal evolution of simplices is determined by stationary and possibly non-Markovian processes with a renewal structure. The dynamic versions of the clique complex and the Linial-Meshulam complex are special cases of our setup. Our key result concerns the regime where face counts of a particular dimension dominate. We show that the Betti numbers (i.e., basic quantifiers of topological complexity) corresponding to this dimension, and the Euler characteristic, satisfy a functional strong law of large numbers and a functional central limit theorem. Surprisingly, in the latter result, the limiting Gaussian process depends only upon the dynamics in the smallest non-trivial dimension. This is joint work with Gennady Samorodnitsky (Cornell) and Gugan Thoppe (Duke).
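As a rough illustration, the sketch below samples one static snapshot of a multi-parameter random simplicial complex under an assumed construction. Each of n vertices is kept with probability probs[0], and each k-simplex whose boundary faces are all present is added independently with probability probs[k]. It then computes the Euler characteristic as the alternating sum of face counts. The dynamics studied in the talk are not modeled, and the function names are hypothetical.

```python
import itertools
import random


def multiparameter_complex(n, probs, seed=0):
    """Sample faces of a random simplicial complex up to dimension len(probs) - 1.

    Assumed construction: vertex v is kept with probability probs[0]; a k-simplex
    (k >= 1) is added with probability probs[k], but only if all of its
    (k - 1)-dimensional faces are present. Simplices are sorted vertex tuples.
    """
    rng = random.Random(seed)
    faces = [{(v,) for v in range(n) if rng.random() < probs[0]}]
    vertices = sorted(v for (v,) in faces[0])
    for k in range(1, len(probs)):
        layer = set()
        for simplex in itertools.combinations(vertices, k + 1):
            boundary_present = all(
                face in faces[k - 1] for face in itertools.combinations(simplex, k)
            )
            if boundary_present and rng.random() < probs[k]:
                layer.add(simplex)
        faces.append(layer)
    return faces


def euler_characteristic(faces):
    """Alternating sum of face counts: f_0 - f_1 + f_2 - ..."""
    return sum((-1) ** k * len(layer) for k, layer in enumerate(faces))
```

For example, with n = 4 and all probabilities equal to 1, the sampled complex is the boundary of a tetrahedron (4 vertices, 6 edges, 4 triangles), whose Euler characteristic is 4 - 6 + 4 = 2. With probs = [1.0, p], the 1-skeleton is an Erdős-Rényi graph G(n, p).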
Regression trees are flexible nonparametric models that are well suited to many modern statistical learning problems. Many such tree models have been proposed, from the simple single-tree model (e.g., Classification and Regression Trees, CART) to more complex tree ensembles (e.g., Random Forests). Their nonparametric formulation allows one to model datasets exhibiting complex non-linear relationships between the predictors and the response. A recent innovation in the statistical literature is the development of Bayesian analogues of these classical regression tree models.
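The basic building block of a CART-style regression tree is a greedy split chosen to minimize the within-node sum of squared errors. The sketch below finds the best single split on one predictor. It is a simplified illustration (one predictor, one split, quadratic-time search), not the Bayesian tree models discussed here, and `best_split` is a hypothetical name.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Best single split of one predictor x for a numeric response y.

    Returns (total SSE of the two child nodes, threshold); the threshold is the
    midpoint between adjacent distinct sorted x values. Returns (inf, None) if
    all x values are equal. Quadratic-time search, kept simple for clarity.
    """
    pairs = sorted(zip(x, y))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best
```

For instance, best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 10, 10, 10]) returns (0.0, 3.5). A full tree applies this search over all predictors and recursively to each child node. Bayesian tree models instead place a prior on tree structures and typically explore them with MCMC.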
Two-sample test on funscalar data with application to hemodialysis monitoring by Raman spectroscopy
To achieve in-session monitoring of hemodialysis through Raman spectroscopy, it is necessary to compare data consisting of Raman spectra and intensity values of specific biomarkers (e.g., urea) contained in the waste dialysate from hemodialysis treatment. This calls for the development of a two-sample test procedure for funscalar data, that is, data that mix functional and scalar variables. Despite a rich literature on testing procedures for univariate functional data and a few publications on multivariate functional data, no such testing procedure exists for funscalar data. In this work we propose the first testing procedure for funscalar data, generalizing the functional data approach of Horváth et al. (2013). The test statistic is based on the L_2 distance between the two mean funscalar objects. Its asymptotic null distribution and asymptotic power are studied. We then demonstrate its performance through extensive simulations and illustrate its usefulness with data collected in our hemodialysis monitoring experiments.
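Below is a minimal sketch of the kind of statistic described above, built on several assumptions of our own:

- Each funscalar observation is stored as a curve sampled on an equally spaced grid (spacing h), followed by its scalar variables.
- The functional part of the L_2 distance is approximated by a Riemann sum.
- The functional and scalar parts are simply added with equal weight.

Calibration here uses a permutation test, swapped in for the asymptotic null distribution derived in the work. The function names are hypothetical.

```python
import random


def funscalar_stat(a, b, n_grid, h):
    """Squared L_2 distance between the two sample means.

    Each observation is a list: the first n_grid entries are a curve sampled on an
    equally spaced grid with spacing h, the remaining entries are scalar variables.
    """
    d = len(a[0])
    mean_a = [sum(obs[j] for obs in a) / len(a) for j in range(d)]
    mean_b = [sum(obs[j] for obs in b) / len(b) for j in range(d)]
    diff2 = [(ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b)]
    # Riemann-sum approximation of the integral for the functional part,
    # plain squared Euclidean distance for the scalar part (equal weights assumed).
    return h * sum(diff2[:n_grid]) + sum(diff2[n_grid:])


def permutation_pvalue(a, b, n_grid, h, n_perm=199, seed=0):
    """Permutation p-value: how often relabelled samples give a statistic >= observed."""
    rng = random.Random(seed)
    observed = funscalar_stat(a, b, n_grid, h)
    pooled = a + b  # new list, so shuffling leaves a and b untouched
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if funscalar_stat(pooled[:len(a)], pooled[len(a):], n_grid, h) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

The "+1" in numerator and denominator counts the observed labelling as one of the permutations, which keeps the p-value strictly positive.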
The Efficiency of Voluntary Risk Classification in Insurance Markets
It has been established that categorical discrimination based on observable characteristics such as gender, age, or ethnicity enhances efficiency. We consider a different form of risk classification, in which there exists a costless yet imperfectly informative test of risk type whose outcome is unknown to the agents ex ante. We show that voluntary risk classification, in which agents are given the option to take the test, always increases efficiency compared with no risk classification. Moreover, voluntary risk classification also Pareto dominates a regime of compulsory risk classification in which all agents are required to take the test.
Robust and Efficient Estimation under Nonignorable Missing Response
We consider the estimation problem in a regression setting where the outcome variable is subject to nonignorable missingness and identifiability is ensured by the shadow variable approach. We propose a versatile estimation procedure in which modeling of the missingness mechanism is completely bypassed. We show that our estimator is easy to implement, and we derive its asymptotic theory. We also investigate some alternative estimators under different scenarios. Comprehensive simulation studies are conducted to demonstrate the finite-sample performance of the method. We apply the estimator to a children's mental health study to illustrate its usefulness.