Events

Wednesday, January 29, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Aya Mitani, Harvard T.H. Chan School of Public Health

Marginal analysis of multiple outcomes with informative cluster size


Periodontal disease is a serious infection of the gums and the bones surrounding the teeth. In the Veterans Affairs Dental Longitudinal Study (VADLS), the relationships between periodontal disease and other health and socioeconomic conditions are of interest. To determine whether or not a patient has periodontal disease, multiple clinical measurements (clinical attachment loss, alveolar bone loss, tooth mobility) are taken at the tooth level. However, a universal definition of periodontal disease does not exist, and researchers often create a composite outcome from these measurements or analyze each outcome separately. Moreover, patients have varying numbers of teeth, with those more prone to the disease having fewer teeth than those with good oral health. Such dependence between the outcome of interest and cluster size (number of teeth) is called informative cluster size, and results obtained from fitting conventional marginal models can be biased. In this talk, I will introduce a novel method to jointly analyze multiple correlated outcomes for clustered data with informative cluster size using the class of generalized estimating equations (GEE) with cluster-specific weights. Using data from the VADLS, I will compare the results obtained from the proposed multivariate outcome cluster-weighted GEE to those from the conventional unweighted GEE. Finally, I will discuss a few other research settings where data may exhibit informative cluster size.
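As a minimal sketch of the standard single-outcome version of this idea, the code below fits a GEE with inverse-cluster-size weights using the geepack package. The simulated data frame, variable names, and single binary tooth-level outcome are hypothetical; the seminar's method extends this to multiple correlated outcomes analyzed jointly.

```r
# Cluster-weighted GEE sketch under informative cluster size (geepack).
library(geepack)

set.seed(1)
n_pat <- 50
size  <- sample(8:28, n_pat, replace = TRUE)   # number of teeth varies by patient
dat <- data.frame(
  patient = rep(seq_len(n_pat), times = size),
  age     = rep(rnorm(n_pat, 60, 8), times = size),
  smoker  = rep(rbinom(n_pat, 1, 0.3), times = size)
)
dat$diseased <- rbinom(nrow(dat), 1, plogis(-2 + 0.03 * dat$age + 0.8 * dat$smoker))

# Inverse-cluster-size weights make each patient, not each tooth,
# contribute equally to the estimating equations.
dat$w <- 1 / ave(dat$diseased, dat$patient, FUN = length)

fit <- geeglm(diseased ~ age + smoker, id = patient, data = dat,
              family = binomial, weights = w, corstr = "independence")
summary(fit)
```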

Thursday, January 30, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Hyukjun (Jay) Gweon, Western University

Batch-mode active learning for regression and its application to the valuation of large variable annuity portfolios

Supervised learning algorithms require a sufficient amount of labeled data to construct an accurate predictive model. In practice, collecting labeled data may be extremely time-consuming, while unlabeled data can be accessed easily. When labeled data are insufficient for a prediction model to perform well and the budget for additional data collection is limited, it is important to select for labeling the objects that will contribute most to improving the model's performance. In this talk, I will focus on the idea of active learning, which aims to train an accurate prediction model at minimum labeling cost. In particular, I will present batch-mode active learning for regression problems. Based on random forests, I will propose two effective random sampling algorithms that use the prediction ambiguities and diversities of unlabeled objects as measures of their informativeness. Empirical results on an insurance data set demonstrate the effectiveness of the proposed approaches in valuing large variable annuity portfolios (a practical problem in the actuarial field). Comparisons with an existing framework that relies on a sequential combination of unsupervised and supervised learning algorithms are also investigated.
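As a rough illustration of the batch-mode idea, the sketch below uses a random forest (randomForest package), treats the spread of the individual tree predictions as an ambiguity score, and adds a simple distance-based diversity term in a greedy batch selection. The data, scoring rule, and selection step are illustrative stand-ins, not the speaker's proposed algorithms.

```r
# Batch-mode active learning sketch for regression with a random forest.
library(randomForest)

set.seed(2)
p     <- 5
X_lab <- matrix(runif(200 * p), ncol = p)     # labeled pool
y_lab <- X_lab[, 1]^2 + 0.5 * X_lab[, 2] + rnorm(200, sd = 0.1)
X_unl <- matrix(runif(2000 * p), ncol = p)    # unlabeled pool

rf <- randomForest(x = X_lab, y = y_lab, ntree = 200)

# Ambiguity: standard deviation of the individual tree predictions.
tree_pred <- predict(rf, newdata = X_unl, predict.all = TRUE)$individual
ambiguity <- apply(tree_pred, 1, sd)

# Greedy batch selection: trade off ambiguity against distance to the
# points already chosen for the batch (diversity).
batch_size <- 10
chosen <- integer(0)
for (b in seq_len(batch_size)) {
  if (length(chosen) == 0) {
    div <- rep(1, nrow(X_unl))
  } else {
    div <- apply(X_unl, 1, function(x)
      min(sqrt(colSums((t(X_unl[chosen, , drop = FALSE]) - x)^2))))
  }
  score <- ambiguity * div
  score[chosen] <- -Inf                       # never pick the same object twice
  chosen <- c(chosen, which.max(score))
}
chosen   # indices of unlabeled objects to send for labeling
```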

Tuesday, February 4, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Haolun Shi, Simon Fraser University

Bayesian Utility-Based Toxicity Probability Interval Design for Dose Finding in Phase I/II Trials


Molecularly targeted agents and immunotherapy have revolutionized modern cancer treatment. Unlike chemotherapy, the maximum tolerated dose of a targeted therapy may not offer a significant clinical benefit over lower doses. By simultaneously considering binary toxicity and efficacy endpoints, phase I/II trials can identify a better dose for subsequent phase II trials, in terms of the efficacy-toxicity tradeoff, than traditional phase I trials. Existing phase I/II dose-finding methods are model-based or require many pre-specified design parameters, which makes them difficult to implement in practice. To strengthen and simplify the current practice of phase I/II trials, we propose a utility-based toxicity probability interval (uTPI) design for finding the optimal biological dose (OBD) when binary toxicity and efficacy endpoints are observed. The uTPI design is model-assisted in nature, simply modeling the utility outcomes observed at the current dose level through a quasi-binomial likelihood. Toxicity probability intervals are used to screen out overly toxic dose levels, and dose escalation/de-escalation decisions are then made adaptively by comparing the posterior utility distributions of the dose levels adjacent to the current dose. The uTPI design is flexible in accommodating various utility functions while requiring only a minimal number of design parameters. A prominent feature of the uTPI design is its simple decision structure: a concise dose-assignment decision table can be calculated before the trial starts and used throughout the trial, which greatly simplifies practical implementation. Extensive simulation studies demonstrate that the proposed uTPI design yields desirable and robust performance under various scenarios. This talk is based on joint work with Ruitao Lin and Ying Yuan at MD Anderson Cancer Center.
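The following is a heavily simplified base-R sketch of the flavor of such a design: each patient's (toxicity, efficacy) outcome is mapped to a utility in [0, 1], the standardized utility at each dose gets a Beta posterior via a quasi-binomial likelihood with a uniform prior, overly toxic doses are screened out with a posterior toxicity criterion, and the next assignment is chosen among the doses adjacent to the current one. The utility table, prior, and cut-offs below are hypothetical choices, not the published uTPI design parameters.

```r
# Utility of each (toxicity, efficacy) outcome, scaled to [0, 1].
u_tab <- matrix(c(0.40, 1.00,    # no toxicity: (no efficacy, efficacy)
                  0.00, 0.60),   # toxicity:    (no efficacy, efficacy)
                nrow = 2, byrow = TRUE)
u_tab[2, 2]   # e.g. a patient with both toxicity and efficacy contributes 0.60

decide <- function(n, n_tox, u_sum, d, tox_target = 0.30) {
  D <- length(n)
  # Screen out doses that are likely over-toxic: P(p_tox > target) > 0.95.
  p_over <- 1 - pbeta(tox_target, 1 + n_tox, 1 + n - n_tox)
  admiss <- which(p_over <= 0.95 | n == 0)
  # Posterior mean of the standardized utility: quasi-binomial likelihood
  # with a uniform prior gives a Beta(1 + u_sum, 1 + n - u_sum) posterior.
  u_mean <- (1 + u_sum) / (2 + n)
  cand   <- intersect(admiss, max(1, d - 1):min(D, d + 1))
  if (length(cand) == 0) return(NA_integer_)   # no admissible dose: stop
  cand[which.max(u_mean[cand])]
}

# Worked example: 5 dose levels, currently treating at dose level 3.
n     <- c(3, 6, 9, 0, 0)         # patients treated at each dose
n_tox <- c(0, 1, 3, 0, 0)         # toxicities observed
u_sum <- c(2.2, 4.0, 4.8, 0, 0)   # summed patient utilities at each dose
decide(n, n_tox, u_sum, d = 3)    # recommended next dose level
```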
 

Wednesday, February 5, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by David Kepplinger, University of British Columbia

Detecting the Signal Among Noise and Contamination in High Dimensions

Improvements in biomedical technology and a surge in other data-driven sciences have led to the collection of increasingly large amounts of data. Amid this abundance of data, contamination is ubiquitous but often neglected, creating a substantial risk of spurious scientific discoveries. Especially in applications with high-dimensional data, for instance proteomic biomarker discovery, the impact of contamination on methods for variable selection and estimation can be profound yet difficult to diagnose.

In this talk, I present a method for variable selection and estimation in high-dimensional linear regression models that leverages the elastic-net penalty for complex data structures. The method is capable of harnessing the collected information even in the presence of arbitrary contamination in the response and the predictors. I showcase the method's theoretical and practical advantages, specifically in applications with heavy-tailed errors and limited control over the data. I outline efficient algorithms for tackling the computational challenges posed by the inherently non-convex objective functions of robust estimators, as well as practical strategies for hyper-parameter selection, ensuring the method's scalability and applicability to a wide range of problems.
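The estimator presented in the talk is a purpose-built robust elastic-net method. Purely to illustrate the underlying idea, the sketch below alternates a weighted elastic-net fit (glmnet) with Huber-type downweighting of large residuals; it is a crude stand-in, not the speaker's estimator, and carries none of its theoretical guarantees. The lambda choice is arbitrary.

```r
# Crude robustified elastic net: iterate glmnet with Huber-type weights.
library(glmnet)

set.seed(3)
n <- 100; p <- 500
X    <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))
y    <- as.vector(X %*% beta + rnorm(n))
y[1:5] <- y[1:5] + 50            # gross contamination in the response

w <- rep(1, n)
for (it in 1:10) {
  fit <- glmnet(X, y, alpha = 0.75, weights = w, nlambda = 50)
  lam <- fit$lambda[20]                        # crude fixed point on the path
  r   <- as.vector(y - predict(fit, X, s = lam))
  s   <- mad(r)                                # robust residual scale
  w   <- pmin(1, 1.345 * s / abs(r))           # Huber-type observation weights
}
which(as.matrix(coef(fit, s = lam))[-1, 1] != 0)   # selected variables
```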

Thursday, February 6, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Liqun Diao, University of Waterloo

Censoring Unbiased Regression Trees and Ensembles

Tree-based methods are useful tools for prediction and for identifying risk groups, employing recursive partitioning to separate subjects into groups with different risk profiles. We propose a novel paradigm for building regression trees for censored data in survival analysis. We construct the censored-data loss function through an extension of the theory of censoring unbiased transformations. With this construction, the proposed regression tree algorithm can be conveniently implemented using existing software for the Classification and Regression Trees (CART) algorithm (e.g., the rpart package in R) and extended to ensemble learning. Simulations and real data examples demonstrate that our methods either improve upon or remain competitive with existing tree-based algorithms for censored data.
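A minimal sketch of the censoring-unbiased-transformation idea: the censored response is replaced by an inverse-probability-of-censoring-weighted (IPCW) pseudo-response whose conditional expectation matches that of the true event time, and the result is handed to a standard CART implementation (rpart). The Koul-Susarla-Van Ryzin style transformation used here is only one simple member of the class considered in the talk, and the simulated data are illustrative.

```r
# IPCW pseudo-responses fed to a standard regression tree.
library(survival)
library(rpart)

set.seed(4)
n  <- 500
x1 <- runif(n); x2 <- runif(n)
t_event <- rexp(n, rate = exp(-1 + 2 * x1))   # true event times
t_cens  <- rexp(n, rate = 0.3)                # censoring times
time    <- pmin(t_event, t_cens)
status  <- as.numeric(t_event <= t_cens)

# Kaplan-Meier estimate of the censoring survival function G(t).
G    <- survfit(Surv(time, 1 - status) ~ 1)
Gfun <- stepfun(G$time, c(1, G$surv))
Gt   <- Gfun(time)

# Pseudo-response delta * T / G(T): its expectation matches E[T | x].
ystar <- ifelse(status == 1, time / pmax(Gt, 0.05), 0)

fit <- rpart(ystar ~ x1 + x2, method = "anova")
printcp(fit)
```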

Friday, February 7, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Gabriel Becker, University of California Davis

The Extended Reproducibility Phenotype - Re-framing and Generalizing Computational Reproducibility

Computational reproducibility has become a crucial part of how data analytic results are understood and assessed both in and outside of academia. Less work, however, has explored whether these strict computational reproducibility criteria are necessary or sufficient to actually meet our needs as consumers of analysis results. I will show that in principle they are neither. I will present two inter-related veins of work. First, I will provide a conceptual reframing of strict reproducibility, and the actions analysts take to ensure it, in terms of our ability to actually trust the results and the claims they embody about the underlying data-generating systems. Second, I will present a generalized conception of reproducibility by introducing the concepts of Currency, Comparability, and Completeness and their oft-overlooked importance to assessing data analysis results.

Thursday, March 5, 2020 4:00 pm - 4:00 pm EST (GMT -05:00)

Department seminar by Stilian Stoev, University of Michigan

Concentration of Maxima: Fundamental Limits of Exact Support Recovery in High Dimensions

We study the estimation of the support (the set of non-zero components) of a sparse high-dimensional signal observed with additive and dependent noise. Under the usual parameterization of the size of the support set and the signal magnitude, we characterize a phase-transition phenomenon akin to Ingster's signal-detection boundary. We show that when the signal is above the so-called strong classification boundary, thresholding estimators achieve asymptotically perfect support recovery. This holds under arbitrary error-dependence assumptions, provided that the marginal error distribution has rapidly varying tails. Conversely, under mild dependence conditions on the noise, we show that no thresholding estimator can achieve perfect support recovery if the signal is below the boundary. For log-concave error densities, the thresholding estimators are shown to be optimal, and hence the strong classification boundary is universal in this setting.

The proofs exploit a concentration-of-maxima phenomenon known as relative stability. We obtain a complete characterization of the relative stability phenomenon for dependent Gaussian noise via Slepian and Sudakov-Fernique bounds and some Ramsey theory.
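A small base-R simulation in the spirit of the talk: a sparse mean vector is observed in Gaussian noise, and its support is estimated by thresholding at roughly sqrt(2 log p). The dimensions, signal strength, and threshold below are illustrative choices, not the exact boundary conditions derived in the work (and the noise here is independent rather than dependent).

```r
# Thresholding estimator of a sparse support observed in Gaussian noise.
set.seed(5)
p <- 10000
k <- 20                                   # size of the true support
S <- sample(p, k)
mu <- numeric(p)
mu[S] <- 1.5 * sqrt(2 * log(p))           # signal above the threshold scale
x <- mu + rnorm(p)                        # noisy observations

thr   <- sqrt(2 * log(p))
S_hat <- which(x > thr)                   # thresholding support estimate
sum(!(S %in% S_hat))                      # missed signal coordinates
sum(!(S_hat %in% S))                      # false discoveries
```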

Thursday, March 12, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Department seminar by Matthew Pratola, Ohio State University 

Bayesian Additive Regression Trees for Statistical Learning

Regression trees are flexible non-parametric models that are well suited to many modern statistical learning problems. Many such tree models have been proposed, from the simple single-tree model (e.g., Classification and Regression Trees, CART) to more complex tree ensembles (e.g., Random Forests). Their non-parametric formulation allows one to model datasets exhibiting complex non-linear relationships between predictors and the response. A recent innovation in the statistical literature is the development of a Bayesian analogue to these classical regression tree models. The benefit of the Bayesian approach is the ability to quantify uncertainties within a holistic Bayesian framework. We introduce the most popular variant, the Bayesian Additive Regression Trees (BART) model, and describe recent innovations to this framework. We conclude with some of the exciting research directions currently being explored.
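For readers who want to try BART before the talk, the sketch below fits the model with the wbart() function from the BART package on simulated data; this is one of several R implementations (dbarts and bartMachine are alternatives), and the data and settings are illustrative only.

```r
# Fitting BART to a continuous response and summarizing the posterior.
library(BART)

set.seed(6)
n <- 300; p <- 10
X <- matrix(runif(n * p), n, p)
f <- function(x) 10 * sin(pi * x[, 1] * x[, 2]) +
                 20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
y <- f(X) + rnorm(n)                      # Friedman's benchmark function

X_test <- matrix(runif(100 * p), 100, p)

fit <- wbart(x.train = X, y.train = y, x.test = X_test)

# Posterior mean predictions and pointwise 95% credible intervals.
pred_mean <- fit$yhat.test.mean
pred_ci   <- apply(fit$yhat.test, 2, quantile, probs = c(0.025, 0.975))
```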