Friday, February 7, 2020 — 10:00 AM EST

The Extended Reproducibility Phenotype - Re-framing and Generalizing Computational Reproducibility

Computational reproducibility has become a crucial part of how data-analytic results are understood and assessed both inside and outside academia. Less work, however, has explored whether these strict computational reproducibility criteria are necessary or sufficient to meet our actual needs as consumers of analysis results. I will show that in principle they are neither. I will present two interrelated veins of work. First, I will offer a conceptual reframing of strict reproducibility, and of the actions analysts take to ensure it, in terms of our ability to trust results and the claims they embody about the underlying data-generating systems. Second, I will present a generalized conception of reproducibility by introducing the concepts of Currency, Comparability, and Completeness and their oft-overlooked importance to assessing data analysis results.

Thursday, February 6, 2020 — 10:00 AM EST

Censoring Unbiased Regression Trees and Ensembles

Tree-based methods use recursive partitioning to separate subjects into distinct risk groups, making them useful tools for risk stratification and prediction. We propose a novel paradigm for building regression trees for censored data in survival analysis. We carefully construct the censored-data loss function through an extension of the theory of censoring unbiased transformations. With this construction, we can conveniently implement the proposed regression tree algorithm using existing software for the Classification and Regression Trees (CART) algorithm (e.g., the rpart package in R) and extend it to ensemble learning. Simulations and real-data examples demonstrate that our methods either improve upon or remain competitive with existing tree-based algorithms for censored data.
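The abstract points to the rpart package in R; as a rough illustration of the general idea only (not the speakers' actual loss construction), one well-known censoring-unbiased transformation is inverse-probability-of-censoring weighting (IPCW), after which an off-the-shelf regression tree can be fit. The sketch below, in Python with scikit-learn and entirely simulated data, weights each uncensored subject by the inverse of a Kaplan-Meier estimate of the censoring survival function:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    Censoring is treated as the 'event' here (event == 0), so G(t) is the
    probability of remaining uncensored past time t."""
    order = np.argsort(time)
    t, e = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    # A censoring 'event' occurs whenever the observation is censored (e == 0).
    factors = np.where(e == 0, 1.0 - 1.0 / at_risk, 1.0)
    return t, np.cumprod(factors)

def ipcw_weights(time, event):
    """IPCW weights: w_i = delta_i / G(T_i-); censored subjects get weight 0."""
    t_sorted, surv = censoring_survival(time, event)
    # Evaluate G just before each observed time (left-continuous).
    idx = np.searchsorted(t_sorted, time, side="left") - 1
    G = np.where(idx >= 0, surv[np.clip(idx, 0, None)], 1.0)
    return np.where(event == 1, 1.0 / np.clip(G, 1e-8, None), 0.0)

# Simulated survival data: true event times depend on the first covariate.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_time = np.exp(1.0 + X[:, 0]) * rng.exponential(size=n)
cens_time = rng.exponential(scale=8.0, size=n)
time = np.minimum(true_time, cens_time)
event = (true_time <= cens_time).astype(int)

# A standard regression tree becomes (approximately) censoring-unbiased
# once observations carry IPCW sample weights.
w = ipcw_weights(time, event)
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, time, sample_weight=w)
```

Censored observations receive zero weight while observed events are up-weighted by how likely they were to escape censoring; ties and tail instability of the weights are glossed over in this sketch.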

Wednesday, February 5, 2020 — 10:00 AM EST

Detecting the Signal Among Noise and Contamination in High Dimensions

Improvements in biomedical technology and a surge in other data-driven sciences have led to the collection of increasingly large amounts of data. Amid this abundance of data, contamination is ubiquitous but often neglected, creating substantial risk of spurious scientific discoveries. Especially in applications with high-dimensional data, such as proteomic biomarker discovery, the impact of contamination on methods for variable selection and estimation can be profound yet difficult to diagnose.

In this talk I present a method for variable selection and estimation in high-dimensional linear regression models that leverages the elastic-net penalty to accommodate complex data structures. The method can harness the collected information even in the presence of arbitrary contamination in the response and the predictors. I showcase the method's theoretical and practical advantages, specifically in applications with heavy-tailed errors and limited control over the data. I outline efficient algorithms that tackle the computational challenges posed by the inherently non-convex objective functions of robust estimators, as well as practical strategies for hyper-parameter selection, ensuring the method's scalability and applicability to a wide range of problems.
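As a minimal illustration of the motivation only (this is not the robust estimator presented in the talk), the sketch below fits a classical, non-robust elastic net to simulated sparse data and then refits after replacing a handful of responses with gross outliers; because the least-squares loss is unbounded, the contaminated fit and its selected support can shift substantially:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Sparse linear model: only the first 5 of 50 predictors are active.
rng = np.random.default_rng(42)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y_clean = X @ beta + rng.normal(scale=0.5, size=n)

# Contaminate 10% of the responses with gross outliers.
y_contam = y_clean.copy()
out = rng.choice(n, size=10, replace=False)
y_contam[out] += rng.choice([-1.0, 1.0], size=10) * 50.0

# Classical elastic net: the squared-error loss gives outliers
# unbounded influence on both estimation and variable selection.
fit_clean = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y_clean)
fit_contam = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y_contam)

support_clean = np.flatnonzero(fit_clean.coef_)
support_contam = np.flatnonzero(fit_contam.coef_)
```

Robust alternatives replace the squared-error loss with a bounded-influence loss, which is what makes the resulting objective non-convex and computationally challenging, as discussed in the talk.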

Tuesday, February 4, 2020 — 10:00 AM EST

Bayesian Utility-Based Toxicity Probability Interval Design for Dose Finding in Phase I/II Trials

Molecularly targeted agents and immunotherapy have revolutionized modern cancer treatment. Unlike with chemotherapy, the maximum tolerated dose of a targeted therapy may not offer significant clinical benefit over lower doses. By simultaneously considering binary toxicity and efficacy endpoints, phase I/II trials can identify a better dose for subsequent phase II trials, in terms of the efficacy-toxicity tradeoff, than traditional phase I trials. Existing phase I/II dose-finding methods are either model-based or require pre-specifying many design parameters, which makes them difficult to implement in practice. To strengthen and simplify the current practice of phase I/II trials, we propose a utility-based toxicity probability interval (uTPI) design for finding the optimal biological dose (OBD) when binary toxicity and efficacy endpoints are observed. The uTPI design is model-assisted in nature, simply modeling the utility outcomes observed at the current dose level with a quasibinomial likelihood. Toxicity probability intervals are used to screen out overly toxic dose levels, and dose escalation/de-escalation decisions are then made adaptively by comparing the posterior utility distributions of the dose levels adjacent to the current dose. The uTPI design is flexible in accommodating various utility functions while requiring only a minimal number of design parameters. A prominent feature of the uTPI design is its simple decision structure: a concise dose-assignment decision table can be calculated before the trial starts and used throughout, which greatly simplifies practical implementation of the design. Extensive simulation studies demonstrate that the proposed uTPI design yields desirable and robust performance under various scenarios. This talk is based on joint work with Ruitao Lin and Ying Yuan at MD Anderson Cancer Center.
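To make the decision structure concrete, here is a heavily simplified sketch of the core idea: normalized utilities in [0, 1] are modeled with a quasibinomial likelihood, giving a Beta posterior for the mean utility at each dose, and the next dose is the admissible neighbor with the highest posterior mean. The Beta(1, 1) prior, the 0.35 toxicity cutoff, and the restriction to already-tried doses are all assumptions of this sketch, not the published uTPI design:

```python
import numpy as np
from scipy.stats import beta

def utility_posterior(utilities):
    """Beta posterior for the mean normalized utility in [0, 1], from a
    quasibinomial likelihood with an assumed uniform Beta(1, 1) prior."""
    u = np.asarray(utilities, dtype=float)
    return beta(1.0 + u.sum(), 1.0 + len(u) - u.sum())

def choose_next_dose(current, utilities_by_dose, tox_by_dose, tox_limit=0.35):
    """Pick among {current-1, current, current+1}: screen out doses whose
    observed toxicity rate exceeds tox_limit (a hypothetical cutoff), then
    take the candidate with the highest posterior mean utility."""
    candidates = []
    for d in (current - 1, current, current + 1):
        if d < 0 or d >= len(utilities_by_dose) or not utilities_by_dose[d]:
            continue  # out of range, or no data at this dose yet
        if np.mean(tox_by_dose[d]) > tox_limit:
            continue  # screened out as overly toxic
        candidates.append((utility_posterior(utilities_by_dose[d]).mean(), d))
    # If every neighbor is screened out, de-escalate (or stay at dose 0).
    return max(candidates)[1] if candidates else max(current - 1, 0)
```

Because each decision depends only on simple summaries at the current dose and its neighbors, all decisions can be precomputed into a dose-assignment table before the trial starts, which is the practical appeal highlighted in the abstract.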
