Monday, December 16, 2019 — 10:00 AM EST

The Blessings of Multiple Causes


Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods assume that we observe all confounders, variables that affect both the causal variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. We describe the deconfounder, a way to do causal inference from observational data allowing for unobserved confounding.

How does the deconfounder work? The deconfounder is designed for problems of multiple causal inferences: scientific studies that involve many causes whose effects are simultaneously of interest. The deconfounder uses the correlation among causes as evidence for unobserved confounders, combining unsupervised machine learning and predictive model checking to perform causal inference. We study the theoretical requirements for the deconfounder to provide unbiased causal estimates, along with its limitations and tradeoffs. We demonstrate the deconfounder on real-world data and simulation studies.

Thursday, December 19, 2019 — 10:00 AM EST

Uncover Hidden Fine-Grained Scientific Information: Structured Latent Attribute Models


In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring fine-grained latent information under structural constraints. These structural constraints usually come from domain experts’ prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodates these modeling needs and has received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges. In particular, with high-dimensional discrete latent attributes and structural constraints encoded by a design matrix, one needs to balance the gain in the model’s explanatory power and interpretability against the difficulty of understanding and handling the complex model structure.

In the first part of this talk, I present identifiability results that advance the theoretical knowledge of how the design matrix influences the estimability of SLAMs. The new identifiability conditions guide real-world practices of designing diagnostic tests and also lay the foundation for drawing valid statistical conclusions. In the second part, I introduce a statistically consistent penalized likelihood approach to selecting significant latent patterns in the population. I also propose a scalable computational method. These developments explore an exponentially large model space involving many discrete latent variables, and they address the estimation and computation challenges of high-dimensional SLAMs arising from large-scale scientific measurements. Applying the proposed methodology to data from an international educational assessment reveals a meaningful knowledge structure in the student population.
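As a rough illustration of how a design matrix can constrain discrete latent attributes, the sketch below uses a DINA-type response rule, which is one well-known member of the SLAM family and is an assumption here, not necessarily the model used in the talk. The matrix `Q`, the slip and guess values, and all function names are hypothetical. It also shows why the pattern space grows exponentially: K binary attributes give 2^K latent patterns.

```python
from itertools import product


def ideal_response(alpha, q_row):
    """Return 1 if pattern alpha has every attribute the item requires, else 0."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def correct_prob(alpha, q_row, slip, guess):
    """DINA-type probability of a correct response to one item."""
    return 1.0 - slip if ideal_response(alpha, q_row) else guess


# Hypothetical design matrix: 3 items (rows) by 2 latent attributes (columns).
# Entry 1 means the item requires that attribute.
Q = [
    [1, 0],
    [0, 1],
    [1, 1],
]


def response_table(Q, slip=0.1, guess=0.2):
    """Map each of the 2^K attribute patterns to its item response probabilities."""
    n_attr = len(Q[0])
    return {
        alpha: [correct_prob(alpha, row, slip, guess) for row in Q]
        for alpha in product((0, 1), repeat=n_attr)
    }


if __name__ == "__main__":
    for alpha, probs in response_table(Q).items():
        print(alpha, probs)
```

In this toy setup, a respondent with pattern (1, 0) answers the first item correctly with high probability but can only guess on the items that require the second attribute.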

Monday, January 6, 2020 — 10:00 AM EST

To be announced.

Tuesday, January 7, 2020 — 10:00 PM EST

Renewable Estimation and Incremental Inference in Streaming Data Analysis


New data collection and storage technologies have given rise to a new field of streaming data analytics, including real-time statistical methodology for online data analyses. Streaming data refers to high-throughput recordings with large volumes of observations gathered sequentially and perpetually over time. Examples include national disease registries, mobile health, and disease surveillance, among others. This talk primarily concerns the development of a fast real-time statistical estimation and inference method for regression analysis, with a particular objective of addressing challenges in streaming data storage and computational efficiency. Termed renewable estimation, this method enjoys strong theoretical guarantees, including both asymptotic unbiasedness and estimation efficiency, as well as fast computational speed. The key technical novelty is that the proposed method uses current data together with summary statistics of historical data. The proposed algorithm will be demonstrated in generalized linear models (GLMs) for cross-sectional data. I will discuss both conceptual understanding and theoretical guarantees of the method and illustrate its performance via numerical examples. This is joint work with my supervisor, Professor Peter Song.
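The idea of combining current data with summary statistics of historical data can be sketched in the simplest possible setting. The example below is not the renewable estimator from the talk, which targets generalized linear models. It is a minimal illustration for ordinary simple linear regression, where five running sums are enough to reproduce the full-data fit exactly. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningOLS:
    """Simple linear regression y = a + b*x maintained from running sums only."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Fold one batch into the summaries; the raw batch can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def estimate(self):
        """Return (intercept, slope) computed from the accumulated summaries."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def fit_stream(batches):
    """Process batches one at a time, keeping only the summary statistics."""
    model = RunningOLS()
    for xs, ys in batches:
        model.update(xs, ys)
    return model.estimate()


if __name__ == "__main__":
    # Two batches arriving in sequence, generated from y = 1 + 2x.
    stream = [([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]), ([3.0, 4.0], [7.0, 9.0])]
    print(fit_stream(stream))
```

Storage stays constant no matter how many batches arrive, which is the storage and computation concern the abstract raises. For GLMs the update is not exact in this way, which is where the method described in the talk comes in.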

Thursday, January 9, 2020 — 10:00 AM EST

To be announced.

Friday, January 10, 2020 — 10:00 AM EST

To be announced.

Monday, January 13, 2020 — 10:00 AM EST

To be announced.

Tuesday, January 14, 2020 — 10:00 AM EST

To be announced.

Thursday, January 16, 2020 — 10:00 AM EST

To be announced.

Monday, January 20, 2020 — 10:00 AM EST

To be announced.
