Events - October 2019

Thursday, October 31, 2019 — 4:00 PM EDT

More information about this seminar will be added as soon as possible.

Friday, October 25, 2019 — 10:30 AM EDT

More information about this seminar will be added as soon as possible.

Friday, October 18, 2019 — 8:00 AM to Saturday, October 19, 2019 — 5:00 PM EDT

First student conference in Statistics, Actuarial Science, and Finance

Thursday, October 17, 2019 — 4:00 PM EDT

Building Deep Statistical Thinking for Data Science 2020: Privacy Protected Census, Gerrymandering, and Election


The year 2020 will be a busy one for statisticians and, more generally, data scientists. The US Census Bureau has announced that the data from the 2020 Census will be released under differential privacy (DP) protection, which in layperson’s terms means adding some noise to the data. While few would argue against protecting data privacy, many researchers, especially from the social sciences, are concerned about whether the right trade-offs between data privacy and data utility are being made. The DP protection also has a direct impact on redistricting, an issue that is already complicated enough with accurate counts, owing to the need to guard against excessive gerrymandering. The central statistical problem there is a rather unique one: how to determine whether a realization is an outlier with respect to a null distribution, when that null distribution itself cannot be fully determined? The 2020 US election will be another highly watched event, with many groups already busy making predictions. Will the lessons from predicting the 2016 US election be learned, or will the failure be repeated? This talk invites the audience on a journey of deep statistical thinking prompted by these questions, regardless of whether they have any interest in the US Census or politics.
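For readers unfamiliar with the phrase "adding some noise," the following is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only and is not the Census Bureau's production algorithm; the function names `laplace_noise` and `privatize_count`, the sensitivity of 1, and the example values are all assumptions made for this sketch.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate.

    The difference of two independent exponential variates with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy count under the textbook Laplace mechanism.

    epsilon     -- privacy-loss budget (smaller means more noise)
    sensitivity -- largest change in the count caused by one person
                   (assumed to be 1 for a simple head count)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the sketch is repeatable
    for eps in (0.1, 1.0, 10.0):
        noisy = privatize_count(1000, eps, rng=rng)
        print(f"epsilon={eps:>4}: released count = {noisy:.1f}")
```

A smaller `epsilon` gives stronger privacy protection but a noisier released count, which is the privacy-utility trade-off the abstract refers to.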


Tuesday, October 15, 2019 — 4:00 PM EDT

Graphical Models and Structural Learning for Extremes


Conditional independence, graphical models and sparsity are key notions for parsimonious models in high dimensions and for learning structural relationships in the data. The theory of multivariate and spatial extremes describes the risk of rare events through asymptotically justified limit models such as max-stable and multivariate Pareto distributions. Statistical modeling in this field has been limited to moderate dimensions so far, owing to complicated likelihoods and a lack of understanding of the underlying probabilistic structures.

We introduce a general theory of conditional independence for multivariate Pareto distributions that allows us to define graphical models and sparsity for extremes. New parametric models can be built in a modular way and statistical inference can be simplified to lower-dimensional margins. We define the extremal variogram, a new summary statistic that turns out to be a tree metric and therefore allows an underlying tree structure to be learned efficiently through Prim's algorithm. For a popular parametric class of multivariate Pareto distributions we show that, similarly to the Gaussian case, the sparsity pattern of a general graphical model can be easily read off from suitable inverse covariance matrices. This enables the definition of an extremal graphical lasso that enforces sparsity in the dependence structure. We illustrate the results with an application to flood risk assessment on the Danube river.

This is joint work with Adrien Hitz. Preprint available at https://arxiv.org/abs/1812.01734.
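The tree-learning step mentioned in this abstract relies on a general fact: when pairwise distances form a tree metric over the observed nodes, a minimum spanning tree of the complete graph weighted by those distances recovers the tree. The sketch below illustrates this with Prim's algorithm on a small hypothetical tree. The edge weights are made up and are not extremal variogram estimates, and the helper names `tree_metric` and `prim_mst` are assumptions for this illustration, not the authors' code.

```python
import math


def tree_metric(n, tree_edges):
    """Pairwise path lengths in a weighted tree on nodes 0..n-1."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), w in tree_edges.items():
        dist[i][j] = dist[j][i] = w
    # Floyd-Warshall; on a tree this yields the unique path lengths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def prim_mst(weights):
    """Prim's algorithm on a symmetric weight matrix.

    Returns the minimum spanning tree as a sorted list of edges (i, j)
    with i < j.
    """
    n = len(weights)
    in_tree = [False] * n
    best_cost = [math.inf] * n   # cheapest known link into the tree
    best_parent = [-1] * n       # tree node providing that link
    if n > 0:
        best_cost[0] = 0.0       # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]),
                key=lambda v: best_cost[v])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((min(u, best_parent[u]), max(u, best_parent[u])))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best_cost[v]:
                best_cost[v] = weights[u][v]
                best_parent[v] = u
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical tree with made-up edge weights.
    true_tree = {(0, 1): 1.0, (1, 2): 2.0, (1, 3): 0.5, (3, 4): 1.5}
    distances = tree_metric(5, true_tree)
    print(prim_mst(distances))
```

In practice the distance matrix would be estimated from data rather than computed from a known tree, so the recovered structure is only as good as those estimates.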

Friday, October 11, 2019 — 10:30 AM EDT

Precision Factor Investing: Avoiding Factor Traps by Predicting Heterogeneous Effects of Firm Characteristics


We apply ideas from causal inference and machine learning to estimate the sensitivity of future stock returns to observable characteristics like size, value, and momentum. By analogy with the informal notion of a "value trap," we distinguish "characteristic traps" (stocks with weak sensitivity) from "characteristic responders" (those with strong sensitivity). We classify stocks by interpreting these distinctions as heterogeneous treatment effects (HTE), with characteristics interpreted as treatments and future returns interpreted as responses. The classification exploits a large set of stock features and recent work applying machine learning to HTE. Long-short strategies based on sorting stocks on characteristics perform significantly better when applied to characteristic responders than when applied to traps. A strategy based on the difference between these long-short returns profits from the predictability of HTE rather than from factors associated with the characteristics themselves. This is joint work with Pu He.

Thursday, October 10, 2019 — 4:00 PM EDT

Finding Common Modules in a Time-Varying Network


Finding functional modules in gene regulation networks is an important task in systems biology. Many methods have been proposed for finding communities in static networks; however, the application of such methods is limited due to the dynamic nature of gene regulation networks. We propose a statistical framework for detecting common modules in the Drosophila melanogaster time-varying gene regulation network. We then develop both a significance test and a robustness test for the identified modular structure. We apply an enrichment analysis to our community findings, which reveals interesting results. Moreover, we investigate the consistency property of our proposed method under a time-varying stochastic block model framework with a temporal correlation structure. Although we focus on gene regulation networks in our work, our method is general and can be applied to other time-varying networks.

Thursday, October 3, 2019 — 4:00 PM EDT

Real World EHR Big Data: Challenges and Opportunities


Real-world EHR and health care Big Data may revolutionize how we evaluate therapeutic treatments and clinical pathways in a real-world setting. Big EHR data may also allow us to identify specific patient populations for a specific treatment so that the concept of personalized treatment can be implemented and deployed directly on the EHR system. However, it is quite challenging to use real-world data in treatment assessment and disease prediction, for various reasons. In this talk, I will share our experiences with EHR and health care Big Data research. First, I will discuss the basic infrastructure and multi-disciplinary team that are necessary to deal with EHR data. Then I will use a subarachnoid hemorrhage (SAH) study as an example to demonstrate an eight-step procedure that we have developed for using EHR data for research purposes. In particular, EHR data extraction, cleaning, pre-processing, and preparation are the major steps that call for novel statistical methods. Finally, I will discuss the challenges and opportunities for statisticians in using EHR data for research.
