Friday, August 21, 2020 — 10:30 AM to 11:30 AM EDT
Optimal supermartingales for anytime-valid sequential testing
Statistical testing is 'anytime-valid' if the decision to stop or continue an experiment can depend on anything that has been observed so far, without compromising statistical error guarantees. For instance, suppose that a promising but inconclusive study receives funding to gather additional data. Then standard p-value analysis is invalidated, but anytime-valid testing is not. A recent approach to anytime-valid testing views a test statistic as a bet against the null hypothesis. These bets are constrained to be supermartingales (hence unprofitable) under the null, but designed to be profitable under the relevant alternative hypotheses. This perspective opens the door to using tools from financial mathematics. In this talk, I will explain how notions such as supermartingale measures, fork-convexity, the optional decomposition theorem, and universal portfolios can be used to design optimal supermartingales for anytime-valid sequential testing. (This talk is based on ongoing work with Aaditya Ramdas (CMU) and Johannes Ruf (LSE).)
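To make the betting view concrete, here is a minimal Python sketch under stated assumptions; it is an illustration only and not the construction presented in the talk. It tests the null hypothesis that a coin lands heads with probability at most 1/2 by repeatedly staking a fixed fraction of current wealth on heads. Under the null, the wealth process is a nonnegative supermartingale starting at 1, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, whatever the stopping rule. The function name `betting_test`, the parameter `bet_fraction`, and the fixed-fraction strategy are all hypothetical choices made for this example; the talk concerns how to choose such bets optimally.

```python
import random


def betting_test(observations, bet_fraction=0.2, alpha=0.05):
    """Sequentially test H0: P(heads) <= 1/2 by betting on heads.

    observations: iterable of 0/1 outcomes (1 = heads).
    bet_fraction: fixed fraction of wealth staked each round (an
        illustrative assumption, not an optimized strategy).
    Returns (rejected, steps_used, final_wealth).
    """
    if not 0.0 <= bet_fraction < 1.0:
        raise ValueError("bet_fraction must lie in [0, 1)")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")

    wealth = 1.0
    threshold = 1.0 / alpha  # Ville's inequality: P(sup wealth >= 1/alpha) <= alpha under H0
    steps = 0
    for x in observations:
        steps += 1
        payoff = 1.0 if x == 1 else -1.0
        # Each factor has expectation <= 1 when P(heads) <= 1/2,
        # so wealth is a supermartingale under the null.
        wealth *= 1.0 + bet_fraction * payoff
        if wealth >= threshold:
            return True, steps, wealth  # stop early and reject H0
    return False, steps, wealth


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data from a biased coin (P(heads) = 0.7), for illustration only.
    flips = [1 if rng.random() < 0.7 else 0 for _ in range(500)]
    print(betting_test(flips))
```

Because the rejection rule only checks whether wealth has crossed 1/alpha, the data stream can be extended or cut short at any point without changing the error guarantee, which is the anytime-valid property described above.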
Please Note: This talk will be given through Webex. To join, please click here: Department seminar by Martin Larsson
Wednesday, August 19, 2020 — 4:00 PM to 5:00 PM EDT
Assessing the Impacts of Mutations to the Structure of COVID-19 Spike Protein via Sequential Monte Carlo
Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. We take a data science perspective to study the potential structural impacts due to ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this important protein structure and complement laboratory efforts.
Please Note: This talk will be given through Microsoft Teams. To join, please click here: Student Seminar by Samuel Wong
Thursday, August 13, 2020 — 4:00 PM EDT
Concentration inequalities for sampling without replacement, with applications to post-election audits
Many practical tasks involve sampling sequentially without replacement from a finite population in order to estimate some parameter, such as a mean. We discuss how to derive powerful new concentration inequalities for this setting using martingale techniques, and apply them to auditing elections (see below).
This is joint work with my PhD student, Ian Waudby-Smith, who was an undergrad at UWaterloo. An early preprint is available here.
More details: When determining the outcome of an election, electronic voting machines are often employed for their tabulation speed and cost-effectiveness. Unlike paper ballots, these machines are vulnerable to software bugs and fraudulent tampering. Post-election audits provide assurance that announced electoral outcomes are consistent with paper ballots or voter-verifiable records. We propose an approach to election auditing based on confidence sequences (VACSINE). These are visualizable sequences of confidence sets for the total number of votes cast for each candidate that adaptively shrink to zero width. These confidence sequences have uniform coverage from the beginning of an audit to the point of an exhaustive recount, but their main advantage is that their error guarantee is immune to continuous monitoring and early stopping, providing valid inference at any auditor-chosen, data-dependent stopping time. We develop VACSINEs for various types of elections including plurality, approval, ranked-choice, and score voting protocols.
Please Note: This talk will be given through Zoom. To join, please follow this link: Department Seminar by Aaditya Ramdas.
Wednesday, August 12, 2020 — 4:00 PM EDT
A statistician's introduction to proteomics
Proteomics is the large-scale study of proteins. It has important applications in drug discovery and antibody sequencing. In this talk, I will explain the basic concepts and data formats in proteomics. I will introduce the commonly used workflows for generating statistically analyzable data from the raw data stored in public repositories. I also want to share with you several important research topics in proteomics where I think statisticians could make a huge contribution.
Please Note: This talk will be given through Microsoft Teams. To join, please follow this link: Virtual Seminar by Rui Qiao.
Wednesday, August 5, 2020 — 4:00 PM EDT
Discrimination-aware decisions in finance and insurance
We discuss the implications of considering protected attributes when individuals are paired with measures of risk. Two examples are analyzed: a credit scoring example using simulated data, given from the perspective of the regulator, and an insurance pricing scenario, analyzed in view of the underlying causal model.
Please Note: This talk will be given online through Microsoft Teams. To join, please follow this link: Virtual Seminar by Carlos Araiza Iturria.