Events

Batch-mode active learning for regression and its application to the valuation of large variable annuity portfolios

Supervised learning algorithms require a sufficient amount of labeled data to construct an accurate predictive model. In practice, collecting labeled data may be extremely time-consuming while unlabeled data can be easily accessed. In a situation where labeled data are insufficient for a prediction model to perform well and the budget for an additional data collection is limited, it is important to effectively select objects to be labeled based on whether they contribute to a great improvement in the model's performance. In this talk, I will focus on the idea of active learning that aims to train an accurate prediction model with minimum labeling cost. In particular, I will present batch-mode active learning for regression problems. Based on random forest, I will propose two effective random sampling algorithms that consider the prediction ambiguities and diversities of unlabeled objects as measures of their informativeness. Empirical results on an insurance data set demonstrate the effectiveness of the proposed approaches in valuing large variable annuity portfolios (which is a practical problem in the actuarial field). Additionally, comparisons with the existing framework that relies on a sequential combination of unsupervised and supervised learning algorithms are also investigated.

Fairness through Experimentation: Inequality in A/B testing as an approach to responsible design

As technology continues to advance, there is increasing concern about individuals being left behind. Many businesses are striving to adopt responsible design practices and avoid any unintended consequences of their products and services, ranging from privacy vulnerabilities to algorithmic bias. We propose a novel approach to fairness and inclusiveness based on experimentation. We use experimentation because we want to assess not only the intrinsic properties of products and algorithms but also their impact on people. We do this by introducing an inequality approach to A/B testing, leveraging the Atkinson index from the economics literature. We show how to perform causal inference over this inequality measure. We also introduce the concept of site-wide inequality impact, which captures the inclusiveness impact of targeting specific subpopulations for experiments, and show how to conduct statistical inference on this impact. We provide real examples from LinkedIn, as well as an open-source, highly scalable implementation of the computation of the Atkinson index and its variance in Spark/Scala. We also provide over a year's worth of learnings -- gathered by deploying our method at scale and analyzing thousands of experiments -- on which areas and which kinds of product innovations seem to inherently foster fairness through inclusiveness.
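For readers unfamiliar with the Atkinson index mentioned in this abstract, here is a minimal sketch of its standard textbook definition in Python. It is not the speaker's Spark/Scala implementation and it does not cover the variance estimate or causal inference described in the talk. The function name `atkinson_index` and the parameter name `epsilon` (the inequality-aversion parameter) are illustrative choices.

```python
import math


def atkinson_index(values, epsilon=1.0):
    """Atkinson inequality index for strictly positive values.

    Returns 0 when everyone has the same value; larger values mean more
    inequality. `epsilon` >= 0 is the inequality-aversion parameter.
    """
    xs = list(values)
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("values must be non-empty and strictly positive")

    mean = sum(xs) / len(xs)
    if math.isclose(epsilon, 1.0):
        # Limiting case: the equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(x) for x in xs) / len(xs))
    else:
        power = 1.0 - epsilon
        # Generalized (power) mean of order 1 - epsilon.
        ede = (sum(x ** power for x in xs) / len(xs)) ** (1.0 / power)
    return 1.0 - ede / mean
```

In an A/B test one could, for example, compute this index on a per-member metric separately for the treatment and control groups and compare the two values. Whether the difference is statistically meaningful is the inference question the talk addresses.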

Please note: This seminar will be given online through Webex. To join, please follow this link: Virtual seminar by Guillaume Saint-Jacques.

Multivariate Extremes: Block-Maxima vs Peak-Over-Threshold

Extreme value theory is concerned with describing the tail behaviour of univariate and multivariate distributions. In the estimation of the dependence structure of the extremes of multiple time series, the block maxima method and the peaks-over-threshold method are frequently applied. In this talk, I will compare these methods and propose some new methodologies. This is joint work with A. Bücher and S. Volgushev.
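As a rough illustration of how the two methods named in this abstract select "extreme" observations, the sketch below applies each to a single series in Python. It is a univariate simplification: the talk concerns the dependence structure of several series, and the function names, block size, and threshold here are illustrative assumptions, not the speaker's methodology.

```python
def block_maxima(series, block_size):
    """Split the series into consecutive complete blocks and keep each block's maximum."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(series) // block_size  # a trailing incomplete block is dropped
    return [
        max(series[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


def peaks_over_threshold(series, threshold):
    """Keep the excess (value minus threshold) of every observation above the threshold."""
    return [x - threshold for x in series if x > threshold]


observations = [1, 5, 2, 8, 3, 4, 9, 0]
maxima = block_maxima(observations, block_size=4)       # one value per block
excesses = peaks_over_threshold(observations, threshold=4)  # one value per exceedance
```

The block maxima method keeps exactly one observation per block, while the peaks-over-threshold method keeps every observation above the threshold, so the two can retain quite different subsets of the data.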

Nan is a lecturer in the Department of Mathematics and Statistics at Macquarie University in Sydney, Australia.

Please note: This seminar will be delivered via Zoom. Please check back later for the link.

*This seminar will start at 5:00 p.m.

Applications of Nonstandard Analysis to Markov Processes

Nonstandard analysis, a powerful machinery derived from mathematical logic, has had many applications in probability theory and in stochastic processes. Nonstandard analysis allows the construction of a single object, a hyperfinite probability space, which satisfies all the first-order logical properties of a finite probability space, but which can simultaneously be viewed as a measure-theoretic probability space via the Loeb construction. As a consequence, the hyperfinite/measure duality has proven to be particularly useful in porting discrete results into their continuous settings.

In this talk, for every general-state-space discrete-time Markov process satisfying appropriate conditions, we construct a hyperfinite Markov process that represents it and has all the first-order logical properties of a finite Markov process. We show that the mixing time and the hitting time agree up to multiplicative constants for discrete-time general-state-space reversible Markov processes satisfying certain conditions. Finally, we show that our result applies to a large class of Gibbs samplers and Metropolis-Hastings algorithms.

Please note: This seminar will be delivered online through Webex. To join, please follow this link: Virtual seminar by Kevin (Haosui) Duanmu.

A statistician's introduction to genomics

A classical model of genetic association is introduced alongside a short history of its development with a particular focus on mouse models. The inferential consequences of the widespread use of mouse models are discussed, and the modern application of this model is introduced as a problem of measuring pairwise associations in a large data set. A broad algebraic framework for this model and others like it is used to demonstrate several results and suggest future avenues of investigation.

Discrimination-aware decisions in finance and insurance

We discuss the implications of considering protected attributes when individuals are paired with measures of risk. Two examples are analyzed: a credit scoring example using simulated data, given from the perspective of the regulator, and an insurance pricing scenario, analyzed in view of the underlying causal model.

Please Note: This talk will be given online through Microsoft Teams. To join, please follow this link: Virtual Seminar by Carlos Araiza Iturria.

A statistician's introduction to proteomics

Proteomics is the large-scale study of proteins. It has important applications in drug discovery and antibody sequencing. In this talk, I will explain the basic concepts and data formats in proteomics, and introduce the commonly used workflows for generating statistically analyzable data from the raw data stored in public repositories. I will also share several important research topics in proteomics where I think statisticians could make a huge contribution.

Please Note: This talk will be given through Microsoft Teams. To join, please follow this link: Virtual Seminar by Rui Qiao.

Concentration inequalities for sampling without replacement, with applications to post-election audits

Many practical tasks involve sampling sequentially without replacement from a finite population in order to estimate some parameter, such as a mean. We discuss how to derive powerful (new) concentration inequalities for this setting using martingale techniques, and apply them to auditing elections (see below).

This is joint work with my PhD student, Ian Waudby-Smith, who was an undergrad at UWaterloo. An early preprint is available here.

More details: When determining the outcome of an election, electronic voting machines are often employed for their tabulation speed and cost-effectiveness. Unlike paper ballots, these machines are vulnerable to software bugs and fraudulent tampering. Post-election audits provide assurance that announced electoral outcomes are consistent with paper ballots or voter-verifiable records. We propose an approach to election auditing based on confidence sequences (VACSINE)—these are visualizable sequences of confidence sets for the total number of votes cast for each candidate that adaptively shrink to zero width. These confidence sequences have uniform coverage from the beginning of an audit to the point of an exhaustive recount, but their main advantage is that their error guarantee is immune to continuous monitoring and early stopping, providing valid inference at any auditor-chosen, data-dependent stopping time. We develop VACSINEs for various types of elections including plurality, approval, ranked-choice, and score voting protocols.

Please Note: This talk will be given through Zoom. To join, please follow this link: Department Seminar by Aaditya Ramdas.

Assessing the Impacts of Mutations to the Structure of COVID-19 Spike Protein via Sequential Monte Carlo

Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. We take a data science perspective to study the potential structural impacts due to ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this important protein structure and complement laboratory efforts.

Please Note: This talk will be given through Microsoft Teams. To join, please click here: Student Seminar by Samuel Wong.

Optimal supermartingales for anytime-valid sequential testing

Statistical testing is 'anytime-valid' if the decision to stop or continue an experiment can depend on anything that has been observed so far, without compromising statistical error guarantees. For instance, suppose that a promising but inconclusive study receives funding to gather additional data. Then standard p-value analysis is invalidated, but anytime-valid testing is not. A recent approach to anytime-valid testing views a test statistic as a bet against the null hypothesis. These bets are constrained to be supermartingales, and hence unprofitable, under the null, but are designed to be profitable under the relevant alternative hypotheses. This perspective opens the door to using tools from financial mathematics. In this talk I will explain how notions such as supermartingale measures, fork-convexity, the optional decomposition theorem, and universal portfolios can be used to design optimal supermartingales for anytime-valid sequential testing. (This talk is based on ongoing work with Aaditya Ramdas (CMU) and Johannes Ruf (LSE).)
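The "test statistic as a bet" idea can be made concrete with a deliberately simple sketch in Python. It tests the null hypothesis that a coin is fair by betting a fixed fraction of current wealth on heads at every flip. Under the null the wealth process is a nonnegative martingale starting at 1, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, and the test may stop at any time. The fixed bet fraction and the function name `betting_test` are illustrative assumptions; this is not the optimal construction discussed in the talk.

```python
def betting_test(observations, bet_fraction=0.5, alpha=0.05):
    """Anytime-valid test of H0: P(x = 1) = 0.5 for binary observations.

    Returns (rejected, number_of_observations_used, final_wealth).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not -1 < bet_fraction < 1:
        raise ValueError("bet_fraction must lie strictly between -1 and 1")

    wealth = 1.0
    threshold = 1.0 / alpha
    n = 0
    for n, x in enumerate(observations, start=1):
        if x not in (0, 1):
            raise ValueError("observations must be 0 or 1")
        # The payoff (2x - 1) has mean zero under H0, so the expected
        # multiplier is 1 and wealth is a martingale under the null.
        wealth *= 1.0 + bet_fraction * (2 * x - 1)
        if wealth >= threshold:
            return True, n, wealth  # stop as soon as the evidence suffices
    return False, n, wealth
```

Because the error guarantee holds at every stopping time, gathering additional data after an inconclusive result simply means continuing the loop with the wealth accumulated so far.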

Please Note: This talk will be given through Webex. To join, please click here: Department seminar by Martin Larsson

Events

Department seminar by Hyukjun (Jay) Gweon, Western University

Department seminar by Guillaume Saint-Jacques, LinkedIn

Department seminar by Nan Zou, Macquarie University

Department Seminar by Kevin (Haosui) Duanmu, UC Berkeley

Student Seminar by Chris Salahub, PhD in Statistics

Student Seminar by Carlos Araiza Iturria, PhD in Actuarial Science

Student Seminar by Rui Qiao, PhD in Statistics

Department Seminar by Aaditya Ramdas, Carnegie Mellon University

Student Seminar by Samuel Wong, Assistant Professor

Department Seminar by Martin Larsson, Carnegie Mellon University