Events

Thursday, January 30, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Department seminar by Hyukjun (Jay) Gweon, Western University

Batch-mode active learning for regression and its application to the valuation of large variable annuity portfolios

Supervised learning algorithms require a sufficient amount of labeled data to construct an accurate predictive model. In practice, collecting labeled data may be extremely time-consuming, while unlabeled data can be easily accessed. When labeled data are insufficient for a prediction model to perform well and the budget for additional data collection is limited, it is important to select the objects to be labeled effectively, based on how much they improve the model's performance. In this talk, I will focus on active learning, which aims to train an accurate prediction model with minimum labeling cost. In particular, I will present batch-mode active learning for regression problems. Based on random forest, I will propose two effective random sampling algorithms that consider the prediction ambiguities and diversities of unlabeled objects as measures of their informativeness. Empirical results on an insurance data set demonstrate the effectiveness of the proposed approaches in valuing large variable annuity portfolios (a practical problem in the actuarial field). Comparisons with the existing framework, which relies on a sequential combination of unsupervised and supervised learning algorithms, are also investigated.

Thursday, June 25, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Department seminar by Guillaume Saint-Jacques, LinkedIn

Fairness through Experimentation: Inequality in A/B testing as an approach to responsible design


As technology continues to advance, there is increasing concern about individuals being left behind. Many businesses are striving to adopt responsible design practices and avoid any unintended consequences of their products and services, ranging from privacy vulnerabilities to algorithmic bias. We propose a novel approach to fairness and inclusiveness based on experimentation. We use experimentation because we want to assess not only the intrinsic properties of products and algorithms but also their impact on people. We do this by introducing an inequality approach to A/B testing, leveraging the Atkinson index from the economics literature. We show how to perform causal inference over this inequality measure. We also introduce the concept of site-wide inequality impact, which captures the inclusiveness impact of targeting specific subpopulations for experiments, and show how to conduct statistical inference on this impact. We provide real examples from LinkedIn, as well as an open-source, highly scalable implementation of the computation of the Atkinson index and its variance in Spark/Scala. We also provide over a year's worth of learnings -- gathered by deploying our method at scale and analyzing thousands of experiments -- on which areas and which kinds of product innovations seem to inherently foster fairness through inclusiveness.
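For readers unfamiliar with the Atkinson index mentioned in the abstract, the following is a minimal sketch of the standard textbook definition, applied to a toy comparison of two experiment arms. It is not the speaker's Spark/Scala implementation, and it does not cover the variance estimation or causal inference described in the talk. The function name, the `epsilon` default, and the sample data are all illustrative assumptions.

```python
import math


def atkinson_index(values, epsilon=1.0):
    """Atkinson inequality index of strictly positive values.

    epsilon >= 0 is the inequality-aversion parameter (an assumption
    here; the abstract does not say which value the speakers use).
    Returns 0 for a perfectly equal distribution, approaching 1 as
    inequality grows.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive values")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")

    n = len(values)
    mean = sum(values) / n
    if epsilon == 1.0:
        # Limit case: equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(v) for v in values) / n)
    else:
        p = 1.0 - epsilon
        # Equally distributed equivalent is the power mean of order 1 - epsilon.
        ede = (sum(v ** p for v in values) / n) ** (1.0 / p)
    return 1.0 - ede / mean


# Hypothetical per-member metric values in two arms of an A/B test.
control = [1.0, 4.0]
treatment = [2.0, 3.0]

# A negative difference means the treatment arm is more equal than control.
inequality_effect = atkinson_index(treatment) - atkinson_index(control)
print(inequality_effect)
```

In this toy data both arms have the same mean, so a mean-based A/B comparison would show no effect, while the Atkinson index of the treatment arm is lower than that of the control arm.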

Please note: This seminar will be given online through Webex. To join, please follow this link: Virtual seminar by Guillaume Saint-Jacques.

Thursday, July 16, 2020 5:00 pm - 5:00 pm EDT (GMT -04:00)

Department seminar by Nan Zou, Macquarie University

Multivariate Extremes: Block-Maxima vs Peak-Over-Threshold


Extreme value theory is concerned with describing the tail behaviour of univariate and multivariate distributions. In the estimation of the dependence structure of the extremes of multiple time series, the block maxima method and the peaks-over-threshold method are frequently applied. In this talk, I will compare these methods and propose some new methodologies. This is joint work with A. Bücher and S. Volgushev.
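As a rough illustration of the two sampling schemes named in the abstract, the sketch below extracts extremes from a single series in both ways. It is a univariate toy example only; the talk concerns the dependence structure of multiple time series and new methodology that is not reproduced here. The function names, block size, and threshold are illustrative assumptions.

```python
def block_maxima(series, block_size):
    """Split the series into consecutive non-overlapping blocks and
    keep the maximum of each complete block (a trailing partial block
    is dropped)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    last_start = len(series) - block_size
    return [
        max(series[start:start + block_size])
        for start in range(0, last_start + 1, block_size)
    ]


def peaks_over_threshold(series, threshold):
    """Keep the exceedances (observation minus threshold) of every
    observation strictly above the threshold."""
    return [x - threshold for x in series if x > threshold]


observations = [1, 5, 2, 7, 3, 4, 9, 0]

maxima = block_maxima(observations, block_size=4)               # one value per block
exceedances = peaks_over_threshold(observations, threshold=4)   # one value per exceedance
print(maxima, exceedances)
```

The block maxima method keeps exactly one observation per block, while the peaks-over-threshold method keeps every observation above the threshold, so the two typically retain different numbers of extremes from the same data.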

Nan is a lecturer in the Department of Mathematics and Statistics at Macquarie University in Sydney, Australia.

Please note: This seminar will be delivered via Zoom. Please check back later for the link. 

*This seminar will start at 5:00 p.m.

Thursday, July 23, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Department Seminar by Kevin (Haosui) Duanmu, UC Berkeley

Applications of Nonstandard Analysis to Markov Processes


Nonstandard analysis, a powerful machinery derived from mathematical logic, has had many applications in probability theory as well as stochastic processes. Nonstandard analysis allows construction of a single object, a hyperfinite probability space, which satisfies all the first-order logical properties of a finite probability space, but which can be simultaneously viewed as a measure-theoretical probability space via the Loeb construction. As a consequence, the hyperfinite/measure duality has proven to be particularly useful in porting discrete results into their continuous settings.

In this talk, for every general-state-space discrete-time Markov process satisfying appropriate conditions, we construct a hyperfinite Markov process which has all the first-order logical properties of a finite Markov process to represent it. We show that the mixing time and the hitting time agree with each other up to some multiplicative constants for discrete-time general-state-space reversible Markov processes satisfying certain conditions. Finally, we show that our result is applicable to a large class of Gibbs samplers and Metropolis-Hastings algorithms.

Please note: This seminar will be delivered online through Webex. To join, please follow this link: Virtual seminar by Kevin (Haosui) Duanmu.

Wednesday, July 29, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Student Seminar by Chris Salahub, PhD in Statistics

A statistician's introduction to genomics


A classical model of genetic association is introduced alongside a short history of its development with a particular focus on mouse models. The inferential consequences of the widespread use of mouse models are discussed, and the modern application of this model is introduced as a problem of measuring pairwise associations in a large data set. A broad algebraic framework for this model and others like it is used to demonstrate several results and suggest future avenues of investigation.

Wednesday, August 5, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Student Seminar by Carlos Araiza Iturria, PhD in Actuarial Science

Discrimination-aware decisions in finance and insurance


We discuss the implications of considering protected attributes when individuals are paired with measures of risk. Two examples are analyzed: a credit scoring example using simulated data, given from the perspective of the regulator, and an insurance pricing scenario, analyzed in view of the underlying causal model.

Please Note: This talk will be given online through Microsoft Teams. To join, please follow this link: Virtual Seminar by Carlos Araiza Iturria.

Wednesday, August 12, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Student Seminar by Rui Qiao, PhD in Statistics

A statistician's introduction to proteomics


Proteomics is the large-scale study of proteins. It has important applications in drug discovery and antibody sequencing. In this talk, I would like to explain the basic concepts and data formats in proteomics. I will introduce the commonly used workflows to generate statistically analyzable data from the raw data stored on public repositories. I also want to share with you several important research topics in proteomics where I think statisticians could make a huge contribution.

Please Note: This talk will be given through Microsoft Teams. To join, please follow this link: Virtual Seminar by Rui Qiao.

Thursday, August 13, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Department Seminar by Aaditya Ramdas, Carnegie Mellon University

Concentration inequalities for sampling without replacement, with applications to post-election audits


Many practical tasks involve sampling sequentially without replacement from a finite population in order to estimate some parameter, like a mean. We discuss how to derive powerful (new) concentration inequalities for this setting using martingale techniques, and apply them to auditing elections (see below).

This is joint work with my PhD student, Ian Waudby-Smith, who was an undergrad at UWaterloo. An early preprint is available here.

More details: When determining the outcome of an election, electronic voting machines are often employed for their tabulation speed and cost-effectiveness. Unlike paper ballots, these machines are vulnerable to software bugs and fraudulent tampering. Post-election audits provide assurance that announced electoral outcomes are consistent with paper ballots or voter-verifiable records. We propose an approach to election auditing based on confidence sequences (VACSINE)—these are visualizable sequences of confidence sets for the total number of votes cast for each candidate that adaptively shrink to zero width. These confidence sequences have uniform coverage from the beginning of an audit to the point of an exhaustive recount, but their main advantage is that their error guarantee is immune to continuous monitoring and early stopping, providing valid inference at any auditor-chosen, data-dependent stopping time. We develop VACSINEs for various types of elections including plurality, approval, ranked-choice, and score voting protocols.

Please Note: This talk will be given through Zoom. To join, please follow this link: Department Seminar by Aaditya Ramdas.

Wednesday, August 19, 2020 4:00 pm - 5:00 pm EDT (GMT -04:00)

Student Seminar by Samuel Wong, Assistant Professor

Assessing the Impacts of Mutations to the Structure of COVID-19 Spike Protein via Sequential Monte Carlo


Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. We take a data science perspective to study the potential structural impacts due to ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this important protein structure and complement laboratory efforts.

Please Note: This talk will be given through Microsoft Teams. To join, please click here: Student Seminar by Samuel Wong.

Thursday, September 10, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Department seminar by Emma Jingfei Zhang, Miami University

Network Response Regression for Modeling Population of Networks with Covariates


Multiple-network data have been fast emerging in recent years, where a separate network over a common set of nodes is measured for each individual subject, along with rich subject covariate information. Existing network analysis methods have primarily focused on modeling a single network, and are not directly applicable to multiple networks with subject covariates.

In this talk, we present a new network response regression model, where the observed networks are treated as matrix-valued responses, and the individual covariates as predictors. The new model characterizes the population-level connectivity pattern through a low-rank intercept matrix, and the parsimonious effects of subject covariates on the network through a sparse slope tensor. We formulate the parameter estimation as a non-convex optimization problem, and develop an efficient alternating gradient descent algorithm. We establish the non-asymptotic error bound for the actual estimator from our optimization algorithm. Built upon this error bound, we derive the strong consistency for network community recovery, as well as the edge selection consistency. We demonstrate the efficacy of our method through intensive simulations and two brain connectivity studies.

Join Zoom Meeting

Meeting ID: 844 283 6948
Passcode: 318995