Department seminar by Lucy Gao, University of Washington

Tuesday, January 14, 2020, 10:00 am EST (GMT -05:00)

Statistical Inference for Multi-View Clustering

In the multi-view data setting, multiple data sets are collected on a single, common set of observations. For example, we might perform genomic and proteomic assays on a single set of tumour samples, or we might collect relationship data from two online social networks for a single set of users. It is tempting to cluster the observations using all of the data views, in order to fully exploit the available information. However, clustering the observations using all of the data views implicitly assumes that a single underlying clustering of the observations is shared across all data views. If this assumption does not hold, then clustering the observations using all data views may lead to spurious results. We seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop new tests for answering this question based on multivariate and/or network data views, and apply them to multi-omics data from the Pioneer 100 Wellness Study (Price and others, 2017) and protein-protein interaction data from the HINT database (Das and Yu, 2012). We will also briefly discuss our current work on testing for no difference between the means of two estimated clusters in a single-view data set. This is joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).
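To make the question "are the clusters within each data view dependent or independent?" concrete, here is a minimal sketch of a naive permutation test on two vectors of cluster labels, one per data view. It is not the test developed by the speaker: it is a textbook permutation test on a Pearson chi-square statistic, and the function names (`chi_square_statistic`, `permutation_p_value`) and their parameters are hypothetical. It also assumes the cluster labels are given, so it ignores the uncertainty that comes from estimating the clusterings from data.

```python
import random
from collections import Counter


def chi_square_statistic(labels_a, labels_b):
    """Pearson chi-square statistic for the contingency table of two labelings."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    stat = 0.0
    for a, n_a in count_a.items():
        for b, n_b in count_b.items():
            expected = n_a * n_b / n  # cell count expected under independence
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(labels_a, labels_b, n_permutations=2000, seed=0):
    """Naive permutation p-value for independence of two cluster labelings.

    Assumption (illustrative only): labels are treated as fixed and known,
    which is not how the method described in the talk works.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both views must label the same observations")
    rng = random.Random(seed)
    observed = chi_square_statistic(labels_a, labels_b)
    shuffled = list(labels_b)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling one labeling breaks any link between the two views.
        rng.shuffle(shuffled)
        if chi_square_statistic(labels_a, shuffled) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    view_1 = [0] * 20 + [1] * 20
    view_2 = [0] * 20 + [1] * 20  # identical clustering in the second view
    print(permutation_p_value(view_1, view_2))
```

A small p-value suggests that the two views' clusterings are related. Because the labels here are treated as known rather than estimated, a sketch like this can overstate the evidence in practice.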