Title: Generalization bounds for rational self-supervised learning algorithms
|Zoom:||Please email Emma Watson|
The generalization gap of a learning algorithm is the expected difference between its performance on the training data and its performance on fresh unseen test samples.Modern deep learning algorithms typically have large generalization gaps, as they use more parameters than the size of their training set. Moreover the best known rigorous bounds on their generalization gap are often vacuous.
In this talk we will see a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a complex representation of the (label free) training data, and then fitting a simple (e.g., linear) classifier to the labels. Such classifiers have become increasingly popular in recent years, as they offer several practical advantages. We show that (under the assumptions described below) the generalization gap of such classifiers tends to zero as long as the complexity of the simple classifier is asymptotically smaller than the number of training samples. We stress that our bound is independent of the complexity of the representation that can use an arbitrarily large number of parameters.
Our bound holds under the assumption that the learning algorithm satisfies certain noise-robustness (adding small amount of label noise causes small degradation in performance) and rationality (getting the wrong label is not better than getting no label at all) conditions that widely (and sometimes provably) hold across many standard architectures.
We complement this result with an empirical study, demonstrating that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM and BigBiGAN.
The talk will not assume any specific background in machine learning, and should be accessible to a general mathematical audience. Joint work with Yamini Bansal and Gal Kaplun.