Title: Soft Condorcet Optimization
Speaker: Kate Larson
Affiliation: University of Waterloo
Location: MC 5501
Abstract:
A common way to drive progress in AI models and agents is to compare their performance on standardized benchmarks. This often involves aggregating individual performances across a potentially wide variety of tasks and benchmarks, and many of the leaderboards that draw the greatest attention are Elo-based.
In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground-truth ranking, and it is a solution to Condorcet's original voting system criteria. It also inherits desirable social-choice-inspired properties: SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo.
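As a rough illustration of the idea of "fewest mistakes in predicting the agent comparisons", the sketch below treats each vote as a complete ranking, checks for a Condorcet winner, and fits one rating per agent by gradient descent on a sigmoid-smoothed count of mis-ordered pairs. The function names, the sigmoid smoothing, the `temperature`, `lr`, and `steps` parameters, and the toy votes are all assumptions made for illustration; this is not necessarily one of the three algorithms proposed in the paper.

```python
import math
from itertools import combinations


def condorcet_winner(votes, agents):
    """Return the agent that beats every other agent in a strict majority
    of votes, or None if no such agent exists.
    Assumes every vote is a complete ranking, best agent first."""
    for a in agents:
        if all(
            2 * sum(v.index(a) < v.index(b) for v in votes) > len(votes)
            for b in agents
            if b != a
        ):
            return a
    return None


def soft_ratings(votes, agents, temperature=1.0, lr=0.1, steps=2000):
    """Illustrative only: plain gradient descent on a smoothed mistake count.
    For each pair (w, l) with w ranked above l in a vote, the loss term is
    sigmoid((theta[l] - theta[w]) / temperature), which is near 1 when the
    ratings contradict the vote and near 0 when they agree with it."""
    theta = {a: 0.0 for a in agents}
    for _ in range(steps):
        grad = {a: 0.0 for a in agents}
        for v in votes:
            # combinations preserves order, so w precedes l in the vote
            for w, l in combinations(v, 2):
                s = 1.0 / (1.0 + math.exp(-(theta[l] - theta[w]) / temperature))
                g = s * (1.0 - s) / temperature  # derivative of the sigmoid term
                grad[l] += g
                grad[w] -= g
        for a in agents:
            theta[a] -= lr * grad[a] / len(votes)
    return theta


# Hypothetical toy data: three votes over three agents.
agents = ["A", "B", "C"]
votes = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C")]
winner = condorcet_winner(votes, agents)
ratings = soft_ratings(votes, agents)
ranking = sorted(agents, key=ratings.get, reverse=True)
```

In this toy example, agent A beats both B and C in a majority of votes, so it is the Condorcet winner, and the fitted ratings place it first. A lower `temperature` makes the smoothed loss a closer approximation to the hard count of mis-ordered pairs, at the cost of smaller gradients far from the decision boundary.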
We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance across a variety of synthetic and real-world datasets, to illustrate different properties.
Joint work with Marc Lanctot, Ian Gemp, Quentin Berthet, Yoram Bachrach, Manfred Diaz, Roberto-Rafael Maura-Rivero, Anna Koop, and Doina Precup.