Citation:
Roy, S. , Maity, S. , Xue, S. , Yurochkin, M. , & Sun, Y. . (2025). How does overparametrization affect performance on minority groups?. Transactions on Machine Learning Research. arXiv. doi:10.48550/arXiv.2206.03515
Abstract:
The benefits of overparameterization for the overall performance of modern machine learning (ML) models are well known. However, the effect of overparameterization at a more granular level of data subgroups is less understood. Recent empirical studies demonstrate encouraging results: (i) when groups are not known, overparameterized models trained with empirical risk minimization (ERM) perform better on minority groups; (ii) when groups are known, ERM on data subsampled to equalize group sizes yields state-of-the-art worst-group accuracy in the overparameterized regime. In this paper, we complement these empirical studies with a theoretical investigation of the risk of overparameterized random feature regression models on minority groups with identical feature distribution as the majority group. In a setting in which the regression functions for the majority and minority groups are different, we show that overparameterization either improves or does not harm the asymptotic minority group performance under the ERM setting when the features are distributed uniformly over the sphere.