Please note: This PhD seminar will be given online.
Yuqing Xie, PhD candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Ming Li, Jimmy Lin
I would like to share work from my internship at AWS AI on backward-compatible NLP models. The behavior of deep neural networks can be inconsistent between model versions. Regressions introduced by a model update are a common concern, and they often outweigh the benefits of gains in accuracy or efficiency.
In this talk I will focus on quantifying, reducing, and analyzing regression errors in NLP model updates. Using negative flip rate as the regression measure, we show that regressions are prevalent across tasks in the GLUE benchmark. We formulate regression-free model updates as a constrained optimization problem, and further reduce it to a relaxed form that can be approximately optimized through knowledge distillation. We empirically analyze how model ensembles reduce regression. Finally, we conduct CheckList behavioral testing to understand the distribution of regressions across linguistic phenomena, and the efficacy of ensemble and distillation methods.
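To make the regression measure concrete, here is a minimal sketch of how a negative flip rate could be computed. It assumes the common definition: the fraction of evaluation examples that the old model classifies correctly and the new model classifies incorrectly. The function name `negative_flip_rate` and the toy labels are illustrative assumptions, not taken from the talk.

```python
from typing import Sequence


def negative_flip_rate(
    labels: Sequence[int],
    old_preds: Sequence[int],
    new_preds: Sequence[int],
) -> float:
    """Fraction of examples the old model got right and the new model gets wrong.

    Assumed definition (hypothetical helper, not the speaker's code):
        NFR = (1 / N) * #{i : old_preds[i] == labels[i] and new_preds[i] != labels[i]}
    """
    if not (len(labels) == len(old_preds) == len(new_preds)):
        raise ValueError("labels and predictions must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one example is required")

    # Count examples that flip from correct (old model) to incorrect (new model).
    negative_flips = sum(
        1
        for y, old, new in zip(labels, old_preds, new_preds)
        if old == y and new != y
    )
    return negative_flips / len(labels)


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Plain accuracy, included only to contrast it with the flip rate."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


# Toy data: both models are right on 4 of 5 examples, but on different ones.
labels = [1, 0, 1, 1, 0]
old_preds = [1, 0, 0, 1, 0]
new_preds = [1, 1, 1, 1, 0]

nfr = negative_flip_rate(labels, old_preds, new_preds)
old_acc = accuracy(labels, old_preds)
new_acc = accuracy(labels, new_preds)
```

In this toy case the two models have the same accuracy, yet one example flips from correct to incorrect, so the update still introduces a regression that accuracy alone would not reveal.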