PhD Defence • Machine Learning | Natural Language Processing • Less is More: Restricted Representations for Better Interpretability and Generalizability

Tuesday, July 18, 2023 3:00 pm - 6:00 pm EDT (GMT -04:00)

Please note: This PhD defence will take place online.

Zhiying (Gin) Jiang, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

In this thesis, we aim to improve interpretability and generalizability by restricting representations. We approach interpretability through attribution analysis, to understand which features contribute to BERT’s predictions, and we approach generalizability through methods that are effective in the low-data regime.

We consider two strategies for restricting representations: (1) adding a bottleneck, and (2) introducing compression. We first show how adding an information bottleneck can help attribution analysis and apply it to investigate BERT’s behavior on text classification. We then extend this attribution method to passage reranking, where we conduct a detailed analysis of cross-layer and cross-passage behavior. Adding an information bottleneck not only provides insight into deep neural networks but can also be used to increase generalizability.

We demonstrate the equivalence between adding an information bottleneck and performing neural compression. We then leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV), and show how optimizing neural compressors can be used for non-parametric image classification with few labeled data. To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments on text classification with a universal compressor, gzip.
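
As a rough illustration of compressor-based, non-parametric text classification, the sketch below pairs gzip with a k-nearest-neighbor vote. The abstract does not state which distance or classifier the thesis uses, so the normalized compression distance (NCD), the kNN rule, and the names `clen`, `ncd`, and `classify` are assumptions made for this example, not a description of the thesis's method. The training texts are made-up toy data.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings (assumed metric)."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Label `query` by majority vote among its k nearest training texts."""
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy training set of (text, label) pairs.
    train = [
        ("the team won the match in the final minute", "sports"),
        ("the striker scored twice in the second half", "sports"),
        ("the central bank raised interest rates again", "finance"),
        ("stocks fell as investors feared a recession", "finance"),
    ]
    print(classify("the goalkeeper saved a penalty in the match", train, k=1))
```

No parameters are trained here: the only "model" is the compressor, and the labeled examples are used directly at prediction time, which is why an approach of this kind can be applied when few labeled examples are available.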

Finally, we elucidate methods that adopt the perspective of compression without performing the actual process of compression. Using experimental results in passage reranking, we show that our method is highly effective in the low-data regime.