PhD Comprehensive Exam | Yanming Kang, Multi-level Transformer

Monday, December 12, 2022 10:00 am - 10:00 am EST (GMT -05:00)

MC 5417 and Zoom (please email amgrad@uwaterloo.ca for the meeting link)

Candidate

Yanming Kang | Applied Mathematics, University of Waterloo

Title

Multi-level Transformer

Abstract

Transformer-based models have shown strong performance on natural language tasks. However, the quadratic complexity of the self-attention operation limits the maximum input length that can be handled. Our proposed model reduces the computational complexity to $O(n\log{n})$ by grouping tokens according to their distance from the target token and summarizing each group with strided convolution. In this presentation I will review prior work on efficient Transformers, describe our method, and present some preliminary results.
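
As a rough illustration of the core idea (a minimal sketch, not the candidate's actual implementation), the snippet below shows how strided convolution can build progressively coarser summaries of a token sequence: each level halves the sequence length, so a length-$n$ input yields $O(\log n)$ levels, which is what makes $O(n\log n)$ attention cost plausible. The class name `MultiLevelSummarizer` and all hyperparameters are hypothetical, and PyTorch is assumed.

```python
import torch
import torch.nn as nn


class MultiLevelSummarizer(nn.Module):
    """Builds a pyramid of progressively coarser token summaries.

    Level 0 holds the original sequence; each subsequent level halves
    the sequence length with a stride-2 convolution, so a length-n
    input produces O(log n) levels in total.
    """

    def __init__(self, d_model: int, num_levels: int):
        super().__init__()
        # One stride-2 convolution per level; each halves the length.
        self.convs = nn.ModuleList(
            nn.Conv1d(d_model, d_model, kernel_size=2, stride=2)
            for _ in range(num_levels)
        )

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (batch, seq_len, d_model)
        levels = [x]
        h = x.transpose(1, 2)  # Conv1d expects (batch, channels, seq)
        for conv in self.convs:
            h = conv(h)        # halve the sequence length
            levels.append(h.transpose(1, 2))
        return levels


if __name__ == "__main__":
    summarizer = MultiLevelSummarizer(d_model=64, num_levels=3)
    x = torch.randn(1, 32, 64)
    for i, level in enumerate(summarizer(x)):
        print(f"level {i}: shape {tuple(level.shape)}")
    # level 0: (1, 32, 64), level 1: (1, 16, 64),
    # level 2: (1, 8, 64),  level 3: (1, 4, 64)
```

In a scheme like the one the abstract describes, a query would attend to nearby tokens at full resolution (level 0) and to increasingly distant tokens through the coarser levels, keeping the number of attended positions per query logarithmic in the sequence length.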