Monday, December 12, 2022, 10:00 am EST (GMT -05:00)
MC 5417 and Zoom (please email amgrad@uwaterloo.ca for the meeting link)
Candidate
Yanming Kang | Applied Mathematics, University of Waterloo
Title
Multi-level Transformer
Abstract
Transformer-based models have shown strong performance on natural language tasks. However, the quadratic complexity of the self-attention operation limits the maximum input length that can be handled. Our proposed model reduces the computational complexity to $O(n\log{n})$ by grouping tokens according to their distance from the target and summarizing them with strided convolution. In this presentation I will review prior work on efficient Transformers, describe our method, and present some preliminary results.
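To make the $O(n\log{n})$ claim concrete, here is a minimal sketch of the summarization step the abstract describes: distant tokens are compressed with strided convolution into a pyramid of progressively coarser summaries, so each query attends to roughly $\log_2 n$ representations instead of all $n$ tokens. This is an illustrative assumption, not the candidate's actual implementation; the class name, shared convolution, and stride of 2 are all choices made here for brevity.

```python
import torch
import torch.nn as nn

class MultiLevelSummarizer(nn.Module):
    """Builds a pyramid of token summaries via strided 1D convolution.

    Level 0 holds the full-resolution tokens; each subsequent level halves
    the sequence length, so about log2(n) levels cover the whole input.
    A query can then attend to nearby tokens at full resolution and to
    coarser summaries of distant tokens, giving O(log n) keys per query
    and O(n log n) attention cost overall.
    """

    def __init__(self, d_model: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        # One convolution shared across all levels (an assumption for brevity).
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=stride, stride=stride)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (batch, seq_len, d_model)
        levels = [x]
        h = x.transpose(1, 2)  # Conv1d expects (batch, channels, seq_len)
        while h.size(-1) >= self.stride:
            h = self.conv(h)               # halve the sequence length
            levels.append(h.transpose(1, 2))
        return levels  # ~log2(seq_len) levels of coarsening summaries

# Usage example: 64 tokens yield a pyramid of 7 levels.
summarizer = MultiLevelSummarizer(d_model=32)
pyramid = summarizer(torch.randn(1, 64, 32))
print([lvl.shape[1] for lvl in pyramid])  # [64, 32, 16, 8, 4, 2, 1]
```

Summing the level lengths gives roughly $2n$ stored summaries, so the memory overhead stays linear while each query's attention span shrinks to logarithmically many entries.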